Name | llmatic |
Version | 0.0.1 |
home_page | None |
Summary | No BS utilities for developing with LLMs |
upload_time | 2024-05-26 22:23:53 |
maintainer | None |
docs_url | None |
author | Agam More |
requires_python | <4.0,>=3.8 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# llmatic
**No BS utilities for developing with LLMs**
### Why?
LLMs are basically API calls that need:
- Swappability
- Rate limit handling
- Visibility on costs, latency and performance
- Response handling
So we solve exactly those problems, without any crazy vendor lock-in or abstractions ;)
### Examples
Calling an LLM (note that you can call whatever LLM API you want; you're free to do as you please).
```py
import openai
from llmatic import track, get_response
# Start tracking an LLM call; results are stored locally (SQLite by default).
t = track(project_id="test", id="story")

# Make the LLM call; note you can use whatever API or LLM you want.
model = "gpt-4"
prompt = "Write a short story about a robot learning to garden, 300-400 words, be creative."
max_tokens = 600  # illustrative value, enough for a 300-400 word story

response = openai.Completion.create(
    engine=model,
    prompt=prompt,
    max_tokens=max_tokens,
    n=1,
    stop=None,
    temperature=0.7
)

# End tracking and save the call results: cost (inputs/outputs) and latency (execution time).
t.end(model=model, prompt=prompt, response=response)
# Evaluation
# Note that this acts like the built-in `assert`. If you don't want it to affect
# your runtime in production, use `log_only=True` (only tracks the result) or
# `dev_mode=True` (won't run - useful for production).
t.eval("Is the story engaging?", scale=(0, 10), model="claude-sonnet")
t.eval("Does the story contain the words ['water', 'flowers', 'soil']?", scale=(0, 10))  # Uses function calling to check for these words in the response text.
t.eval("Did the response complete in less than 0.5s?", scale=(0, 1), log_only=True)  # Doesn't trigger a conditional_retry, just logs/tracks the eval.
t.eval("Was the story between 300-400 words?", scale=(0, 1))
# Extract and print the generated text
generated_text = get_response(response)  # equivalent to response.choices[0].text.strip()
print(generated_text)
```
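
For intuition, `get_response` covers the "response handling" bullet from the list above: it hides the provider-specific shape of the response object so the surrounding code stays swappable. The sketch below is only an illustration of that idea, not llmatic's implementation; `extract_text` is a hypothetical helper and the attribute paths are the common OpenAI completion/chat response shapes:

```py
from typing import Any


def extract_text(response: Any) -> str:
    """Illustrative only: pull the generated text out of a few common response shapes."""
    # Dict-style payloads (e.g. a parsed JSON body).
    if isinstance(response, dict):
        choice = response["choices"][0]
        text = choice.get("text") or choice["message"]["content"]
        return text.strip()
    # Object-style payloads: legacy completions expose `.text`,
    # chat completions expose `.message.content`.
    choice = response.choices[0]
    text = getattr(choice, "text", None)
    if text is None:
        text = choice.message.content
    return text.strip()
```

Swapping providers then only means changing the call that produces `response`, not the code that consumes it.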
### Retries?
Well, we will need to take our relationship to the next level :)
To get the most benefit from llmatic, wrap the relevant LLM calls in a context manager (`with llm(...)`):
```py
import openai
from llmatic import track, get_response, llm
# Start tracking the LLM call; results are stored locally (SQLite by default).
t = track(id="story")

model = "gpt-4"
prompt = "Write a short story about a robot learning to garden, 300-400 words, be creative."
max_tokens = 600

# Make the API call inside the llm context; this will also retry on rate limit errors.
with llm(retries=3, tracker=t):
    response = openai.Completion.create(
        engine=model,
        prompt=prompt,
        max_tokens=max_tokens,
        n=1,
        stop=None,
        temperature=0.7
    )

# End tracking and save the call results: cost (inputs/outputs) and latency (execution time).
t.end(model=model, prompt=prompt, response=response)
# Eval
t.eval("Was the story between 300-400 words?", scale=(0,1))
t.eval("Is the story creative?", scale=(0,10), "claude-sonnet")
t.conditional_retry(lambda score: score > 7, scale=(0,10), max_retry=3) # If our condition isn't met, retry the llm again
```
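
To make the retry behaviour concrete: the core of any such wrapper is a retry loop with backoff around the provider call. The sketch below is illustrative only (not llmatic's implementation) and is written as a plain helper function; `openai.error.RateLimitError` assumes the pre-1.0 `openai` SDK used in the examples above, and `call_with_retries` is a hypothetical name:

```py
import random
import time

import openai


def call_with_retries(make_call, retries=3, base_delay=1.0):
    """Retry a callable on rate-limit errors with exponential backoff (illustrative)."""
    for attempt in range(retries + 1):
        try:
            return make_call()
        except openai.error.RateLimitError:
            if attempt == retries:
                raise
            # Exponential backoff with a little jitter so concurrent callers spread out.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))


response = call_with_retries(
    lambda: openai.Completion.create(
        engine="gpt-4",
        prompt="Write a short story about a robot learning to garden.",
        max_tokens=600,
        temperature=0.7,
    )
)
```

Exponential backoff with jitter is the usual choice for rate limits, so concurrent callers don't all retry at the same moment.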
### Results (a.k.a. LangSmith killer)
Tracking results are inspected with the bundled CLI tool. By default, they are saved in an SQLite database located at `$HOME/.llmatic/llmatic.db`.
- List all trackings: `llmatic list`
- Output the last track: `llmatic show <project_id> <tracking_id>` (for example, `llmatic show example_project story`)
This command will produce output like:
```plaintext
LLM Tracking Results for: story
------------------------------------------------------------
Tracking ID: story
Model: gpt-4
File and Function: examples_basic_main
Prompt: "Write a short story about a robot learning to garden, 300-400 words, be creative." (Cost: $0.000135)
Execution Time: 564 ms
Total Cost: $0.000150
Tokens: 30 (Prompt: 10, Completion: 20)
Created At: 2024-05-26 15:50:00
Evaluation Results:
------------------------------------------------------------
Is the story engaging? (Model: claude-sonnet): 8.00
Does the story contain the words ['water', 'flowers', 'soil'] (Model: claude-sonnet): 10.00
Generated Text:
------------------------------------------------------------
Once upon a time in a robotic garden... (Cost: $0.000015)
```
- Summarize all trackings for a project: `llmatic summary <project_id>` (for example, `llmatic summary example_project`)
This command will produce output like:
```
Run summary for project: example_project
------------------------------------------------------------
story - [gpt-4], 0.56s, $0.000150, examples_basic_main
another_story - [gpt-4], 0.60s, $0.000160, examples_another_function
```
- Compare trackings: `llmatic compare <id> <id2>` (TBD)

### CLI Usage
List all trackings:
```bash
poetry run llmatic list
```
Output the last track:
```bash
poetry run llmatic show <project_id> <tracking_id>
```
Summarize all trackings for a project:
```bash
poetry run llmatic summary <project_id>
```
Remove all trackings for a project:
```bash
poetry run llmatic remove <project_id>
```
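
If you'd rather poke at the stored results directly instead of going through the CLI, the data is plain SQLite, so the standard library is enough. The sketch below only assumes the default database path mentioned above; the `trackings` table and its columns in the commented query are hypothetical names for illustration, so inspect the real schema first:

```py
import sqlite3
from pathlib import Path

# Default location mentioned in the Results section.
db_path = Path.home() / ".llmatic" / "llmatic.db"

with sqlite3.connect(db_path) as conn:
    conn.row_factory = sqlite3.Row
    # Discover the actual schema first; table names are not guaranteed.
    tables = [row["name"] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )]
    print("tables:", tables)

    # Hypothetical query, assuming a `trackings` table with a `project_id` column:
    # for row in conn.execute("SELECT * FROM trackings WHERE project_id = ?", ("example_project",)):
    #     print(dict(row))
```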
### TODO
- Write the actual code :)
- TS version?
- Think about CLI utilities for improving prompts from the trackings.
- Think of ways to compare different models and prompts in a matrix style for best results using the eval values.
- GitHub Action for evals and results.
### More why?
Most LLM frameworks try to lock you into their ecosystem, which is problematic for several reasons:
- Complexity: Many LLM frameworks are just wrappers around API calls. Why complicate things and add a learning curve?
- Instability: The LLM space changes often, leading to frequent syntax and functionality updates. This is not a solid foundation.
- Inflexibility: Integrating other libraries or frameworks becomes difficult due to hidden dependencies and incompatibilities. You need an agnostic way to use LLMs regardless of the underlying implementation.
- TDD-first: When working with LLMs, it's essential to ensure the results are adequate. This means adopting a defensive coding approach. However, no existing framework treats this as a core value.
- Evaluation: Comparing different runs after modifying your model, prompt, or interaction method is crucial. Current frameworks fall short here (yes, LangSmith), often pushing vendor lock-in with their SaaS products.
Raw data
{
"_id": null,
"home_page": null,
"name": "llmatic",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Agam More",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/b9/33/53103b3769f1ce377be7fbbb555f23058e18e370447cf058240c3b13dddf/llmatic-0.0.1.tar.gz",
"platform": null,
"description": "# llmatic\n\n**No BS utilities for developing with LLMs**\n\n### Why?\n\nLLMs are basically API calls that need:\n\n- Swappability\n- Rate limit handling\n- Visibility on costs, latency and performance\n- Response handling\n\nSo we are solving those problems exactly without any crazy vendor lock or abstractions ;)\n\n### Examples\n\nCalling an LLM (notice how you can call whatever LLM API you want, you're really free to do as you please).\n\n```py\nimport openai\nfrom llmatic import track, get_response\n\n# Start tracking an LLM call\nt = track(project_id=\"test\", id=\"story\") # start tacking an llm call, saved as json files.\n\n# Make the LLM call, note you can use whatever API or LLM you want.\nresponse = openai.Completion.create(\n engine=\"gpt-4\",\n prompt=\"Write a short story about a robot learning to garden, 300-400 words, be creative.\",\n max_tokens=max_tokens,\n n=1,\n stop=None,\n temperature=0.7\n)\n\n# End tracking and save call results\nt.end(model=model, prompt=prompt, response=response) # Save the cost (inputs/outputs), latency (execution time)\n\n# Evaluation\n# Notice that this acts like the built in `assert`,\n# if you don't want it to affect your runtime in production just use\n# `log_only` (will only track the result) or dev_mode=True (won't run - useful for production).\n\nt.eval(\"Is the story engaging?\", scale=(0,10), model=\"claude-sonnet\")\nt.eval(\"Does the story contain the words ['water', 'flowers', 'soil']\", scale=(0,10)) # This will use function calling to check \"flowers\" in story_text_response.\nt.eval(\"Did the response complete in less than 0.5s?\", scale=(0,1), log_only=True) # This will not trigger a conditional_retry, just log/track the eval\nt.eval(\"Was the story between 300-400 words?\", scale=(0,1))\n\n# Extract and print the generated text\ngenerated_text = get_response(response) # equivallent to response.choices[0].text.strip()\nprint(generated_text)\n```\n\n### Retries?\n\nWell, we will need to take our relationship to the next level :)\n\nTo get the most benefit from llmatic, we need to wrap all of the relevant LLM calls with a context (`with llm(...)`):\n\n```py\nimport openai\nfrom llmatic import track, get_response, llm, condition\n\n# Make the API call\nt = track(id=\"story\") # start tacking an llm call, saved as json files.\nwith llm(retries=3, tracker=t): # This will also retry any rate limit errors\n response = openai.Completion.create(\n engine=\"gpt-4\",\n prompt=\"Write a short story about a robot learning to garden, 300-400 words, be creative.\",\n max_tokens=max_tokens,\n n=1,\n stop=None,\n temperature=0.7\n )\nt.end(model=model, prompt=prompt, response=response) # Save the cost (inputs/outputs), latency (execution time)\n\n# Eval\nt.eval(\"Was the story between 300-400 words?\", scale=(0,1))\nt.eval(\"Is the story creative?\", scale=(0,10), \"claude-sonnet\")\nt.conditional_retry(lambda score: score > 7, scale=(0,10), max_retry=3) # If our condition isn't met, retry the llm again\n```\n\n### Results (e.g. LangSmith killer)\n\nWe use a CLI tool to check them out. 
By default, tracking results will be saved in an SQLite database located at `$HOME/.llmatic/llmatic.db`.\n\n- List all trackings: `llmatic list`\n- Output the last track: `llmatic show <project_id> <tracking_id>` (for example, `llmatic show example_project story`)\n\n This command will produce output like:\n\n ```plaintext\n LLM Tracking Results for: story\n ------------------------------------------------------------\n Tracking ID: story\n Model: gpt-4\n File and Function: examples_basic_main\n Prompt: \"Write a short story about a robot learning to garden, 300-400 words, be creative.\" (Cost: $0.000135)\n Execution Time: 564 ms\n Total Cost: $0.000150\n Tokens: 30 (Prompt: 10, Completion: 20)\n Created At: 2024-05-26 15:50:00\n\n Evaluation Results:\n ------------------------------------------------------------\n Is the story engaging? (Model: claude-sonnet): 8.00\n Does the story contain the words ['water', 'flowers', 'soil'] (Model: claude-sonnet): 10.00\n\n Generated Text:\n ------------------------------------------------------------\n Once upon a time in a robotic garden... (Cost: $0.000015)\n\n ```\n\n-Summarize all trackings for a project: llmatic summary <project_id> (for example, llmatic summary example_project)\n\nThis command will produce output like:\n\n```\nRun summary for project: example_project\n------------------------------------------------------------\nstory - [gpt-4], 0.56s, $0.000150, examples_basic_main\nanother_story - [gpt-4], 0.60s, $0.000160, examples_another_function\n```\n\n- Compare trackings: `llmatic compare <id> <id2>` TBD\n \n\nCLI Usage\n\nList all trackings:\n\n```bash\npoetry run llmatic list\n```\n\nOutput the last track:\n\n```bash\npoetry run llmatic show <project_id> <tracking_id>\n```\n\nSummarize all trackings for a project:\n\n```bash\npoetry run llmatic summary <project_id>\n```\n\nRemove all trackings for a project:\n\n```bash\npoetry run llmatic remove <project_id>\n```\n\nTODO:\n\n- Write the actual code :)\n- TS version?\n- Think about cli utitilies for improveing prompts from the trackings.\n- Think of ways to compare different models and prompts in a matrix style for best results using the eval values.\n- Github action for eval and results.\n\n### More why?\n\nMost LLM frameworks try to lock you into their ecosystem, which is problematic for several reasons:\n\n- Complexity: Many LLM frameworks are just wrappers around API calls. Why complicate things and add a learning curve?\n- Instability: The LLM space changes often, leading to frequent syntax and functionality updates. This is not a solid foundation.\n- Inflexibility: Integrating other libraries or frameworks becomes difficult due to hidden dependencies and incompatibilities. You need an agnostic way to use LLMs regardless of the underlying implementation.\n- TDD-first: When working with LLMs, it's essential to ensure the results are adequate. This means adopting a defensive coding approach. However, no existing framework treats this as a core value.\n- Evaluation: Comparing different runs after modifying your model, prompt, or interaction method is crucial. Current frameworks fall short here (yes, langsmith), often pushing vendor lock-in with their SaaS products.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "No BS utilities for developing with LLMs",
"version": "0.0.1",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "dd60986bd6c2617536fd4f7dc438c6b38d552f27e921cd6e84a5311461182726",
"md5": "d6d9f63f0235f4ad792c2bba7bdfc0ac",
"sha256": "12db1049b74a3529b0d8f8c1971813ad125d9d5ae030d46f2f0fcd0aa577b7c1"
},
"downloads": -1,
"filename": "llmatic-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d6d9f63f0235f4ad792c2bba7bdfc0ac",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 8424,
"upload_time": "2024-05-26T22:23:51",
"upload_time_iso_8601": "2024-05-26T22:23:51.237143Z",
"url": "https://files.pythonhosted.org/packages/dd/60/986bd6c2617536fd4f7dc438c6b38d552f27e921cd6e84a5311461182726/llmatic-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b93353103b3769f1ce377be7fbbb555f23058e18e370447cf058240c3b13dddf",
"md5": "7d0528f835a0a06a75fae1b290492453",
"sha256": "57c0e737094c9a62be04ffb245ce2a569a85188dd4860a7d666da3ed9b7169b6"
},
"downloads": -1,
"filename": "llmatic-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "7d0528f835a0a06a75fae1b290492453",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 6817,
"upload_time": "2024-05-26T22:23:53",
"upload_time_iso_8601": "2024-05-26T22:23:53.249531Z",
"url": "https://files.pythonhosted.org/packages/b9/33/53103b3769f1ce377be7fbbb555f23058e18e370447cf058240c3b13dddf/llmatic-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-26 22:23:53",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llmatic"
}