Name | llmatic |
Version | 0.0.1 |
home_page | None |
Summary | No BS utilities for developing with LLMs |
upload_time | 2024-05-26 22:23:53 |
maintainer | None |
docs_url | None |
author | Agam More |
requires_python | <4.0,>=3.8 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# llmatic
**No BS utilities for developing with LLMs**
### Why?
LLMs are basically API calls that need:
- Swappability
- Rate limit handling
- Visibility on costs, latency and performance
- Response handling
So we solve exactly those problems, without any crazy vendor lock-in or abstractions ;)
### Examples
Calling an LLM (note that you can call whatever LLM API you want; you're free to do as you please).
```py
import openai
from llmatic import track, get_response
# Start tracking an LLM call; results are stored locally (SQLite by default).
t = track(project_id="test", id="story")

# Make the LLM call; note you can use whatever API or LLM you want.
model = "gpt-4"
prompt = "Write a short story about a robot learning to garden, 300-400 words, be creative."
max_tokens = 600  # illustrative value, enough for a 300-400 word story

response = openai.Completion.create(
    engine=model,
    prompt=prompt,
    max_tokens=max_tokens,
    n=1,
    stop=None,
    temperature=0.7
)

# End tracking and save the call results: cost (inputs/outputs) and latency (execution time).
t.end(model=model, prompt=prompt, response=response)
# Evaluation
# Note that this acts like the built-in `assert`. If you don't want it to affect
# your runtime in production, use `log_only=True` (only tracks the result) or
# `dev_mode=True` (won't run - useful for production).
t.eval("Is the story engaging?", scale=(0, 10), model="claude-sonnet")
t.eval("Does the story contain the words ['water', 'flowers', 'soil']?", scale=(0, 10))  # Uses function calling to check for these words in the response text.
t.eval("Did the response complete in less than 0.5s?", scale=(0, 1), log_only=True)  # Doesn't trigger a conditional_retry, just logs/tracks the eval.
t.eval("Was the story between 300-400 words?", scale=(0, 1))
# Extract and print the generated text
generated_text = get_response(response)  # equivalent to response.choices[0].text.strip()
print(generated_text)
```
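
For intuition, `get_response` covers the "response handling" bullet from the list above: it hides the provider-specific shape of the response object so the surrounding code stays swappable. The sketch below is only an illustration of that idea, not llmatic's implementation; `extract_text` is a hypothetical helper and the attribute paths are the common OpenAI completion/chat response shapes:

```py
from typing import Any


def extract_text(response: Any) -> str:
    """Illustrative only: pull the generated text out of a few common response shapes."""
    # Dict-style payloads (e.g. a parsed JSON body).
    if isinstance(response, dict):
        choice = response["choices"][0]
        text = choice.get("text") or choice["message"]["content"]
        return text.strip()
    # Object-style payloads: legacy completions expose `.text`,
    # chat completions expose `.message.content`.
    choice = response.choices[0]
    text = getattr(choice, "text", None)
    if text is None:
        text = choice.message.content
    return text.strip()
```

Swapping providers then only means changing the call that produces `response`, not the code that consumes it.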
### Retries?
Well, we will need to take our relationship to the next level :)
To get the most benefit from llmatic, wrap the relevant LLM calls in a context manager (`with llm(...)`):
```py
import openai
from llmatic import track, get_response, llm
# Start tracking the LLM call; results are stored locally (SQLite by default).
t = track(id="story")

model = "gpt-4"
prompt = "Write a short story about a robot learning to garden, 300-400 words, be creative."
max_tokens = 600

# Make the API call inside the llm context; this will also retry on rate limit errors.
with llm(retries=3, tracker=t):
    response = openai.Completion.create(
        engine=model,
        prompt=prompt,
        max_tokens=max_tokens,
        n=1,
        stop=None,
        temperature=0.7
    )

# End tracking and save the call results: cost (inputs/outputs) and latency (execution time).
t.end(model=model, prompt=prompt, response=response)
# Eval
t.eval("Was the story between 300-400 words?", scale=(0,1))
t.eval("Is the story creative?", scale=(0,10), "claude-sonnet")
t.conditional_retry(lambda score: score > 7, scale=(0,10), max_retry=3) # If our condition isn't met, retry the llm again
```
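
To make the retry behaviour concrete: the core of any such wrapper is a retry loop with backoff around the provider call. The sketch below is illustrative only (not llmatic's implementation) and is written as a plain helper function; `openai.error.RateLimitError` assumes the pre-1.0 `openai` SDK used in the examples above, and `call_with_retries` is a hypothetical name:

```py
import random
import time

import openai


def call_with_retries(make_call, retries=3, base_delay=1.0):
    """Retry a callable on rate-limit errors with exponential backoff (illustrative)."""
    for attempt in range(retries + 1):
        try:
            return make_call()
        except openai.error.RateLimitError:
            if attempt == retries:
                raise
            # Exponential backoff with a little jitter so concurrent callers spread out.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))


response = call_with_retries(
    lambda: openai.Completion.create(
        engine="gpt-4",
        prompt="Write a short story about a robot learning to garden.",
        max_tokens=600,
        temperature=0.7,
    )
)
```

Exponential backoff with jitter is the usual choice for rate limits, so concurrent callers don't all retry at the same moment.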
### Results (a.k.a. LangSmith killer)
Tracking results are inspected with the bundled CLI tool. By default, they are saved in an SQLite database located at `$HOME/.llmatic/llmatic.db`.
- List all trackings: `llmatic list`
- Output the last track: `llmatic show <project_id> <tracking_id>` (for example, `llmatic show example_project story`)
This command will produce output like:
```plaintext
LLM Tracking Results for: story
------------------------------------------------------------
Tracking ID: story
Model: gpt-4
File and Function: examples_basic_main
Prompt: "Write a short story about a robot learning to garden, 300-400 words, be creative." (Cost: $0.000135)
Execution Time: 564 ms
Total Cost: $0.000150
Tokens: 30 (Prompt: 10, Completion: 20)
Created At: 2024-05-26 15:50:00
Evaluation Results:
------------------------------------------------------------
Is the story engaging? (Model: claude-sonnet): 8.00
Does the story contain the words ['water', 'flowers', 'soil'] (Model: claude-sonnet): 10.00
Generated Text:
------------------------------------------------------------
Once upon a time in a robotic garden... (Cost: $0.000015)
```
- Summarize all trackings for a project: `llmatic summary <project_id>` (for example, `llmatic summary example_project`)
This command will produce output like:
```
Run summary for project: example_project
------------------------------------------------------------
story - [gpt-4], 0.56s, $0.000150, examples_basic_main
another_story - [gpt-4], 0.60s, $0.000160, examples_another_function
```
- Compare trackings: `llmatic compare <id> <id2>` (TBD)

### CLI Usage
List all trackings:
```bash
poetry run llmatic list
```
Output the last track:
```bash
poetry run llmatic show <project_id> <tracking_id>
```
Summarize all trackings for a project:
```bash
poetry run llmatic summary <project_id>
```
Remove all trackings for a project:
```bash
poetry run llmatic remove <project_id>
```
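
If you'd rather poke at the stored results directly instead of going through the CLI, the data is plain SQLite, so the standard library is enough. The sketch below only assumes the default database path mentioned above; the `trackings` table and its columns in the commented query are hypothetical names for illustration, so inspect the real schema first:

```py
import sqlite3
from pathlib import Path

# Default location mentioned in the Results section.
db_path = Path.home() / ".llmatic" / "llmatic.db"

with sqlite3.connect(db_path) as conn:
    conn.row_factory = sqlite3.Row
    # Discover the actual schema first; table names are not guaranteed.
    tables = [row["name"] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )]
    print("tables:", tables)

    # Hypothetical query, assuming a `trackings` table with a `project_id` column:
    # for row in conn.execute("SELECT * FROM trackings WHERE project_id = ?", ("example_project",)):
    #     print(dict(row))
```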
### TODO
- Write the actual code :)
- TS version?
- Think about CLI utilities for improving prompts from the trackings.
- Think of ways to compare different models and prompts in a matrix style for best results using the eval values.
- GitHub Action for evals and results.
### More why?
Most LLM frameworks try to lock you into their ecosystem, which is problematic for several reasons:
- Complexity: Many LLM frameworks are just wrappers around API calls. Why complicate things and add a learning curve?
- Instability: The LLM space changes often, leading to frequent syntax and functionality updates. This is not a solid foundation.
- Inflexibility: Integrating other libraries or frameworks becomes difficult due to hidden dependencies and incompatibilities. You need an agnostic way to use LLMs regardless of the underlying implementation.
- TDD-first: When working with LLMs, it's essential to ensure the results are adequate. This means adopting a defensive coding approach. However, no existing framework treats this as a core value.
- Evaluation: Comparing different runs after modifying your model, prompt, or interaction method is crucial. Current frameworks fall short here (yes, LangSmith), often pushing vendor lock-in with their SaaS products.
Raw data
{
"_id": null,
"home_page": null,
"name": "llmatic",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Agam More",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/b9/33/53103b3769f1ce377be7fbbb555f23058e18e370447cf058240c3b13dddf/llmatic-0.0.1.tar.gz",
"platform": null,
"description": "# llmatic\n\n**No BS utilities for developing with LLMs**\n\n### Why?\n\nLLMs are basically API calls that need:\n\n- Swappability\n- Rate limit handling\n- Visibility on costs, latency and performance\n- Response handling\n\nSo we are solving those problems exactly without any crazy vendor lock or abstractions ;)\n\n### Examples\n\nCalling an LLM (notice how you can call whatever LLM API you want, you're really free to do as you please).\n\n```py\nimport openai\nfrom llmatic import track, get_response\n\n# Start tracking an LLM call\nt = track(project_id=\"test\", id=\"story\") # start tacking an llm call, saved as json files.\n\n# Make the LLM call, note you can use whatever API or LLM you want.\nresponse = openai.Completion.create(\n engine=\"gpt-4\",\n prompt=\"Write a short story about a robot learning to garden, 300-400 words, be creative.\",\n max_tokens=max_tokens,\n n=1,\n stop=None,\n temperature=0.7\n)\n\n# End tracking and save call results\nt.end(model=model, prompt=prompt, response=response) # Save the cost (inputs/outputs), latency (execution time)\n\n# Evaluation\n# Notice that this acts like the built in `assert`,\n# if you don't want it to affect your runtime in production just use\n# `log_only` (will only track the result) or dev_mode=True (won't run - useful for production).\n\nt.eval(\"Is the story engaging?\", scale=(0,10), model=\"claude-sonnet\")\nt.eval(\"Does the story contain the words ['water', 'flowers', 'soil']\", scale=(0,10)) # This will use function calling to check \"flowers\" in story_text_response.\nt.eval(\"Did the response complete in less than 0.5s?\", scale=(0,1), log_only=True) # This will not trigger a conditional_retry, just log/track the eval\nt.eval(\"Was the story between 300-400 words?\", scale=(0,1))\n\n# Extract and print the generated text\ngenerated_text = get_response(response) # equivallent to response.choices[0].text.strip()\nprint(generated_text)\n```\n\n### Retries?\n\nWell, we will need to take our relationship to the next level :)\n\nTo get the most benefit from llmatic, we need to wrap all of the relevant LLM calls with a context (`with llm(...)`):\n\n```py\nimport openai\nfrom llmatic import track, get_response, llm, condition\n\n# Make the API call\nt = track(id=\"story\") # start tacking an llm call, saved as json files.\nwith llm(retries=3, tracker=t): # This will also retry any rate limit errors\n response = openai.Completion.create(\n engine=\"gpt-4\",\n prompt=\"Write a short story about a robot learning to garden, 300-400 words, be creative.\",\n max_tokens=max_tokens,\n n=1,\n stop=None,\n temperature=0.7\n )\nt.end(model=model, prompt=prompt, response=response) # Save the cost (inputs/outputs), latency (execution time)\n\n# Eval\nt.eval(\"Was the story between 300-400 words?\", scale=(0,1))\nt.eval(\"Is the story creative?\", scale=(0,10), \"claude-sonnet\")\nt.conditional_retry(lambda score: score > 7, scale=(0,10), max_retry=3) # If our condition isn't met, retry the llm again\n```\n\n### Results (e.g. LangSmith killer)\n\nWe use a CLI tool to check them out. 
By default, tracking results will be saved in an SQLite database located at `$HOME/.llmatic/llmatic.db`.\n\n- List all trackings: `llmatic list`\n- Output the last track: `llmatic show <project_id> <tracking_id>` (for example, `llmatic show example_project story`)\n\n This command will produce output like:\n\n ```plaintext\n LLM Tracking Results for: story\n ------------------------------------------------------------\n Tracking ID: story\n Model: gpt-4\n File and Function: examples_basic_main\n Prompt: \"Write a short story about a robot learning to garden, 300-400 words, be creative.\" (Cost: $0.000135)\n Execution Time: 564 ms\n Total Cost: $0.000150\n Tokens: 30 (Prompt: 10, Completion: 20)\n Created At: 2024-05-26 15:50:00\n\n Evaluation Results:\n ------------------------------------------------------------\n Is the story engaging? (Model: claude-sonnet): 8.00\n Does the story contain the words ['water', 'flowers', 'soil'] (Model: claude-sonnet): 10.00\n\n Generated Text:\n ------------------------------------------------------------\n Once upon a time in a robotic garden... (Cost: $0.000015)\n\n ```\n\n-Summarize all trackings for a project: llmatic summary <project_id> (for example, llmatic summary example_project)\n\nThis command will produce output like:\n\n```\nRun summary for project: example_project\n------------------------------------------------------------\nstory - [gpt-4], 0.56s, $0.000150, examples_basic_main\nanother_story - [gpt-4], 0.60s, $0.000160, examples_another_function\n```\n\n- Compare trackings: `llmatic compare <id> <id2>` TBD\n \n\nCLI Usage\n\nList all trackings:\n\n```bash\npoetry run llmatic list\n```\n\nOutput the last track:\n\n```bash\npoetry run llmatic show <project_id> <tracking_id>\n```\n\nSummarize all trackings for a project:\n\n```bash\npoetry run llmatic summary <project_id>\n```\n\nRemove all trackings for a project:\n\n```bash\npoetry run llmatic remove <project_id>\n```\n\nTODO:\n\n- Write the actual code :)\n- TS version?\n- Think about cli utitilies for improveing prompts from the trackings.\n- Think of ways to compare different models and prompts in a matrix style for best results using the eval values.\n- Github action for eval and results.\n\n### More why?\n\nMost LLM frameworks try to lock you into their ecosystem, which is problematic for several reasons:\n\n- Complexity: Many LLM frameworks are just wrappers around API calls. Why complicate things and add a learning curve?\n- Instability: The LLM space changes often, leading to frequent syntax and functionality updates. This is not a solid foundation.\n- Inflexibility: Integrating other libraries or frameworks becomes difficult due to hidden dependencies and incompatibilities. You need an agnostic way to use LLMs regardless of the underlying implementation.\n- TDD-first: When working with LLMs, it's essential to ensure the results are adequate. This means adopting a defensive coding approach. However, no existing framework treats this as a core value.\n- Evaluation: Comparing different runs after modifying your model, prompt, or interaction method is crucial. Current frameworks fall short here (yes, langsmith), often pushing vendor lock-in with their SaaS products.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "No BS utilities for developing with LLMs",
"version": "0.0.1",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "dd60986bd6c2617536fd4f7dc438c6b38d552f27e921cd6e84a5311461182726",
"md5": "d6d9f63f0235f4ad792c2bba7bdfc0ac",
"sha256": "12db1049b74a3529b0d8f8c1971813ad125d9d5ae030d46f2f0fcd0aa577b7c1"
},
"downloads": -1,
"filename": "llmatic-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d6d9f63f0235f4ad792c2bba7bdfc0ac",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 8424,
"upload_time": "2024-05-26T22:23:51",
"upload_time_iso_8601": "2024-05-26T22:23:51.237143Z",
"url": "https://files.pythonhosted.org/packages/dd/60/986bd6c2617536fd4f7dc438c6b38d552f27e921cd6e84a5311461182726/llmatic-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b93353103b3769f1ce377be7fbbb555f23058e18e370447cf058240c3b13dddf",
"md5": "7d0528f835a0a06a75fae1b290492453",
"sha256": "57c0e737094c9a62be04ffb245ce2a569a85188dd4860a7d666da3ed9b7169b6"
},
"downloads": -1,
"filename": "llmatic-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "7d0528f835a0a06a75fae1b290492453",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 6817,
"upload_time": "2024-05-26T22:23:53",
"upload_time_iso_8601": "2024-05-26T22:23:53.249531Z",
"url": "https://files.pythonhosted.org/packages/b9/33/53103b3769f1ce377be7fbbb555f23058e18e370447cf058240c3b13dddf/llmatic-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-26 22:23:53",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llmatic"
}