lmwrapper

Name	lmwrapper JSON
Version	0.17.0.0 JSON
	download
home_page	None
Summary	An object-oriented wrapper around language models with caching, batching, and more.
upload_time	2025-07-13 18:55:23
maintainer	None
docs_url	None
author	David Gros, Claudio Spiess
requires_python	>=3.10
license	None
keywords	large language models openai
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            `lmwrapper` provides a wrapper around OpenAI API and Hugging Face Language models, focusing
on being a clean, object-oriented, and user-friendly interface. It has three main goals:

A) Make it easier to use the OpenAI API.

B) Make it easier to reuse your code for other language models with minimal changes.

C) Simplifying and support research-focused use cases (like help when running on datasets, and allow research that requires accessing model internals of local models).

Some key features currently include local disk caching of responses, and super simple
use of the OpenAI batching API which can save 50% on costs.

`lmwrapper` is lightweight and can serve as a flexible stand-in for the OpenAI API.

## Installation

For usage with just OpenAI models:

```bash
pip install lmwrapper
```

For usage with HuggingFace models as well:

```bash
pip install 'lmwrapper[hf]'
```

For usage with Claude models from Anthropic:
```bash
pip install 'lmwrapper[anthropic]'
```

For development dependencies:

```bash
pip install 'lmwrapper[dev]'
```

The above args can be combined. For example:
```bash
pip install 'lmwrapper[hf,anthropic]'
```

<!---
If you prefer using `conda`/`mamba` to manage your environments, you may edit the `environment.yml` file to your liking & setup and create a new environment based on it:

```bash
mamba env create -f environment.yml
```

Please note that this method is for development and not supported.
-->

## Example usage

<!---
### Basic Completion and Prompting

```python
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from lmwrapper.structs import LmPrompt

lm = get_open_ai_lm(
    model_name=OpenAiModelNames.gpt_3_5_turbo_instruct,
    api_key_secret=None,  # By default, this will read from the OPENAI_API_KEY environment variable.
    # If that isn't set, it will try the file ~/oai_key.txt
    # You need to place the key in one of these places,
    # or pass in a different location. You can get an API
    # key at (https://platform.openai.com/account/api-keys)
)

prediction = lm.predict(
    LmPrompt(  # A LmPrompt object lets your IDE hint on args
        "Once upon a",
        max_tokens=10,
        temperature=1, # Set this to 0 for deterministic completions
    )
)
print(prediction.completion_text)
# " time, there were three of us." - Example. This will change with each sample.
```
-->

### Chat

```python
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from lmwrapper.structs import LmPrompt, LmChatTurn

lm = get_open_ai_lm(
    model_name=OpenAiModelNames.gpt_4_1_nano,
    api_key_secret=None,  # By default, this will read from the OPENAI_API_KEY environment variable.
    # If that isn't set, it will try the file ~/oai_key.txt
    # You need to place the key in one of these places,
    # or pass in a different location. You can get an API
    # key at (https://platform.openai.com/account/api-keys)
)

# Single user utterance
pred = lm.predict("What is 2+2?")
print(pred.completion_text)  # "2 + 2 equals 4."


# Use a LmPrompt to have more control of the parameters
pred = lm.predict(LmPrompt(
    "What is 2+6?",
    max_tokens=10,
    temperature=0, # Set this to 0 for deterministic completions
))
print(pred.completion_text)  # "2 + 6 equals 8."

# Conversation alternating between `user` and `assistant`.
pred = lm.predict(LmPrompt(
    [
        "What is 2+2?",  # user turn
        "4",  # assistant turn
        "What is 5+3?"  # user turn
        "8",  # assistant turn
        "What is 4+4?"  # user turn
        # We use few-shot turns to encourage the answer to be our desired format.
        #   If you don't give example turns you might get something like
        #   "4 + 4 equals 8." instead of just "8" as desired.
    ],
    max_tokens=10,
))
print(pred.completion_text)  # "8"

# If you want things like the system message, you can use LmChatTurn objects
pred = lm.predict(LmPrompt(
    text=[
        LmChatTurn(role="system", content="You always answer like a pirate"),
        LmChatTurn(role="user", content="How does bitcoin work?"),
    ],
    max_tokens=25,
    temperature=0,
))
print(pred.completion_text)
# "Arr, me matey! Bitcoin be a digital currency that be workin' on a technology called blockchain..."
```

## Caching

Add `cache = True` in the prompt to cache the output to disk. Any
subsequent calls with this prompt will return the same value. Note that
this might be unexpected behavior if your temperature is non-zero. (You
will always sample the same output on reruns).

```python
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from lmwrapper.structs import LmPrompt

lm = get_open_ai_lm(OpenAiModelNames.gpt_4_1_nano)

prompt = LmPrompt(
  "Briefly describe the city of Paris", 
  cache=True,
  temperature=1,
  max_tokens=25,
)
first_prediction = lm.predict(prompt)
print(first_prediction.completion_text) 
# ... eg, "Paris is a city of romance and art, renowned for its iconic landmarks, vibrant culture, and rich history."

# The response to this prompt is now saved to the disk.
# You could rerun this script and you would load from cache near-instantly.
# This can simplify running experimentation and data processing scripts
# where you are running a dataset through a model and doing analysis.
# You then only have to actually query the model once.
repredict = lm.predict(prompt)
print(repredict.completion_text)
assert first_prediction.completion_text == repredict.completion_text
lm.remove_prompt_from_cache(prompt)
pred_after_clear = lm.predict(prompt)
assert pred_after_clear.completion_text != first_prediction.completion_text
```


## OpenAI Batching

The OpenAI [batching API](https://platform.openai.com/docs/guides/batch) has a 50% reduced cost when willing to accept a 24-hour turnaround. This makes it good for processing datasets or other non-interactive tasks (which is the main target for `lmwrapper` currently).

`lmwrapper` takes care of managing the batch files and other details so that it's as easy 
as the normal API.

<!-- skip gh-action -->
```python
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from lmwrapper.structs import LmPrompt
from lmwrapper.batch_config import CompletionWindow

def load_dataset() -> list:
    """Load some toy task"""
    return ["France", "United States", "China"]

def make_prompts(data) -> list[LmPrompt]:
    """Make some toy prompts for our data"""
    return [
        LmPrompt(
            f"What is the capital of {country}? Answer with just the city name.",
            max_tokens=10,
            temperature=0,
            cache=True,
            metadata={"country": country}
        ) 
        for country in data
    ]

data = load_dataset()
prompts = make_prompts(data)
lm = get_open_ai_lm(OpenAiModelNames.gpt_4_1_nano)
predictions = lm.predict_many(
    prompts,
    completion_window=CompletionWindow.BATCH_ANY 
    #                 ^ swap out for CompletionWindow.ASAP
    #                   to complete as soon as possible via
    #                   the non-batching API at a higher cost.
) # The batch is submitted here

for pred in predictions:  # Will wait for the batch to complete
    country = pred.prompt.metadata['country']
    print(f"Country: {country} --- Capital: {pred.completion_text}")
    if country == "France": assert pred.completion_text == "Paris" 
    # ...
```

The above code could technically take up to 24hrs to complete. However,
OpenAI seems to complete these quicker (for example, these three prompts in ~1 minute or less). In a large batch, you don't have to keep the process running for hours. Thanks to `lmwrapper` cacheing it will automatically load or pick back up waiting on the
existing batch when the script is reran.

The `lmwrapper` cache lets you also intermix cached and uncached examples.

<!-- skip test -->
```python
# ... above code

def load_more_data() -> list:
    """Load some toy task"""
    return ["Mexico", "Canada"]

data = load_data() + load_more_data()
prompts = make_prompts(data)
# If we submit the five prompts, only the two new prompts will be
# submitted to the batch. The already completed prompts will
# be loaded near-instantly from the local cache.
predictions = list(lm.predict_many(
    prompts,
    completion_window=CompletionWindow.BATCH_ANY
))
```

`lmwrapper` is designed to automatically manage the batching of thousands or millions of prompts. 
If needed, it will automatically split up prompts into sub-batches and will manage
issues around rate limits.

This feature is mostly designed for the OpenAI cost savings. You could swap out the model for HuggingFace and the same code
will still work. However, internally it is like a loop over the prompts.
Eventually in `lmwrapper` we want to do more complex batching if
GPU/CPU/accelerator memory is available.

#### Caveats / Implementation needs
This feature is still somewhat experimental. It likely works in typical
usecases, but there are few known things 
to sort out / TODOs:

- [X] Retry batch API connection errors
- [X] Automatically splitting up batches when have >50,000 prompts (limit from OpenAI) 
- [X] Recovering / splitting up batches when hitting your token Batch Queue Limit (see [docs on limits](https://platform.openai.com/docs/guides/rate-limits/usage-tiers))
- [X] Handle canceled batches during current run (use the [web interface](https://platform.openai.com/batches) to cancel)
- [X] Handle/recover canceled batches outside of current run
- [X] Handle if openai batch expires unfinished in 24hrs (though not actually tested or observed this)
- [X] Automatically splitting up batch when exceeding 100MB prompts limit
- [X] Handling of failed prompts (like when have too many tokens). Use LmPrediction.has_errors and LmPrediction.error_message to check for an error on a response.
- [X] Handle when there are duplicate prompts in batch submission
- [ ] Claude batching
- [ ] Handle when a given prompt has `num_completions>1`
- [ ] Automatically clean up API files after done (right now end up with a lot of file in [storage](https://platform.openai.com/storage/files). There isn't an obvious cost for these batch files, but this might change and it would be better to clean them up.)
- [ ] Fancy batching of HF
- [ ] Configuration setting for whether process killing (like with ctrl+c) should attempt to kill the batch. Right now it intentially persists in background. Maybe takes adding a predictions.cancel_all() prop and user can make their own context.
- [ ] Concurrent batching when in ASAP mode
- [ ] A "scheduler" paradigm trading off batch, flex, and ASAP processing to hit some timing target
- [ ] Test on free-tier accounts. It is not clear what the tiny request limit counts

Please open an issue if you want to discuss one of these or something else.

Note, in the progress bars in PyCharm can be bit cleaner if you enable 
[terminal emulation](https://stackoverflow.com/a/64727188) in your run configuration.

## Hugging Face models

Local Causal LM models on Hugging Face models can be used interchangeably with the
OpenAI models.

Note: The universe of Huggingface models is diverse and inconsistent. Some (especially the non-completion ones) might require special prompt formatting to work as expected. Some models might not work at all.

```python
from lmwrapper.huggingface_wrapper import get_huggingface_lm
from lmwrapper.structs import LmPrompt

# Download a small model for demo
lm = get_huggingface_lm("gpt2") # 124M parameters

prediction = lm.predict(LmPrompt(
    "The capital of Germany is Berlin. The capital of France is",
    max_tokens=1,
    temperature=0,
))
print(prediction.completion_text)
assert prediction.completion_text == " Paris"
```
<!-- Model internals -->

Additionally, with HuggingFace models `lmwrapper` provides an interface for
accessing the model internal states.

## Claude
```python
from lmwrapper.claude_wrapper import (
    get_claude_lm, ClaudeModelNames
)
lm = get_claude_lm(ClaudeModelNames.claude_3_5_haiku)
prediction = lm.predict("Define 'anthropology' in one short sentence") 
print(prediction.completion_text) # Anthropology is the scientific study of human cultures, societies, behaviors, and...
```
Note Anthropic does not expose any information about the tokenization for Claude.

### Retries on rate limit

```python
from lmwrapper.openai_wrapper import *

lm = get_open_ai_lm(
    OpenAiModelNames.gpt_3_5_turbo_instruct,
    retry_on_rate_limit=True
)
```

## Other features

### Built-in token counting

```python
from lmwrapper.openai_wrapper import *
from lmwrapper.structs import LmPrompt

lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
assert lm.estimate_tokens_in_prompt(
    LmPrompt("My name is Spingldorph", max_tokens=10)) == 7
assert not lm.could_completion_go_over_token_limit(LmPrompt(
    "My name is Spingldorph", max_tokens=1000))
```

## TODOs

If you are interested in one of these particular features or something else
please make a Github Issue.

- [X] Openai completion
- [X] Openai chat
- [X] Huggingface interface
- [X] Huggingface device checking on PyTorch
- [X] Move cache to be per project
- [X] Redesign cache away from generic `diskcache` to make it easier to manage as a sqlite db
- [X] Smart caching when num_completions > 1 (reusing prior completions)
- [X] OpenAI batching interface (experimental)
- [X] Anthropic interface (basic)
    - [X] Claude system messages
- [X] Use the huggingface chat templates for chat models if available
- [X] Be able to add metadata to a prompt
- [ ] Automatic cache eviction to limit count or disk size (right now have to run a SQL query to delete entries before a certain time or matching your criteria)
- [ ] Multimodal/images in super easy format (like automatically process pil, opencv, etc)
- [ ] sort through usage of quantized models
- [ ] Cost estimation of a prompt before running / "observability" monitoring of total cost
- [ ] Additional Huggingface runtimes (TensorRT, BetterTransformers, etc)
- [ ] async / streaming (not a top priority for non-interactive research use cases)
- [ ] some lightweight utilities to help with tool use

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "lmwrapper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "large language models, openai",
    "author": "David Gros, Claudio Spiess",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/d7/52/f397ab77eaae055f5f5f3fc717c07cf7581cdb74ddb5cc6608c9ced9fd91/lmwrapper-0.17.0.0.tar.gz",
    "platform": null,
    "description": "`lmwrapper` provides a wrapper around OpenAI API and Hugging Face Language models, focusing\non being a clean, object-oriented, and user-friendly interface. It has three main goals:\n\nA) Make it easier to use the OpenAI API.\n\nB) Make it easier to reuse your code for other language models with minimal changes.\n\nC) Simplifying and support research-focused use cases (like help when running on datasets, and allow research that requires accessing model internals of local models).\n\nSome key features currently include local disk caching of responses, and super simple\nuse of the OpenAI batching API which can save 50% on costs.\n\n`lmwrapper` is lightweight and can serve as a flexible stand-in for the OpenAI API.\n\n## Installation\n\nFor usage with just OpenAI models:\n\n```bash\npip install lmwrapper\n```\n\nFor usage with HuggingFace models as well:\n\n```bash\npip install 'lmwrapper[hf]'\n```\n\nFor usage with Claude models from Anthropic:\n```bash\npip install 'lmwrapper[anthropic]'\n```\n\nFor development dependencies:\n\n```bash\npip install 'lmwrapper[dev]'\n```\n\nThe above args can be combined. For example:\n```bash\npip install 'lmwrapper[hf,anthropic]'\n```\n\n<!---\nIf you prefer using `conda`/`mamba` to manage your environments, you may edit the `environment.yml` file to your liking & setup and create a new environment based on it:\n\n```bash\nmamba env create -f environment.yml\n```\n\nPlease note that this method is for development and not supported.\n-->\n\n## Example usage\n\n<!---\n### Basic Completion and Prompting\n\n```python\nfrom lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames\nfrom lmwrapper.structs import LmPrompt\n\nlm = get_open_ai_lm(\n    model_name=OpenAiModelNames.gpt_3_5_turbo_instruct,\n    api_key_secret=None,  # By default, this will read from the OPENAI_API_KEY environment variable.\n    # If that isn't set, it will try the file ~/oai_key.txt\n    # You need to place the key in one of these places,\n    # or pass in a different location. You can get an API\n    # key at (https://platform.openai.com/account/api-keys)\n)\n\nprediction = lm.predict(\n    LmPrompt(  # A LmPrompt object lets your IDE hint on args\n        \"Once upon a\",\n        max_tokens=10,\n        temperature=1, # Set this to 0 for deterministic completions\n    )\n)\nprint(prediction.completion_text)\n# \" time, there were three of us.\" - Example. This will change with each sample.\n```\n-->\n\n### Chat\n\n```python\nfrom lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames\nfrom lmwrapper.structs import LmPrompt, LmChatTurn\n\nlm = get_open_ai_lm(\n    model_name=OpenAiModelNames.gpt_4_1_nano,\n    api_key_secret=None,  # By default, this will read from the OPENAI_API_KEY environment variable.\n    # If that isn't set, it will try the file ~/oai_key.txt\n    # You need to place the key in one of these places,\n    # or pass in a different location. You can get an API\n    # key at (https://platform.openai.com/account/api-keys)\n)\n\n# Single user utterance\npred = lm.predict(\"What is 2+2?\")\nprint(pred.completion_text)  # \"2 + 2 equals 4.\"\n\n\n# Use a LmPrompt to have more control of the parameters\npred = lm.predict(LmPrompt(\n    \"What is 2+6?\",\n    max_tokens=10,\n    temperature=0, # Set this to 0 for deterministic completions\n))\nprint(pred.completion_text)  # \"2 + 6 equals 8.\"\n\n# Conversation alternating between `user` and `assistant`.\npred = lm.predict(LmPrompt(\n    [\n        \"What is 2+2?\",  # user turn\n        \"4\",  # assistant turn\n        \"What is 5+3?\"  # user turn\n        \"8\",  # assistant turn\n        \"What is 4+4?\"  # user turn\n        # We use few-shot turns to encourage the answer to be our desired format.\n        #   If you don't give example turns you might get something like\n        #   \"4 + 4 equals 8.\" instead of just \"8\" as desired.\n    ],\n    max_tokens=10,\n))\nprint(pred.completion_text)  # \"8\"\n\n# If you want things like the system message, you can use LmChatTurn objects\npred = lm.predict(LmPrompt(\n    text=[\n        LmChatTurn(role=\"system\", content=\"You always answer like a pirate\"),\n        LmChatTurn(role=\"user\", content=\"How does bitcoin work?\"),\n    ],\n    max_tokens=25,\n    temperature=0,\n))\nprint(pred.completion_text)\n# \"Arr, me matey! Bitcoin be a digital currency that be workin' on a technology called blockchain...\"\n```\n\n## Caching\n\nAdd `cache = True` in the prompt to cache the output to disk. Any\nsubsequent calls with this prompt will return the same value. Note that\nthis might be unexpected behavior if your temperature is non-zero. (You\nwill always sample the same output on reruns).\n\n```python\nfrom lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames\nfrom lmwrapper.structs import LmPrompt\n\nlm = get_open_ai_lm(OpenAiModelNames.gpt_4_1_nano)\n\nprompt = LmPrompt(\n  \"Briefly describe the city of Paris\", \n  cache=True,\n  temperature=1,\n  max_tokens=25,\n)\nfirst_prediction = lm.predict(prompt)\nprint(first_prediction.completion_text) \n# ... eg, \"Paris is a city of romance and art, renowned for its iconic landmarks, vibrant culture, and rich history.\"\n\n# The response to this prompt is now saved to the disk.\n# You could rerun this script and you would load from cache near-instantly.\n# This can simplify running experimentation and data processing scripts\n# where you are running a dataset through a model and doing analysis.\n# You then only have to actually query the model once.\nrepredict = lm.predict(prompt)\nprint(repredict.completion_text)\nassert first_prediction.completion_text == repredict.completion_text\nlm.remove_prompt_from_cache(prompt)\npred_after_clear = lm.predict(prompt)\nassert pred_after_clear.completion_text != first_prediction.completion_text\n```\n\n\n## OpenAI Batching\n\nThe OpenAI [batching API](https://platform.openai.com/docs/guides/batch) has a 50% reduced cost when willing to accept a 24-hour turnaround. This makes it good for processing datasets or other non-interactive tasks (which is the main target for `lmwrapper` currently).\n\n`lmwrapper` takes care of managing the batch files and other details so that it's as easy \nas the normal API.\n\n<!-- skip gh-action -->\n```python\nfrom lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames\nfrom lmwrapper.structs import LmPrompt\nfrom lmwrapper.batch_config import CompletionWindow\n\ndef load_dataset() -> list:\n    \"\"\"Load some toy task\"\"\"\n    return [\"France\", \"United States\", \"China\"]\n\ndef make_prompts(data) -> list[LmPrompt]:\n    \"\"\"Make some toy prompts for our data\"\"\"\n    return [\n        LmPrompt(\n            f\"What is the capital of {country}? Answer with just the city name.\",\n            max_tokens=10,\n            temperature=0,\n            cache=True,\n            metadata={\"country\": country}\n        ) \n        for country in data\n    ]\n\ndata = load_dataset()\nprompts = make_prompts(data)\nlm = get_open_ai_lm(OpenAiModelNames.gpt_4_1_nano)\npredictions = lm.predict_many(\n    prompts,\n    completion_window=CompletionWindow.BATCH_ANY \n    #                 ^ swap out for CompletionWindow.ASAP\n    #                   to complete as soon as possible via\n    #                   the non-batching API at a higher cost.\n) # The batch is submitted here\n\nfor pred in predictions:  # Will wait for the batch to complete\n    country = pred.prompt.metadata['country']\n    print(f\"Country: {country} --- Capital: {pred.completion_text}\")\n    if country == \"France\": assert pred.completion_text == \"Paris\" \n    # ...\n```\n\nThe above code could technically take up to 24hrs to complete. However,\nOpenAI seems to complete these quicker (for example, these three prompts in ~1 minute or less). In a large batch, you don't have to keep the process running for hours. Thanks to `lmwrapper` cacheing it will automatically load or pick back up waiting on the\nexisting batch when the script is reran.\n\nThe `lmwrapper` cache lets you also intermix cached and uncached examples.\n\n<!-- skip test -->\n```python\n# ... above code\n\ndef load_more_data() -> list:\n    \"\"\"Load some toy task\"\"\"\n    return [\"Mexico\", \"Canada\"]\n\ndata = load_data() + load_more_data()\nprompts = make_prompts(data)\n# If we submit the five prompts, only the two new prompts will be\n# submitted to the batch. The already completed prompts will\n# be loaded near-instantly from the local cache.\npredictions = list(lm.predict_many(\n    prompts,\n    completion_window=CompletionWindow.BATCH_ANY\n))\n```\n\n`lmwrapper` is designed to automatically manage the batching of thousands or millions of prompts. \nIf needed, it will automatically split up prompts into sub-batches and will manage\nissues around rate limits.\n\nThis feature is mostly designed for the OpenAI cost savings. You could swap out the model for HuggingFace and the same code\nwill still work. However, internally it is like a loop over the prompts.\nEventually in `lmwrapper` we want to do more complex batching if\nGPU/CPU/accelerator memory is available.\n\n#### Caveats / Implementation needs\nThis feature is still somewhat experimental. It likely works in typical\nusecases, but there are few known things \nto sort out / TODOs:\n\n- [X] Retry batch API connection errors\n- [X] Automatically splitting up batches when have >50,000 prompts (limit from OpenAI) \n- [X] Recovering / splitting up batches when hitting your token Batch Queue Limit (see [docs on limits](https://platform.openai.com/docs/guides/rate-limits/usage-tiers))\n- [X] Handle canceled batches during current run (use the [web interface](https://platform.openai.com/batches) to cancel)\n- [X] Handle/recover canceled batches outside of current run\n- [X] Handle if openai batch expires unfinished in 24hrs (though not actually tested or observed this)\n- [X] Automatically splitting up batch when exceeding 100MB prompts limit\n- [X] Handling of failed prompts (like when have too many tokens). Use LmPrediction.has_errors and LmPrediction.error_message to check for an error on a response.\n- [X] Handle when there are duplicate prompts in batch submission\n- [ ] Claude batching\n- [ ] Handle when a given prompt has `num_completions>1`\n- [ ] Automatically clean up API files after done (right now end up with a lot of file in [storage](https://platform.openai.com/storage/files). There isn't an obvious cost for these batch files, but this might change and it would be better to clean them up.)\n- [ ] Fancy batching of HF\n- [ ] Configuration setting for whether process killing (like with ctrl+c) should attempt to kill the batch. Right now it intentially persists in background. Maybe takes adding a predictions.cancel_all() prop and user can make their own context.\n- [ ] Concurrent batching when in ASAP mode\n- [ ] A \"scheduler\" paradigm trading off batch, flex, and ASAP processing to hit some timing target\n- [ ] Test on free-tier accounts. It is not clear what the tiny request limit counts\n\nPlease open an issue if you want to discuss one of these or something else.\n\nNote, in the progress bars in PyCharm can be bit cleaner if you enable \n[terminal emulation](https://stackoverflow.com/a/64727188) in your run configuration.\n\n## Hugging Face models\n\nLocal Causal LM models on Hugging Face models can be used interchangeably with the\nOpenAI models.\n\nNote: The universe of Huggingface models is diverse and inconsistent. Some (especially the non-completion ones) might require special prompt formatting to work as expected. Some models might not work at all.\n\n```python\nfrom lmwrapper.huggingface_wrapper import get_huggingface_lm\nfrom lmwrapper.structs import LmPrompt\n\n# Download a small model for demo\nlm = get_huggingface_lm(\"gpt2\") # 124M parameters\n\nprediction = lm.predict(LmPrompt(\n    \"The capital of Germany is Berlin. The capital of France is\",\n    max_tokens=1,\n    temperature=0,\n))\nprint(prediction.completion_text)\nassert prediction.completion_text == \" Paris\"\n```\n<!-- Model internals -->\n\nAdditionally, with HuggingFace models `lmwrapper` provides an interface for\naccessing the model internal states.\n\n## Claude\n```python\nfrom lmwrapper.claude_wrapper import (\n    get_claude_lm, ClaudeModelNames\n)\nlm = get_claude_lm(ClaudeModelNames.claude_3_5_haiku)\nprediction = lm.predict(\"Define 'anthropology' in one short sentence\") \nprint(prediction.completion_text) # Anthropology is the scientific study of human cultures, societies, behaviors, and...\n```\nNote Anthropic does not expose any information about the tokenization for Claude.\n\n### Retries on rate limit\n\n```python\nfrom lmwrapper.openai_wrapper import *\n\nlm = get_open_ai_lm(\n    OpenAiModelNames.gpt_3_5_turbo_instruct,\n    retry_on_rate_limit=True\n)\n```\n\n## Other features\n\n### Built-in token counting\n\n```python\nfrom lmwrapper.openai_wrapper import *\nfrom lmwrapper.structs import LmPrompt\n\nlm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)\nassert lm.estimate_tokens_in_prompt(\n    LmPrompt(\"My name is Spingldorph\", max_tokens=10)) == 7\nassert not lm.could_completion_go_over_token_limit(LmPrompt(\n    \"My name is Spingldorph\", max_tokens=1000))\n```\n\n## TODOs\n\nIf you are interested in one of these particular features or something else\nplease make a Github Issue.\n\n- [X] Openai completion\n- [X] Openai chat\n- [X] Huggingface interface\n- [X] Huggingface device checking on PyTorch\n- [X] Move cache to be per project\n- [X] Redesign cache away from generic `diskcache` to make it easier to manage as a sqlite db\n- [X] Smart caching when num_completions > 1 (reusing prior completions)\n- [X] OpenAI batching interface (experimental)\n- [X] Anthropic interface (basic)\n    - [X] Claude system messages\n- [X] Use the huggingface chat templates for chat models if available\n- [X] Be able to add metadata to a prompt\n- [ ] Automatic cache eviction to limit count or disk size (right now have to run a SQL query to delete entries before a certain time or matching your criteria)\n- [ ] Multimodal/images in super easy format (like automatically process pil, opencv, etc)\n- [ ] sort through usage of quantized models\n- [ ] Cost estimation of a prompt before running / \"observability\" monitoring of total cost\n- [ ] Additional Huggingface runtimes (TensorRT, BetterTransformers, etc)\n- [ ] async / streaming (not a top priority for non-interactive research use cases)\n- [ ] some lightweight utilities to help with tool use\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "An object-oriented wrapper around language models with caching, batching, and more. ",
    "version": "0.17.0.0",
    "project_urls": {
        "Homepage": "https://github.com/DaiseyCode/lmwrapper"
    },
    "split_keywords": [
        "large language models",
        " openai"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8fbbf5748eb8e13364fdfd20e50cc99f3606e3dea274c5edbf917a9f2d362f1b",
                "md5": "63304d638475e38d347032d4ce48760a",
                "sha256": "cd2e645251253f5d3f92237dca1a4c6e57b87b77512fa49cce4ebdf48af2231e"
            },
            "downloads": -1,
            "filename": "lmwrapper-0.17.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "63304d638475e38d347032d4ce48760a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 70734,
            "upload_time": "2025-07-13T18:55:22",
            "upload_time_iso_8601": "2025-07-13T18:55:22.701758Z",
            "url": "https://files.pythonhosted.org/packages/8f/bb/f5748eb8e13364fdfd20e50cc99f3606e3dea274c5edbf917a9f2d362f1b/lmwrapper-0.17.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d752f397ab77eaae055f5f5f3fc717c07cf7581cdb74ddb5cc6608c9ced9fd91",
                "md5": "87d95086d179f04ce0324018bc203929",
                "sha256": "5ceba68da26aed03b7ecbc4b24d60235958dc0a5476ab5e7eb5178e8ae1c046b"
            },
            "downloads": -1,
            "filename": "lmwrapper-0.17.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "87d95086d179f04ce0324018bc203929",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 102166,
            "upload_time": "2025-07-13T18:55:23",
            "upload_time_iso_8601": "2025-07-13T18:55:23.713150Z",
            "url": "https://files.pythonhosted.org/packages/d7/52/f397ab77eaae055f5f5f3fc717c07cf7581cdb74ddb5cc6608c9ced9fd91/lmwrapper-0.17.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-13 18:55:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "DaiseyCode",
    "github_project": "lmwrapper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "lmwrapper"
}

David Gros, Claudio Spiess