llmx


Name: llmx
Version: 0.0.21a0
download:
home_page:
Summary: LLMX: A library for LLM Text Generation
upload_time: 2024-02-06 22:51:11
maintainer:
docs_url: None
author:
requires_python: >=3.9
license: The MIT License (MIT) Copyright (c) 2023 Victor Dibia. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords:
VCS:
bugtrack_url:
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.
# LLMX - An API for Chat Fine-Tuned Language Models

[![PyPI version](https://badge.fury.io/py/llmx.svg)](https://badge.fury.io/py/llmx)

A simple Python package that provides a unified interface to several LLM providers of chat fine-tuned models (OpenAI, Azure OpenAI, PaLM, Cohere, and local Hugging Face models).

> **Note**
> llmx wraps multiple API providers, and its interface _may_ change as the providers and the general field of LLMs evolve.

There is nothing particularly special about this library, but it covers a few requirements I needed when I started building it that other libraries did not have:

- **Unified Model Interface**: Single interface to create LLM text generators with support for **multiple LLM providers**.

```python
from llmx import llm

gen = llm(provider="openai")  # supports azureopenai models too
gen = llm(provider="palm")  # or google
gen = llm(provider="cohere")
gen = llm(provider="hf", model="HuggingFaceH4/zephyr-7b-beta", device_map="auto")  # run a huggingface model locally
```

- **Unified Messaging Interface**. Standardizes on the OpenAI ChatML message format and is designed for _chat fine-tuned_ models. The standard prompt sent to a model is formatted as an array of objects, where each object has a role (`system`, `user`, or `assistant`) and content (see below). A single request is a list containing only one message (e.g., write code to plot a cosine wave signal). A conversation is a list of messages (e.g., write code for x, then update the axis to y, and so on). The same format is used for all models.

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant that can explain concepts clearly to a 6 year old child."},
    {"role": "user", "content": "What is gravity?"}
]
```
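For a multi-turn conversation, the same format applies; each turn is simply appended to the list. The snippet below is illustrative (the assistant content is a placeholder for an earlier model reply):

```python
# A conversation: the user asks for code, the model replies, the user follows up.
conversation = [
    {"role": "user", "content": "Write python code to plot a cosine wave signal."},
    {"role": "assistant", "content": "import numpy as np ...  # (previous model reply)"},
    {"role": "user", "content": "Update the x axis to show time in seconds."},
]
```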

- **Good Utils (e.g., Caching)**: For example, caching for faster responses. The general policy is that the cached response is used whenever the config (including the messages) is identical to a previous call. If you want to force a new response, set `use_cache=False` in the `generate` call.

```python
response = gen.generate(messages=messages, config=TextGenerationConfig(n=1), use_cache=True)
```

The output looks like this:

```python
TextGenerationResponse(
  text=[Message(role='assistant', content="Gravity is like a magical force that pulls things towards each other. It's what keeps us on the ground and stops us from floating away into space. ... ")],
  config=TextGenerationConfig(n=1, temperature=0.1, max_tokens=8147, top_p=1.0, top_k=50, frequency_penalty=0.0, presence_penalty=0.0, provider='openai', model='gpt-4', stop=None),
  logprobs=[], usage={'prompt_tokens': 34, 'completion_tokens': 69, 'total_tokens': 103})
```
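To bypass the cache and force a fresh completion for the same messages, the same call can be repeated with `use_cache=False` (a minimal sketch, following the `generate` signature used in the Usage section below):

```python
# Same messages and config, but skip the cache for this call.
fresh_response = gen.generate(messages=messages, config=TextGenerationConfig(n=1), use_cache=False)
```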

Are there other libraries that do things like this really well? Yes! I'd recommend looking at [guidance](https://github.com/microsoft/guidance), which does a lot more. Interested in optimized inference? Try something like [vllm](https://github.com/vllm-project/vllm).

## Installation

Install from PyPI. Please use **Python 3.10** or higher.

```bash
pip install llmx
```

Install in development mode

```bash
git clone https://github.com/victordibia/llmx.git
cd llmx
pip install -e .
```

Note that you may want to use the latest version of pip to install this package.
`python3 -m pip install --upgrade pip`

## Usage

Set your API keys first for each service.

```bash
# for openai and cohere
export OPENAI_API_KEY=<your key>
export COHERE_API_KEY=<your key>

# for PaLM via MakerSuite
export PALM_API_KEY=<your key>

# for PaLM (Vertex AI), set up a GCP project and get a service account key file
export PALM_SERVICE_ACCOUNT_KEY_FILE=<path to your service account key file>
export PALM_PROJECT_ID=<your gcp project id>
export PALM_PROJECT_LOCATION=<your project location>
```

You can also set the default provider and the list of supported providers via a config file. Use the YAML format in this [sample `config.default.yml` file](llmx/configs/config.default.yml) and set the `LLMX_CONFIG_PATH` environment variable to the path of the config file.
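For example, pointing llmx at a custom config file might look like the sketch below; it assumes the path is read when a generator is created, so the variable is set before calling `llm` (the file path is illustrative):

```python
import os

# Illustrative path; point LLMX_CONFIG_PATH at your own config file.
os.environ["LLMX_CONFIG_PATH"] = "/path/to/my/config.yml"

from llmx import llm
gen = llm()  # assumes the default provider comes from the config file
```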

```python
from llmx import llm
from llmx.datamodel import TextGenerationConfig

messages = [
    {"role": "system", "content": "You are a helpful assistant that can explain concepts clearly to a 6 year old child."},
    {"role": "user", "content": "What is  gravity?"}
]

openai_gen = llm(provider="openai")
openai_config = TextGenerationConfig(model="gpt-4", max_tokens=50)
openai_response = openai_gen.generate(messages, config=openai_config, use_cache=True)
print(openai_response.text[0].content)

```
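The same messages and config pattern works for the other providers; only the provider and the model name change. The model name below is illustrative and should match one your account can access:

```python
palm_gen = llm(provider="palm")
palm_config = TextGenerationConfig(model="chat-bison-001", max_tokens=50)  # illustrative model name
palm_response = palm_gen.generate(messages, config=palm_config, use_cache=True)
print(palm_response.text[0].content)
```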

See the [tutorial](/notebooks/tutorial.ipynb) for more examples.

## A Note on Using Local HuggingFace Models

While llmx can use the Hugging Face transformers library to run inference with local models, you might get more mileage from using a well-optimized server endpoint such as [vllm](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html#openai-compatible-server) or FastChat. The general idea is that these tools provide an OpenAI-compatible endpoint while also implementing optimizations such as dynamic batching and quantization to improve throughput. The general steps are:

- install vllm and set up an endpoint, e.g., on port `8000`
- use `openai` as your provider to access that endpoint

```python
from llmx import llm
hfgen_gen = llm(
    provider="openai",
    api_base="http://localhost:8000",
    api_key="EMPTY",
)
...
```
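A follow-up call could then look like this sketch, reusing the `messages` list from the Usage section; the model name is an assumption and must match whatever model the vllm server is actually serving:

```python
from llmx.datamodel import TextGenerationConfig

local_config = TextGenerationConfig(model="HuggingFaceH4/zephyr-7b-beta", max_tokens=256)  # must match the served model
local_response = hfgen_gen.generate(messages, config=local_config, use_cache=False)
print(local_response.text[0].content)
```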

## Current Work

- Supported models
  - [x] OpenAI
  - [x] PaLM ([MakerSuite](https://developers.generativeai.google/api/rest/generativelanguage), [Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models))
  - [x] Cohere
  - [x] HuggingFace (local)

## Caveats

- **Prompting**. llmx makes some assumptions about how prompts are constructed, e.g., how the chat message list is assembled into a prompt for each model type (see the illustrative sketch after this list). If your application or use case requires more control over the prompt, you may want to use a different library (or query the LLM APIs directly).
- **Inference Optimization**. For hosted models (GPT-4, PaLM, Cohere, etc.), this library provides an excellent unified interface, as the hosted API already takes care of inference optimizations. However, if you are looking for a library that is optimized for inference with **_local models_** (e.g., Hugging Face), including tensor parallelism and distributed inference, I'd recommend looking at [vllm](https://github.com/vllm-project/vllm) or [tgi](https://github.com/huggingface/text-generation-inference).
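To make the prompting caveat concrete, here is a minimal sketch of how a chat message list can be flattened into a single plain-text prompt for a model without a chat template. This is illustrative only and is not llmx's actual prompt-assembly code:

```python
def flatten_messages(messages: list[dict]) -> str:
    """Illustrative only: join ChatML-style messages into one plain-text prompt."""
    prefixes = {"system": "System: ", "user": "User: ", "assistant": "Assistant: "}
    lines = [prefixes[m["role"]] + m["content"] for m in messages]
    # End with the assistant prefix so the model continues as the assistant.
    return "\n".join(lines) + "\nAssistant: "
```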

## Citation

If you use this library in your work, please cite:

```bibtex
@software{victordibiallmx,
  author = {Victor Dibia},
  license = {MIT},
  month = {10},
  title = {LLMX - An API for Chat Fine-Tuned Language Models},
  url = {https://github.com/victordibia/llmx},
  year = {2023}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "llmx",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "Victor Dibia <victor.dibia@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3c/30/a85ab892d810159e91e408b2f46f19fe8bbd9890d58fcbc4e32f8948b691/llmx-0.0.21a0.tar.gz",
    "platform": null,
    "description": "# LLMX - An API for Chat Fine-Tuned Language Models\n\n[![PyPI version](https://badge.fury.io/py/llmx.svg)](https://badge.fury.io/py/llmx)\n\nA simple python package that provides a unified interface to several LLM providers of chat fine-tuned models [OpenAI, AzureOpenAI, PaLM, Cohere and local HuggingFace Models].\n\n> **Note**\n> llmx wraps multiple api providers and its interface _may_ change as the providers as well as the general field of LLMs evolve.\n\nThere is nothing particularly special about this library, but some of the requirements I needed when I started building this (that other libraries did not have):\n\n- **Unified Model Interface**: Single interface to create LLM text generators with support for **multiple LLM providers**.\n\n```python\nfrom llmx import  llm\n\ngen = llm(provider=\"openai\") # support azureopenai models too.\ngen = llm(provider=\"palm\") # or google\ngen = llm(provider=\"cohere\") # or palm\ngen = llm(provider=\"hf\", model=\"HuggingFaceH4/zephyr-7b-beta\", device_map=\"auto\") # run huggingface model locally\n```\n\n- **Unified Messaging Interface**. Standardizes on the OpenAI ChatML message format and is designed for _chat finetuned_ models. For example, the standard prompt sent a model is formatted as an array of objects, where each object has a role (`system`, `user`, or `assistant`) and content (see below). A single request is list of only one message (e.g., write code to plot a cosine wave signal). A conversation is a list of messages e.g. write code for x, update the axis to y, etc. Same format for all models.\n\n```python\nmessages = [\n    {\"role\": \"user\", \"content\": \"You are a helpful assistant that can explain concepts clearly to a 6 year old child.\"},\n    {\"role\": \"user\", \"content\": \"What is  gravity?\"}\n]\n```\n\n- **Good Utils (e.g., Caching etc)**: E.g. being able to use caching for faster responses. General policy is that cache is used if config (including messages) is the same. If you want to force a new response, set `use_cache=False` in the `generate` call.\n\n```python\nresponse = gen.generate(messages=messages, config=TextGeneratorConfig(n=1, use_cache=True))\n```\n\nOutput looks like\n\n```bash\n\nTextGenerationResponse(\n  text=[Message(role='assistant', content=\"Gravity is like a magical force that pulls things towards each other. It's what keeps us on the ground and stops us from floating away into space. ... \")],\n  config=TextGenerationConfig(n=1, temperature=0.1, max_tokens=8147, top_p=1.0, top_k=50, frequency_penalty=0.0, presence_penalty=0.0, provider='openai', model='gpt-4', stop=None),\n  logprobs=[], usage={'prompt_tokens': 34, 'completion_tokens': 69, 'total_tokens': 103})\n\n```\n\nAre there other libraries that do things like this really well? Yes! I'd recommend looking at [guidance](https://github.com/microsoft/guidance) which does a lot more. Interested in optimized inference? Try somthing like [vllm](https://github.com/vllm-project/vllm).\n\n## Installation\n\nInstall from pypi. 
Please use **python3.10** or higher.\n\n```bash\npip install llmx\n```\n\nInstall in development mode\n\n```bash\ngit clone\ncd llmx\npip install -e .\n```\n\nNote that you may want to use the latest version of pip to install this package.\n`python3 -m pip install --upgrade pip`\n\n## Usage\n\nSet your api keys first for each service.\n\n```bash\n# for openai and cohere\nexport OPENAI_API_KEY=<your key>\nexport COHERE_API_KEY=<your key>\n\n# for PALM via MakerSuite\nexport PALM_API_KEY=<your key>\n\n# for PaLM (Vertex AI), setup a gcp project, and get a service account key file\nexport PALM_SERVICE_ACCOUNT_KEY_FILE= <path to your service account key file>\nexport PALM_PROJECT_ID=<your gcp project id>\nexport PALM_PROJECT_LOCATION=<your project location>\n```\n\nYou can also set the default provider and list of supported providers via a config file. Use the yaml format in this [sample `config.default.yml` file](llmx/configs/config.default.yml) and set the `LLMX_CONFIG_PATH` to the path of the config file.\n\n```python\nfrom llmx import llm\nfrom llmx.datamodel import TextGenerationConfig\n\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant that can explain concepts clearly to a 6 year old child.\"},\n    {\"role\": \"user\", \"content\": \"What is  gravity?\"}\n]\n\nopenai_gen = llm(provider=\"openai\")\nopenai_config = TextGenerationConfig(model=\"gpt-4\", max_tokens=50)\nopenai_response = openai_gen.generate(messages, config=openai_config, use_cache=True)\nprint(openai_response.text[0].content)\n\n```\n\nSee the [tutorial](/notebooks/tutorial.ipynb) for more examples.\n\n## A Note on Using Local HuggingFace Models\n\nWhile llmx can use the huggingface transformers library to run inference with local models, you might get more mileage from using a well-optimized server endpoint like [vllm](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html#openai-compatible-server), or FastChat. The general idea is that these tools let you provide an openai-compatible endpoint but also implement optimizations such as dynamic batching, quantization etc to improve throughput. The general steps are:\n\n- install vllm, setup endpoint e.g., on port `8000`\n- use openai as your provider to access that endpoint.\n\n```python\nfrom llmx import  llm\nhfgen_gen = llm(\n    provider=\"openai\",\n    api_base=\"http://localhost:8000\",\n    api_key=\"EMPTY,\n)\n...\n```\n\n## Current Work\n\n- Supported models\n  - [x] OpenAI\n  - [x] PaLM ([MakerSuite](https://developers.generativeai.google/api/rest/generativelanguage), [Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models))\n  - [x] Cohere\n  - [x] HuggingFace (local)\n\n## Caveats\n\n- **Prompting**. llmx makes some assumptions around how prompts are constructed e.g., how the chat message interface is assembled into a prompt for each model type. If your application or use case requires more control over the prompt, you may want to use a different library (ideally query the LLM models directly).\n- **Inference Optimization**. For hosted models (GPT-4, PalM, Cohere) etc, this library provides an excellent unified interface as the hosted api already takes care of inference optimizations. 
However, if you are looking for a library that is optimized for inference with **_local models_(e.g., huggingface)** (tensor parrelization, distributed inference etc), I'd recommend looking at [vllm](https://github.com/vllm-project/vllm) or [tgi](https://github.com/huggingface/text-generation-inference).\n\n## Citation\n\nIf you use this library in your work, please cite:\n\n```bibtex\n@software{victordibiallmx,\nauthor = {Victor Dibia},\nlicense = {MIT},\nmonth =  {10},\ntitle = {LLMX - An API for Chat Fine-Tuned Language Models},\nurl = {https://github.com/victordibia/llmx},\nyear = {2023}\n}\n```\n",
    "bugtrack_url": null,
    "license": "The MIT License (MIT)  Copyright (c) 2023 Victor Dibia.  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "LLMX: A library for LLM Text Generation",
    "version": "0.0.21a0",
    "project_urls": {
        "Bug Tracker": "https://github.com/victordibia/llmx/issues",
        "Homepage": "https://github.com/victordibia/llmx"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a651c74b8cea0d8008e356c8d7de9464c65aac73ca6dad0ca809a03bc463cc44",
                "md5": "1b7f7091e2c207fcd48da7a7a5c05af7",
                "sha256": "f8877752a790a5f7924248292e6ba9c215ab17f5aea8dc7293d3290f4edf98a6"
            },
            "downloads": -1,
            "filename": "llmx-0.0.21a0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1b7f7091e2c207fcd48da7a7a5c05af7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 20062,
            "upload_time": "2024-02-06T22:51:09",
            "upload_time_iso_8601": "2024-02-06T22:51:09.651121Z",
            "url": "https://files.pythonhosted.org/packages/a6/51/c74b8cea0d8008e356c8d7de9464c65aac73ca6dad0ca809a03bc463cc44/llmx-0.0.21a0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3c30a85ab892d810159e91e408b2f46f19fe8bbd9890d58fcbc4e32f8948b691",
                "md5": "fba1315dd5ac80daab588751302b40c8",
                "sha256": "384a3ac086834e4b7302c3f4ace9a1c631521f281347f12971123a017b2efba3"
            },
            "downloads": -1,
            "filename": "llmx-0.0.21a0.tar.gz",
            "has_sig": false,
            "md5_digest": "fba1315dd5ac80daab588751302b40c8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 23642,
            "upload_time": "2024-02-06T22:51:11",
            "upload_time_iso_8601": "2024-02-06T22:51:11.581837Z",
            "url": "https://files.pythonhosted.org/packages/3c/30/a85ab892d810159e91e408b2f46f19fe8bbd9890d58fcbc4e32f8948b691/llmx-0.0.21a0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-06 22:51:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "victordibia",
    "github_project": "llmx",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "llmx"
}
        