Name | async-llm-handler |
Version | 0.2.0 |
Summary | An asynchronous handler for multiple LLM APIs |
upload_time | 2024-08-03 00:22:29 |
home_page | None |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.7 |
license | None |
keywords | api, async, llm, nlp |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Async LLM Handler
Async LLM Handler is a Python package that provides a unified interface for interacting with multiple Language Model APIs asynchronously. It currently supports Gemini, Claude, and OpenAI APIs.
## Features
- Asynchronous API calls
- Support for multiple LLM providers:
- Gemini (model: gemini_flash)
- Claude (models: claude_3_5_sonnet, claude_3_haiku)
- OpenAI (models: gpt_4o, gpt_4o_mini)
- Automatic rate limiting for each API
- Token counting and prompt clipping utilities
## Installation
Install the Async LLM Handler using pip:
```bash
pip install async-llm-handler
```
## Configuration
Before using the package, set up your environment variables in a `.env` file in your project's root directory:
```
GEMINI_API_KEY=your_gemini_api_key
CLAUDE_API_KEY=your_claude_api_key
OPENAI_API_KEY=your_openai_api_key
```
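The handler is expected to pick these values up from the environment. If you want to confirm they are visible to your process before creating a `Handler`, a quick check with `python-dotenv` looks like the following (a minimal sketch; `python-dotenv` is an assumed extra dependency for this check, not something the package requires you to call yourself):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv (assumed helper for this check)

# Load variables from the .env file in the current working directory.
load_dotenv()

# Fail fast if any expected key is missing.
for key in ("GEMINI_API_KEY", "CLAUDE_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing environment variable: {key}")
```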
## Usage
### Basic Usage
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()

    # Using the default model
    response = await handler.query("What is the capital of France?")
    print(response)

    # Specifying a model
    response = await handler.query("Explain quantum computing", model="claude_3_5_sonnet")
    print(response)

asyncio.run(main())
```
### Advanced Usage
#### Using Multiple Models Concurrently
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()
    prompt = "Explain the theory of relativity"

    tasks = [
        handler.query(prompt, model='gemini_flash'),
        handler.query(prompt, model='gpt_4o'),
        handler.query(prompt, model='claude_3_5_sonnet')
    ]

    responses = await asyncio.gather(*tasks)

    for model, response in zip(['Gemini Flash', 'GPT-4o', 'Claude 3.5 Sonnet'], responses):
        print(f"Response from {model}:")
        print(response)
        print()

asyncio.run(main())
```
#### Limiting Input and Output Tokens
```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()

    long_prompt = "Provide a detailed explanation of the entire history of artificial intelligence, including all major milestones and breakthroughs."

    response = await handler.query(long_prompt, model="gpt_4o", max_input_tokens=1000, max_output_tokens=500)
    print(response)

asyncio.run(main())
```
### Supported Models
The package supports the following models:
1. Gemini:
- `gemini_flash`
2. Claude:
- `claude_3_5_sonnet`
- `claude_3_haiku`
3. OpenAI:
- `gpt_4o`
- `gpt_4o_mini`
You can specify these models using the `model` parameter in the `query` method.
### Error Handling
The package uses custom exceptions for error handling. Wrap your API calls in try-except blocks to handle potential errors:
```python
import asyncio
from async_llm_handler import Handler
from async_llm_handler.exceptions import LLMAPIError, RateLimitTimeoutError

async def main():
    handler = Handler()

    try:
        response = await handler.query("What is the meaning of life?", model="gpt_4o")
        print(response)
    except LLMAPIError as e:
        print(f"An API error occurred: {e}")
    except RateLimitTimeoutError as e:
        print(f"Rate limit exceeded: {e}")

asyncio.run(main())
```
### Rate Limiting
The package automatically handles rate limiting for each API. The current rate limits are:
- Gemini Flash: 30 requests per minute
- Claude 3.5 Sonnet: 5 requests per minute
- Claude 3 Haiku: 5 requests per minute
- GPT-4o: 5 requests per minute
- GPT-4o mini: 5 requests per minute
If you exceed these limits, the package will automatically wait before making the next request.
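The limiter lives inside the package, so you normally never interact with it directly, but the underlying idea is the usual sliding-window one: remember when recent requests were made and sleep until a slot frees up. Below is a minimal sketch of that concept (not the package's actual implementation; `SlidingWindowLimiter` is a made-up name for illustration):

```python
import asyncio
import time

class SlidingWindowLimiter:
    """Allows at most `max_calls` calls per `period` seconds (illustration only)."""

    def __init__(self, max_calls, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self._timestamps = []          # monotonic times of recent calls
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            now = time.monotonic()
            # Keep only calls that are still inside the window.
            self._timestamps = [t for t in self._timestamps if now - t < self.period]
            if len(self._timestamps) >= self.max_calls:
                # Sleep until the oldest call falls out of the window, then free its slot.
                await asyncio.sleep(self.period - (now - self._timestamps[0]))
                self._timestamps.pop(0)
            self._timestamps.append(time.monotonic())

# e.g. a per-model budget of 5 requests per minute, matching the GPT-4o limit above
gpt_4o_limiter = SlidingWindowLimiter(max_calls=5, period=60.0)
```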
## Utility Functions
The package includes utility functions for token counting and prompt clipping:
```python
from async_llm_handler.utils import count_tokens, clip_prompt
text = "This is a sample text for token counting."
token_count = count_tokens(text)
print(f"Token count: {token_count}")
long_text = "This is a very long text that needs to be clipped..." * 100
clipped_text = clip_prompt(long_text, max_tokens=50)
print(f"Clipped text: {clipped_text}")
```
These utilities use the `cl100k_base` encoding by default, which is suitable for most modern language models.
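For reference, roughly equivalent helpers can be written directly on top of `tiktoken`, which provides the `cl100k_base` encoding (a sketch of the idea only; the `_sketch` names are made up and this is not the package's internal code):

```python
import tiktoken

_enc = tiktoken.get_encoding("cl100k_base")

def count_tokens_sketch(text):
    """Number of cl100k_base tokens in `text`."""
    return len(_enc.encode(text))

def clip_prompt_sketch(text, max_tokens):
    """Truncate `text` to at most `max_tokens` tokens, decoding back to a string."""
    tokens = _enc.encode(text)
    return _enc.decode(tokens[:max_tokens])
```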
## Logging
The package uses Python's built-in logging module. You can configure logging in your application to see debug information, warnings, and errors from the Async LLM Handler:
```python
import logging
logging.basicConfig(level=logging.INFO)
```
This will display INFO level logs and above from the Async LLM Handler.
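If you want the handler's logs at a different level than the rest of your application, you can target its logger by name. This assumes the loggers follow the standard convention of being named after the `async_llm_handler` module, which is an assumption here:

```python
import logging

logging.basicConfig(level=logging.INFO)

# Assumed logger namespace: the `async_llm_handler` package name.
logging.getLogger("async_llm_handler").setLevel(logging.DEBUG)
```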
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": null,
"name": "async-llm-handler",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "api, async, llm, nlp",
"author": null,
"author_email": "Bryan Nsoh <bryan.anye.5@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/eb/30/904348605a9997982b47ec2ca7c91dedbbd38e2f272081dcce64fb78e3bd/async_llm_handler-0.2.0.tar.gz",
"platform": null,
"description": "# Async LLM Handler\n\nAsync LLM Handler is a Python package that provides a unified interface for interacting with multiple Language Model APIs asynchronously. It currently supports Gemini, Claude, and OpenAI APIs.\n\n## Features\n\n- Asynchronous API calls\n- Support for multiple LLM providers:\n - Gemini (model: gemini_flash)\n - Claude (models: claude_3_5_sonnet, claude_3_haiku)\n - OpenAI (models: gpt_4o, gpt_4o_mini)\n- Automatic rate limiting for each API\n- Token counting and prompt clipping utilities\n\n## Installation\n\nInstall the Async LLM Handler using pip:\n\n```bash\npip install async-llm-handler\n```\n\n## Configuration\n\nBefore using the package, set up your environment variables in a `.env` file in your project's root directory:\n\n```\nGEMINI_API_KEY=your_gemini_api_key\nCLAUDE_API_KEY=your_claude_api_key\nOPENAI_API_KEY=your_openai_api_key\n```\n\n## Usage\n\n### Basic Usage\n\n```python\nimport asyncio\nfrom async_llm_handler import Handler\n\nasync def main():\n handler = Handler()\n\n # Using the default model\n response = await handler.query(\"What is the capital of France?\")\n print(response)\n\n # Specifying a model\n response = await handler.query(\"Explain quantum computing\", model=\"claude_3_5_sonnet\")\n print(response)\n\nasyncio.run(main())\n```\n\n### Advanced Usage\n\n#### Using Multiple Models Concurrently\n\n```python\nimport asyncio\nfrom async_llm_handler import Handler\n\nasync def main():\n handler = Handler()\n prompt = \"Explain the theory of relativity\"\n \n tasks = [\n handler.query(prompt, model='gemini_flash'),\n handler.query(prompt, model='gpt_4o'),\n handler.query(prompt, model='claude_3_5_sonnet')\n ]\n \n responses = await asyncio.gather(*tasks)\n \n for model, response in zip(['Gemini Flash', 'GPT-4o', 'Claude 3.5 Sonnet'], responses):\n print(f\"Response from {model}:\")\n print(response)\n print()\n\nasyncio.run(main())\n```\n\n#### Limiting Input and Output Tokens\n\n```python\nimport asyncio\nfrom async_llm_handler import Handler\n\nasync def main():\n handler = Handler()\n\n long_prompt = \"Provide a detailed explanation of the entire history of artificial intelligence, including all major milestones and breakthroughs.\"\n\n response = await handler.query(long_prompt, model=\"gpt_4o\", max_input_tokens=1000, max_output_tokens=500)\n print(response)\n\nasyncio.run(main())\n```\n\n### Supported Models\n\nThe package supports the following models:\n\n1. Gemini:\n - `gemini_flash`\n\n2. Claude:\n - `claude_3_5_sonnet`\n - `claude_3_haiku`\n\n3. OpenAI:\n - `gpt_4o`\n - `gpt_4o_mini`\n\nYou can specify these models using the `model` parameter in the `query` method.\n\n### Error Handling\n\nThe package uses custom exceptions for error handling. Wrap your API calls in try-except blocks to handle potential errors:\n\n```python\nimport asyncio\nfrom async_llm_handler import Handler\nfrom async_llm_handler.exceptions import LLMAPIError, RateLimitTimeoutError\n\nasync def main():\n handler = Handler()\n\n try:\n response = await handler.query(\"What is the meaning of life?\", model=\"gpt_4o\")\n print(response)\n except LLMAPIError as e:\n print(f\"An API error occurred: {e}\")\n except RateLimitTimeoutError as e:\n print(f\"Rate limit exceeded: {e}\")\n\nasyncio.run(main())\n```\n\n### Rate Limiting\n\nThe package automatically handles rate limiting for each API. 
The current rate limits are:\n\n- Gemini Flash: 30 requests per minute\n- Claude 3.5 Sonnet: 5 requests per minute\n- Claude 3 Haiku: 5 requests per minute\n- GPT-4o: 5 requests per minute\n- GPT-4o mini: 5 requests per minute\n\nIf you exceed these limits, the package will automatically wait before making the next request.\n\n## Utility Functions\n\nThe package includes utility functions for token counting and prompt clipping:\n\n```python\nfrom async_llm_handler.utils import count_tokens, clip_prompt\n\ntext = \"This is a sample text for token counting.\"\ntoken_count = count_tokens(text)\nprint(f\"Token count: {token_count}\")\n\nlong_text = \"This is a very long text that needs to be clipped...\" * 100\nclipped_text = clip_prompt(long_text, max_tokens=50)\nprint(f\"Clipped text: {clipped_text}\")\n```\n\nThese utilities use the `cl100k_base` encoding by default, which is suitable for most modern language models.\n\n## Logging\n\nThe package uses Python's built-in logging module. You can configure logging in your application to see debug information, warnings, and errors from the Async LLM Handler:\n\n```python\nimport logging\n\nlogging.basicConfig(level=logging.INFO)\n```\n\nThis will display INFO level logs and above from the Async LLM Handler.\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License.\n\n\n",
"bugtrack_url": null,
"license": null,
"summary": "An asynchronous handler for multiple LLM APIs",
"version": "0.2.0",
"project_urls": {
"Homepage": "https://github.com/BryanNsoh/async_llm_handler"
},
"split_keywords": [
"api",
" async",
" llm",
" nlp"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ae2923ea6a5be4c7ab2910ec8705d5504f33845b03c013a8fdc829b55ce47eda",
"md5": "b535a479086d881ed2ebed45a476c8c3",
"sha256": "32c1edd4e86862787d25ebaaa28bf1186bc26ba3b7920ef0acce29329e0f8f45"
},
"downloads": -1,
"filename": "async_llm_handler-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b535a479086d881ed2ebed45a476c8c3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 11602,
"upload_time": "2024-08-03T00:22:27",
"upload_time_iso_8601": "2024-08-03T00:22:27.911361Z",
"url": "https://files.pythonhosted.org/packages/ae/29/23ea6a5be4c7ab2910ec8705d5504f33845b03c013a8fdc829b55ce47eda/async_llm_handler-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "eb30904348605a9997982b47ec2ca7c91dedbbd38e2f272081dcce64fb78e3bd",
"md5": "a0d19ba157361277f3e29158ed666988",
"sha256": "244f25db0316cfffe7efdcf519cd3a8cd5dcb544bffbbb64d504f0e3ba827abe"
},
"downloads": -1,
"filename": "async_llm_handler-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "a0d19ba157361277f3e29158ed666988",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 12349,
"upload_time": "2024-08-03T00:22:29",
"upload_time_iso_8601": "2024-08-03T00:22:29.535942Z",
"url": "https://files.pythonhosted.org/packages/eb/30/904348605a9997982b47ec2ca7c91dedbbd38e2f272081dcce64fb78e3bd/async_llm_handler-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-03 00:22:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BryanNsoh",
"github_project": "async_llm_handler",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "async-llm-handler"
}