# llm_async — Async multi‑provider LLM client for Python
High-performance, async-first LLM client for OpenAI, Claude, Google Gemini, and OpenRouter. Built on top of aiosonic for fast, low-latency HTTP and true asyncio streaming across providers.
## Table of Contents
- [Features](#features)
- [Supported Providers & Features](#supported-providers--features)
- [Installation](#installation)
- [Using Poetry (Recommended)](#using-poetry-recommended)
- [Using pip](#using-pip)
- [Quickstart](#quickstart)
- [Usage](#usage)
- [Basic Chat Completion](#basic-chat-completion)
- [OpenAI](#openai)
- [OpenRouter](#openrouter)
- [Google Gemini](#google-gemini)
- [Custom Base URL](#custom-base-url)
- [Direct API Requests](#direct-api-requests)
- [Tool Usage](#tool-usage)
- [Recipes](#recipes)
  - [Examples](#examples)
  - [Structured Outputs](#structured-outputs)
  - [OpenAI Responses API with Prompt Caching](#openai-responses-api-with-prompt-caching)
- [Why llm_async?](#why-llm_async)
- [API Reference](#api-reference)
- [OpenAIProvider](#openaiprovider)
- [OpenRouterProvider](#openrouterprovider)
- [GoogleProvider](#googleprovider)
- [Development](#development)
- [Setup](#setup)
- [Running Tests](#running-tests)
- [Building](#building)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)
- [Authors](#authors)
## Features
### Supported Providers & Features
| Feature | OpenAI | Claude | Google Gemini | OpenRouter |
|---------|--------|--------|---------------|-----------|
| Chat Completions | ✅ | ✅ | ✅ | ✅ |
| Tool Calling | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Structured Outputs | ✅ | ❌ | ✅ | ✅ |
Notes:
- Structured Outputs: Supported by OpenAI, Google Gemini, and OpenRouter; not supported by Claude.
- See [Examples](#examples) for tool-call round-trips and streaming demos.
- **Async-first**: Built with asyncio for high-performance, non-blocking operations.
- **Provider Support**: Supports OpenAI, Anthropic Claude, Google Gemini, and OpenRouter for chat completions.
- **Tool Calling**: Tool execution with unified tool definitions across providers.
- **Structured Outputs**: Enforce JSON schema validation on responses (OpenAI, Google, OpenRouter).
- **Extensible**: Easy to add new providers by inheriting from `BaseProvider`.
- **Tested**: Comprehensive test suite with high coverage.
#### Performance
- Built on top of [aiosonic](https://github.com/sonic182/aiosonic) for fast, low-overhead async HTTP requests and streaming.
- True asyncio end-to-end: concurrent requests across providers with minimal overhead.
- Designed for fast tool-call round-trips and low-latency streaming.
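Because every call is a coroutine, independent requests can be fanned out with `asyncio.gather`. A minimal sketch using the providers documented below (the model names and prompt are just placeholders):

```python
import asyncio
import os

from llm_async import OpenAIProvider, OpenRouterProvider


async def main():
    openai_provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
    openrouter_provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))

    messages = [{"role": "user", "content": "Summarize asyncio in one sentence."}]

    # Both requests run concurrently; gather returns when each has completed.
    openai_resp, openrouter_resp = await asyncio.gather(
        openai_provider.acomplete(model="gpt-4o-mini", messages=messages),
        openrouter_provider.acomplete(model="openrouter/auto", messages=messages),
    )

    print("OpenAI:", openai_resp.main_response.content)
    print("OpenRouter:", openrouter_resp.main_response.content)


asyncio.run(main())
```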
## Installation
### Using Poetry (Recommended)
```bash
poetry add llm-async
```
### Using pip
```bash
pip install llm-async
```
## Quickstart
A minimal async example with streaming, using the OpenAI-compatible interface:
```python
import asyncio
from llm_async import OpenAIProvider
async def main():
provider = OpenAIProvider(api_key="YOUR_OPENAI_API_KEY")
# Stream tokens as they arrive
async for chunk in await provider.acomplete(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Give me 3 ideas for a CLI tool."}],
stream=True,
):
print(chunk.delta, end="", flush=True)
asyncio.run(main())
```
## Usage
### Basic Chat Completion
#### OpenAI
```python
import asyncio
from llm_async import OpenAIProvider
async def main():
# Initialize the provider with your API key
provider = OpenAIProvider(api_key="your-openai-api-key")
# Perform a chat completion
response = await provider.acomplete(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
]
)
print(response.main_response.content) # Output: The assistant's response
# Run the async function
asyncio.run(main())
```
#### OpenRouter
```python
import asyncio
import os
from llm_async import OpenRouterProvider
async def main():
# Initialize the provider with your API key
provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))
# Perform a chat completion
response = await provider.acomplete(
model="openrouter/auto", # Let OpenRouter choose the best model
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
http_referer="https://github.com/your-username/your-app", # Optional
x_title="My AI App" # Optional
)
print(response.main_response.content) # Output: The assistant's response
# Run the async function
asyncio.run(main())
```
#### Google Gemini
```python
import asyncio
from llm_async.providers.google import GoogleProvider
async def main():
# Initialize the provider with your API key
provider = GoogleProvider(api_key="your-google-gemini-api-key")
# Perform a chat completion
response = await provider.acomplete(
model="gemini-2.5-flash",
messages=[
{"role": "user", "content": "Hello, how are you?"}
]
)
print(response.main_response.content) # Output: The assistant's response
# Run the async function
asyncio.run(main())
```
### Custom Base URL
```python
provider = OpenAIProvider(
api_key="your-api-key",
base_url="https://custom-openai-endpoint.com/v1"
)
```
### Direct API Requests
Make direct requests to any provider API endpoint using the `request()` method:
```python
import asyncio
import os
from llm_async import OpenAIProvider
async def main():
provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
# GET request to list available models
response = await provider.request("GET", "/models")
print(f"Available models: {len(response.get('data', []))}")
for model in response.get("data", [])[:5]:
print(f" - {model.get('id')}")
# POST request with custom data
response = await provider.request("POST", "/endpoint", json_data={"key": "value"})
# Add custom headers
response = await provider.request("GET", "/endpoint", custom_header="value")
asyncio.run(main())
```
The `request()` method supports all HTTP verbs: GET, POST, PUT, DELETE, PATCH. It works across all providers (OpenAI, Claude, Google, OpenRouter, OpenAIResponses).
See `examples/provider_request.py` for a complete example.
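The other verbs follow the same pattern; for instance, a hedged sketch of a DELETE call against OpenAI's `/files` endpoint (the file ID below is a placeholder):

```python
import asyncio
import os

from llm_async import OpenAIProvider


async def main():
    provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))

    # DELETE an uploaded file by its ID (placeholder ID shown here).
    result = await provider.request("DELETE", "/files/file-abc123")
    print(result)


asyncio.run(main())
```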
### Tool Usage
```python
import asyncio
import os
from llm_async.models import Tool
from llm_async.providers import OpenAIProvider
# Define a calculator tool
calculator_tool = Tool(
name="calculator",
description="Perform basic arithmetic operations",
parameters={
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide"]
},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
},
input_schema={
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide"]
},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
}
)
def calculator(operation: str, a: float, b: float) -> float:
"""Calculator function that can be called by the LLM."""
if operation == "add":
return a + b
elif operation == "subtract":
return a - b
elif operation == "multiply":
return a * b
elif operation == "divide":
return a / b
return 0
async def main():
# Initialize provider
provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
# Tool executor mapping
tools_map = {"calculator": calculator}
# Initial user message
messages = [{"role": "user", "content": "What is 15 + 27?"}]
# First turn: Ask the LLM to perform a calculation
response = await provider.acomplete(
model="gpt-4o-mini",
messages=messages,
tools=[calculator_tool]
)
# Execute the tool call
tool_call = response.main_response.tool_calls[0]
tool_result = await provider.execute_tool(tool_call, tools_map)
# Second turn: Send the tool result back to the LLM
messages_with_tool = messages + [response.main_response.original] + [tool_result]
final_response = await provider.acomplete(
model="gpt-4o-mini",
messages=messages_with_tool
)
print(final_response.main_response.content) # Output: The final answer
asyncio.run(main())
```
## Recipes
- Streaming across providers: see `examples/stream_all_providers.py`
- Tool-call round-trip (calculator): see `examples/tool_call_all_providers.py`
- Structured outputs (JSON schema): see section below and examples
### Examples
The `examples` directory contains runnable scripts for local testing against all supported providers:
- `examples/tool_call_all_providers.py` shows how to execute the same calculator tool call round-trip with OpenAI, OpenRouter, Claude, and Google using shared message/tool definitions.
- `examples/stream_all_providers.py` streams completions from the same provider list so you can compare chunking formats and latency.
Both scripts expect a `.env` file with `OPENAI_API_KEY`, `OPENROUTER_API_KEY`, `CLAUDE_API_KEY`, and `GEMINI_API_KEY` (plus optional per-provider model overrides). Run them via Poetry, e.g. `poetry run python examples/tool_call_all_providers.py`.
### Structured Outputs
Enforce JSON schema validation on model responses for consistent, type-safe outputs.
```python
import asyncio
import json
from llm_async import OpenAIProvider
from llm_async.providers.google import GoogleProvider
# Define response schema
response_schema = {
"type": "object",
"properties": {
"answer": {"type": "string"},
"confidence": {"type": "number"}
},
"required": ["answer", "confidence"],
"additionalProperties": False
}
async def main():
# OpenAI example
openai_provider = OpenAIProvider(api_key="your-openai-key")
response = await openai_provider.acomplete(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What is the capital of France?"}],
response_schema=response_schema
)
result = json.loads(response.main_response.content)
print(f"OpenAI: {result}")
# Google Gemini example
google_provider = GoogleProvider(api_key="your-google-key")
response = await google_provider.acomplete(
model="gemini-2.5-flash",
messages=[{"role": "user", "content": "What is the capital of France?"}],
response_schema=response_schema
)
result = json.loads(response.main_response.content)
print(f"Gemini: {result}")
asyncio.run(main())
```
**Supported Providers**: OpenAI, Google Gemini, OpenRouter. Claude does not support structured outputs.
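For OpenRouter, the sketch below assumes the same `response_schema` keyword carries over unchanged; the model name is only an example of a structured-output-capable model:

```python
import asyncio
import json
import os

from llm_async import OpenRouterProvider

# Same schema as in the example above.
response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"}
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False
}


async def main():
    provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))
    response = await provider.acomplete(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        response_schema=response_schema
    )
    print(f"OpenRouter: {json.loads(response.main_response.content)}")


asyncio.run(main())
```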
### OpenAI Responses API with Prompt Caching
The OpenAI Responses API keeps conversation state on the server: you chain turns with `previous_response_id` and opt into prompt caching with `prompt_cache_key`. This enables efficient multi-turn conversations without maintaining conversation history on the client side.
```python
import asyncio
import uuid
from llm_async.models import Tool
from llm_async.models.message import Message
from llm_async.providers import OpenAIResponsesProvider
# Define a calculator tool
calculator_tool = Tool(
name="calculator",
description="Perform basic arithmetic operations",
parameters={
"type": "object",
"properties": {
"operation": {"type": "string", "enum": ["add", "subtract", "multiply", "divide"]},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
}
)
def calculator(operation: str, a: float, b: float) -> float:
if operation == "add":
return a + b
elif operation == "subtract":
return a - b
elif operation == "multiply":
return a * b
elif operation == "divide":
return a / b
return 0
async def main():
provider = OpenAIResponsesProvider(api_key="your-openai-api-key")
# Generate a session ID for prompt caching
session_id = uuid.uuid4().hex
# First turn: Ask the model to use a tool
response = await provider.acomplete(
model="gpt-4.1",
messages=[Message("user", "What is 15 + 27? Use the calculator tool.")],
tools=[calculator_tool],
tool_choice="required",
prompt_cache_key=session_id, # Enable prompt caching for this session
)
# Execute the tool locally
tool_call = response.main_response.tool_calls[0]
tool_result = await provider.execute_tool(tool_call, {"calculator": calculator})
# Second turn: Continue conversation using previous_response_id
# No need to send the entire conversation history - just the response ID and new tool output
final_response = await provider.acomplete(
model="gpt-4.1",
messages=[tool_result], # Send tool result as message
tools=[calculator_tool],
previous_response_id=response.original["id"], # Reference the previous response
prompt_cache_key=session_id, # Reuse the cached prompt
)
print(final_response.main_response.content) # Output: The final answer with calculation result
asyncio.run(main())
```
**Key Benefits**:
- **No history overhead**: Use `previous_response_id` to continue conversations without resending message history
- **Prompt caching**: Pass `prompt_cache_key` to reuse cached prompts across requests in the same session
- **Reduced costs**: Cached prefixes consume 90% fewer tokens
- **Lower latency**: Cached prefixes are processed faster
- **Session management**: Clients control session IDs (e.g., `uuid.uuid4().hex`) for cache routing
**How it works**:
1. First request establishes a response context and caches the prompt prefix (for prompts ≥1024 tokens)
2. Subsequent requests reference the first response via `previous_response_id`
3. Using the same `prompt_cache_key` routes requests to the same machine for consistent cache hits
4. Only send new content (tool outputs, user messages) instead of full conversation history
5. Cached prefixes remain active for 5-10 minutes of inactivity (up to 1 hour off-peak)
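A simpler sketch of steps 2-4, without tools: each follow-up turn sends only the new user message plus the previous response ID (keyword arguments as in the example above).

```python
import asyncio
import uuid

from llm_async.models.message import Message
from llm_async.providers import OpenAIResponsesProvider


async def main():
    provider = OpenAIResponsesProvider(api_key="your-openai-api-key")
    session_id = uuid.uuid4().hex

    # Turn 1: full prompt; establishes the response context and cached prefix.
    first = await provider.acomplete(
        model="gpt-4.1",
        messages=[Message("user", "Recommend a sci-fi novel and explain why.")],
        prompt_cache_key=session_id,
    )

    # Turn 2: only the new user message, chained via previous_response_id.
    second = await provider.acomplete(
        model="gpt-4.1",
        messages=[Message("user", "Now recommend a similar film.")],
        previous_response_id=first.original["id"],
        prompt_cache_key=session_id,
    )
    print(second.main_response.content)


asyncio.run(main())
```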
**See also**: `examples/openai_responses_tool_call_with_previous_id.py` for a complete working example.
## Why llm_async?
- Async-first performance (aiosonic-based) vs. sync or heavier HTTP stacks.
- Unified provider interface: same message/tool/streaming patterns across OpenAI, Claude, Gemini, OpenRouter.
- Structured outputs (OpenAI, Google, OpenRouter) with JSON schema validation.
- Tool-call round-trip helpers for consistent multi-turn execution.
- Minimal surface area: easy to extend with new providers via BaseProvider.
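Since the call shape is shared, switching providers is mostly a matter of swapping the constructor. A minimal sketch reusing the providers and models shown earlier:

```python
import asyncio
import os

from llm_async import OpenAIProvider, OpenRouterProvider
from llm_async.providers.google import GoogleProvider


async def ask(provider, model: str, question: str) -> str:
    # Same call shape regardless of the provider behind it.
    response = await provider.acomplete(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.main_response.content


async def main():
    question = "Name one benefit of asyncio."
    pairs = [
        (OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY")), "gpt-4o-mini"),
        (OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY")), "openrouter/auto"),
        (GoogleProvider(api_key=os.getenv("GEMINI_API_KEY")), "gemini-2.5-flash"),
    ]
    for provider, model in pairs:
        print(model, "->", await ask(provider, model, question))


asyncio.run(main())
```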
## API Reference
### OpenAIProvider
- `__init__(api_key: str, base_url: str = "https://api.openai.com/v1")`
- `acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]`
Performs a chat completion. When `stream=True` the method returns an async iterator that yields StreamChunk objects as they arrive from the provider.
### OpenRouterProvider
- `__init__(api_key: str, base_url: str = "https://openrouter.ai/api/v1")`
- `acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]`
Performs a chat completion using OpenRouter's unified API. Supports the same OpenAI-compatible interface with additional optional headers:
- `http_referer`: Your application's URL (recommended)
- `x_title`: Your application's name (recommended)
OpenRouter provides access to hundreds of AI models from various providers through a single API.
### GoogleProvider
- `__init__(api_key: str, base_url: str = "https://generativelanguage.googleapis.com/v1beta/models/")`
- `acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]`
  Performs a chat completion using Google's Gemini API. Supports structured outputs and uses camelCase keys in request payloads (e.g., `generationConfig`).
**Streaming**
- **Usage**: iterate with `async for chunk in await provider.acomplete(..., stream=True):` and print or process each `chunk` as it arrives.
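For example, a small sketch that accumulates the streamed deltas into one string (reusing `chunk.delta` from the Quickstart):

```python
import asyncio
import os

from llm_async import OpenAIProvider


async def main():
    provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))

    parts = []
    async for chunk in await provider.acomplete(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Write a haiku about the sea."}],
        stream=True,
    ):
        if chunk.delta:  # skip empty deltas defensively
            parts.append(chunk.delta)

    print("".join(parts))


asyncio.run(main())
```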
**Example output**
```
--- OpenAI streaming response ---
1. Peel and slice potatoes.
2. Par-cook potatoes briefly.
3. Whisk eggs with salt and pepper.
4. Sauté onions until translucent (optional).
5. Combine potatoes and eggs in a pan and cook until set.
6. Fold and serve.
--- Claude streaming response ---
1. Prepare potatoes by peeling and slicing.
2. Fry or boil until tender.
3. Beat eggs and season.
4. Mix potatoes with eggs and cook gently.
5. Serve warm.
```
## Development
### Setup
```bash
git clone https://github.com/sonic182/llm-async.git
cd llm-async
poetry install
```
### Running Tests
```bash
poetry run pytest
```
### Building
```bash
poetry build
```
## Roadmap
- Support for additional providers (e.g., Grok, Anthropic direct API)
- More advanced tool features
- Response caching and retry mechanisms
## Contributing
Contributions are welcome! Please open an issue or submit a pull request on [GitHub](https://github.com/sonic182/llm-async).
## License
MIT License - see the [LICENSE](LICENSE) file for details.
## Authors
- sonic182