# llm_async — Async multi‑provider LLM client for Python
High-performance, async-first LLM client for OpenAI, Claude, Google Gemini, and OpenRouter. Built on top of aiosonic for fast, low-latency HTTP and true asyncio streaming across providers.
## Table of Contents
- [Features](#features)
- [Supported Providers & Features](#supported-providers--features)
- [Installation](#installation)
- [Using Poetry (Recommended)](#using-poetry-recommended)
- [Using pip](#using-pip)
- [Quickstart](#quickstart)
- [Usage](#usage)
- [Basic Chat Completion](#basic-chat-completion)
- [OpenAI](#openai)
- [OpenRouter](#openrouter)
- [Google Gemini](#google-gemini)
- [Custom Base URL](#custom-base-url)
- [Direct API Requests](#direct-api-requests)
- [Tool Usage](#tool-usage)
- [Recipes](#recipes)
  - [Examples](#examples)
  - [Structured Outputs](#structured-outputs)
  - [OpenAI Responses API with Prompt Caching](#openai-responses-api-with-prompt-caching)
- [Why llm_async?](#why-llm_async)
- [API Reference](#api-reference)
- [OpenAIProvider](#openaiprovider)
- [OpenRouterProvider](#openrouterprovider)
- [GoogleProvider](#googleprovider)
- [Development](#development)
- [Setup](#setup)
- [Running Tests](#running-tests)
- [Building](#building)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)
- [Authors](#authors)
## Features
### Supported Providers & Features
| Feature | OpenAI | Claude | Google Gemini | OpenRouter |
|---------|--------|--------|---------------|-----------|
| Chat Completions | ✅ | ✅ | ✅ | ✅ |
| Tool Calling | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Structured Outputs | ✅ | ❌ | ✅ | ✅ |
Notes:
- Structured Outputs: Supported by OpenAI, Google Gemini, and OpenRouter; not supported by Claude.
- See [Examples](#examples) for tool-call round-trips and streaming demos.
- **Async-first**: Built with asyncio for high-performance, non-blocking operations.
- **Provider Support**: Supports OpenAI, Anthropic Claude, Google Gemini, and OpenRouter for chat completions.
- **Tool Calling**: Tool execution with unified tool definitions across providers.
- **Structured Outputs**: Enforce JSON schema validation on responses (OpenAI, Google, OpenRouter).
- **Extensible**: Easy to add new providers by inheriting from `BaseProvider`.
- **Tested**: Comprehensive test suite with high coverage.
#### Performance
- Built on top of [aiosonic](https://github.com/sonic182/aiosonic) for fast, low-overhead async HTTP requests and streaming.
- True asyncio end-to-end: concurrent requests across providers with minimal overhead.
- Designed for fast tool-call round-trips and low-latency streaming.
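Because every call is a coroutine, independent requests can be fanned out with `asyncio.gather`. A minimal sketch using the providers documented below (the model names and prompt are just placeholders):

```python
import asyncio
import os

from llm_async import OpenAIProvider, OpenRouterProvider


async def main():
    openai_provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
    openrouter_provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))

    messages = [{"role": "user", "content": "Summarize asyncio in one sentence."}]

    # Both requests run concurrently; gather returns when each has completed.
    openai_resp, openrouter_resp = await asyncio.gather(
        openai_provider.acomplete(model="gpt-4o-mini", messages=messages),
        openrouter_provider.acomplete(model="openrouter/auto", messages=messages),
    )

    print("OpenAI:", openai_resp.main_response.content)
    print("OpenRouter:", openrouter_resp.main_response.content)


asyncio.run(main())
```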
## Installation
### Using Poetry (Recommended)
```bash
poetry add llm-async
```
### Using pip
```bash
pip install llm-async
```
## Quickstart
A minimal async example with streaming, using the OpenAI-compatible interface:
```python
import asyncio
from llm_async import OpenAIProvider
async def main():
provider = OpenAIProvider(api_key="YOUR_OPENAI_API_KEY")
# Stream tokens as they arrive
async for chunk in await provider.acomplete(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Give me 3 ideas for a CLI tool."}],
stream=True,
):
print(chunk.delta, end="", flush=True)
asyncio.run(main())
```
## Usage
### Basic Chat Completion
#### OpenAI
```python
import asyncio
from llm_async import OpenAIProvider
async def main():
# Initialize the provider with your API key
provider = OpenAIProvider(api_key="your-openai-api-key")
# Perform a chat completion
response = await provider.acomplete(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
]
)
print(response.main_response.content) # Output: The assistant's response
# Run the async function
asyncio.run(main())
```
#### OpenRouter
```python
import asyncio
import os
from llm_async import OpenRouterProvider
async def main():
# Initialize the provider with your API key
provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))
# Perform a chat completion
response = await provider.acomplete(
model="openrouter/auto", # Let OpenRouter choose the best model
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
http_referer="https://github.com/your-username/your-app", # Optional
x_title="My AI App" # Optional
)
print(response.main_response.content) # Output: The assistant's response
# Run the async function
asyncio.run(main())
```
#### Google Gemini
```python
import asyncio
from llm_async.providers.google import GoogleProvider
async def main():
# Initialize the provider with your API key
provider = GoogleProvider(api_key="your-google-gemini-api-key")
# Perform a chat completion
response = await provider.acomplete(
model="gemini-2.5-flash",
messages=[
{"role": "user", "content": "Hello, how are you?"}
]
)
print(response.main_response.content) # Output: The assistant's response
# Run the async function
asyncio.run(main())
```
### Custom Base URL
```python
provider = OpenAIProvider(
api_key="your-api-key",
base_url="https://custom-openai-endpoint.com/v1"
)
```
### Direct API Requests
Make direct requests to any provider API endpoint using the `request()` method:
```python
import asyncio
import os
from llm_async import OpenAIProvider
async def main():
provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
# GET request to list available models
response = await provider.request("GET", "/models")
print(f"Available models: {len(response.get('data', []))}")
for model in response.get("data", [])[:5]:
print(f" - {model.get('id')}")
# POST request with custom data
response = await provider.request("POST", "/endpoint", json_data={"key": "value"})
# Add custom headers
response = await provider.request("GET", "/endpoint", custom_header="value")
asyncio.run(main())
```
The `request()` method supports all HTTP verbs: GET, POST, PUT, DELETE, PATCH. It works across all providers (OpenAI, Claude, Google, OpenRouter, OpenAIResponses).
See `examples/provider_request.py` for a complete example.
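The other verbs follow the same pattern; for instance, a hedged sketch of a DELETE call against OpenAI's `/files` endpoint (the file ID below is a placeholder):

```python
import asyncio
import os

from llm_async import OpenAIProvider


async def main():
    provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))

    # DELETE an uploaded file by its ID (placeholder ID shown here).
    result = await provider.request("DELETE", "/files/file-abc123")
    print(result)


asyncio.run(main())
```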
### Tool Usage
```python
import asyncio
import os
from llm_async.models import Tool
from llm_async.providers import OpenAIProvider
# Define a calculator tool
calculator_tool = Tool(
name="calculator",
description="Perform basic arithmetic operations",
parameters={
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide"]
},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
},
input_schema={
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide"]
},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
}
)
def calculator(operation: str, a: float, b: float) -> float:
"""Calculator function that can be called by the LLM."""
if operation == "add":
return a + b
elif operation == "subtract":
return a - b
elif operation == "multiply":
return a * b
elif operation == "divide":
return a / b
return 0
async def main():
# Initialize provider
provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
# Tool executor mapping
tools_map = {"calculator": calculator}
# Initial user message
messages = [{"role": "user", "content": "What is 15 + 27?"}]
# First turn: Ask the LLM to perform a calculation
response = await provider.acomplete(
model="gpt-4o-mini",
messages=messages,
tools=[calculator_tool]
)
# Execute the tool call
tool_call = response.main_response.tool_calls[0]
tool_result = await provider.execute_tool(tool_call, tools_map)
# Second turn: Send the tool result back to the LLM
messages_with_tool = messages + [response.main_response.original] + [tool_result]
final_response = await provider.acomplete(
model="gpt-4o-mini",
messages=messages_with_tool
)
print(final_response.main_response.content) # Output: The final answer
asyncio.run(main())
```
## Recipes
- Streaming across providers: see `examples/stream_all_providers.py`
- Tool-call round-trip (calculator): see `examples/tool_call_all_providers.py`
- Structured outputs (JSON schema): see section below and examples
### Examples
The `examples` directory contains runnable scripts for local testing against all supported providers:
- `examples/tool_call_all_providers.py` shows how to execute the same calculator tool call round-trip with OpenAI, OpenRouter, Claude, and Google using shared message/tool definitions.
- `examples/stream_all_providers.py` streams completions from the same provider list so you can compare chunking formats and latency.
Both scripts expect a `.env` file with `OPENAI_API_KEY`, `OPENROUTER_API_KEY`, `CLAUDE_API_KEY`, and `GEMINI_API_KEY` (plus optional per-provider model overrides). Run them via Poetry, e.g. `poetry run python examples/tool_call_all_providers.py`.
### Structured Outputs
Enforce JSON schema validation on model responses for consistent, type-safe outputs.
```python
import asyncio
import json
from llm_async import OpenAIProvider
from llm_async.providers.google import GoogleProvider
# Define response schema
response_schema = {
"type": "object",
"properties": {
"answer": {"type": "string"},
"confidence": {"type": "number"}
},
"required": ["answer", "confidence"],
"additionalProperties": False
}
async def main():
# OpenAI example
openai_provider = OpenAIProvider(api_key="your-openai-key")
response = await openai_provider.acomplete(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What is the capital of France?"}],
response_schema=response_schema
)
result = json.loads(response.main_response.content)
print(f"OpenAI: {result}")
# Google Gemini example
google_provider = GoogleProvider(api_key="your-google-key")
response = await google_provider.acomplete(
model="gemini-2.5-flash",
messages=[{"role": "user", "content": "What is the capital of France?"}],
response_schema=response_schema
)
result = json.loads(response.main_response.content)
print(f"Gemini: {result}")
asyncio.run(main())
```
**Supported Providers**: OpenAI, Google Gemini, OpenRouter. Claude does not support structured outputs.
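For OpenRouter, the sketch below assumes the same `response_schema` keyword carries over unchanged; the model name is only an example of a structured-output-capable model:

```python
import asyncio
import json
import os

from llm_async import OpenRouterProvider

# Same schema as in the example above.
response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"}
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False
}


async def main():
    provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))
    response = await provider.acomplete(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        response_schema=response_schema
    )
    print(f"OpenRouter: {json.loads(response.main_response.content)}")


asyncio.run(main())
```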
### OpenAI Responses API with Prompt Caching
The OpenAI Responses API keeps conversation state on the server: you chain turns with `previous_response_id` and opt into prompt caching with `prompt_cache_key`. This enables efficient multi-turn conversations without maintaining conversation history on the client side.
```python
import asyncio
import uuid
from llm_async.models import Tool
from llm_async.models.message import Message
from llm_async.providers import OpenAIResponsesProvider
# Define a calculator tool
calculator_tool = Tool(
name="calculator",
description="Perform basic arithmetic operations",
parameters={
"type": "object",
"properties": {
"operation": {"type": "string", "enum": ["add", "subtract", "multiply", "divide"]},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
}
)
def calculator(operation: str, a: float, b: float) -> float:
if operation == "add":
return a + b
elif operation == "subtract":
return a - b
elif operation == "multiply":
return a * b
elif operation == "divide":
return a / b
return 0
async def main():
provider = OpenAIResponsesProvider(api_key="your-openai-api-key")
# Generate a session ID for prompt caching
session_id = uuid.uuid4().hex
# First turn: Ask the model to use a tool
response = await provider.acomplete(
model="gpt-4.1",
messages=[Message("user", "What is 15 + 27? Use the calculator tool.")],
tools=[calculator_tool],
tool_choice="required",
prompt_cache_key=session_id, # Enable prompt caching for this session
)
# Execute the tool locally
tool_call = response.main_response.tool_calls[0]
tool_result = await provider.execute_tool(tool_call, {"calculator": calculator})
# Second turn: Continue conversation using previous_response_id
# No need to send the entire conversation history - just the response ID and new tool output
final_response = await provider.acomplete(
model="gpt-4.1",
messages=[tool_result], # Send tool result as message
tools=[calculator_tool],
previous_response_id=response.original["id"], # Reference the previous response
prompt_cache_key=session_id, # Reuse the cached prompt
)
print(final_response.main_response.content) # Output: The final answer with calculation result
asyncio.run(main())
```
**Key Benefits**:
- **No history overhead**: Use `previous_response_id` to continue conversations without resending message history
- **Prompt caching**: Pass `prompt_cache_key` to reuse cached prompts across requests in the same session
- **Reduced costs**: Cached prefixes consume 90% fewer tokens
- **Lower latency**: Cached prefixes are processed faster
- **Session management**: Clients control session IDs (e.g., `uuid.uuid4().hex`) for cache routing
**How it works**:
1. First request establishes a response context and caches the prompt prefix (for prompts ≥1024 tokens)
2. Subsequent requests reference the first response via `previous_response_id`
3. Using the same `prompt_cache_key` routes requests to the same machine for consistent cache hits
4. Only send new content (tool outputs, user messages) instead of full conversation history
5. Cached prefixes remain active for 5-10 minutes of inactivity (up to 1 hour off-peak)
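A simpler sketch of steps 2-4, without tools: each follow-up turn sends only the new user message plus the previous response ID (keyword arguments as in the example above).

```python
import asyncio
import uuid

from llm_async.models.message import Message
from llm_async.providers import OpenAIResponsesProvider


async def main():
    provider = OpenAIResponsesProvider(api_key="your-openai-api-key")
    session_id = uuid.uuid4().hex

    # Turn 1: full prompt; establishes the response context and cached prefix.
    first = await provider.acomplete(
        model="gpt-4.1",
        messages=[Message("user", "Recommend a sci-fi novel and explain why.")],
        prompt_cache_key=session_id,
    )

    # Turn 2: only the new user message, chained via previous_response_id.
    second = await provider.acomplete(
        model="gpt-4.1",
        messages=[Message("user", "Now recommend a similar film.")],
        previous_response_id=first.original["id"],
        prompt_cache_key=session_id,
    )
    print(second.main_response.content)


asyncio.run(main())
```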
**See also**: `examples/openai_responses_tool_call_with_previous_id.py` for a complete working example.
## Why llm_async?
- Async-first performance (aiosonic-based) vs. sync or heavier HTTP stacks.
- Unified provider interface: same message/tool/streaming patterns across OpenAI, Claude, Gemini, OpenRouter.
- Structured outputs (OpenAI, Google, OpenRouter) with JSON schema validation.
- Tool-call round-trip helpers for consistent multi-turn execution.
- Minimal surface area: easy to extend with new providers via BaseProvider.
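Since the call shape is shared, switching providers is mostly a matter of swapping the constructor. A minimal sketch reusing the providers and models shown earlier:

```python
import asyncio
import os

from llm_async import OpenAIProvider, OpenRouterProvider
from llm_async.providers.google import GoogleProvider


async def ask(provider, model: str, question: str) -> str:
    # Same call shape regardless of the provider behind it.
    response = await provider.acomplete(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.main_response.content


async def main():
    question = "Name one benefit of asyncio."
    pairs = [
        (OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY")), "gpt-4o-mini"),
        (OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY")), "openrouter/auto"),
        (GoogleProvider(api_key=os.getenv("GEMINI_API_KEY")), "gemini-2.5-flash"),
    ]
    for provider, model in pairs:
        print(model, "->", await ask(provider, model, question))


asyncio.run(main())
```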
## API Reference
### OpenAIProvider
- `__init__(api_key: str, base_url: str = "https://api.openai.com/v1")`
- `acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]`
Performs a chat completion. When `stream=True` the method returns an async iterator that yields StreamChunk objects as they arrive from the provider.
### OpenRouterProvider
- `__init__(api_key: str, base_url: str = "https://openrouter.ai/api/v1")`
- `acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]`
Performs a chat completion using OpenRouter's unified API. Supports the same OpenAI-compatible interface with additional optional headers:
- `http_referer`: Your application's URL (recommended)
- `x_title`: Your application's name (recommended)
OpenRouter provides access to hundreds of AI models from various providers through a single API.
### GoogleProvider
- `__init__(api_key: str, base_url: str = "https://generativelanguage.googleapis.com/v1beta/models/")`
- `acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]`
  Performs a chat completion using Google's Gemini API. Supports structured outputs and uses camelCase keys in request payloads (e.g., `generationConfig`).
**Streaming**
- **Usage**: iterate with `async for chunk in await provider.acomplete(..., stream=True):` and print or process each `chunk` as it arrives.
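For example, a small sketch that accumulates the streamed deltas into one string (reusing `chunk.delta` from the Quickstart):

```python
import asyncio
import os

from llm_async import OpenAIProvider


async def main():
    provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))

    parts = []
    async for chunk in await provider.acomplete(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Write a haiku about the sea."}],
        stream=True,
    ):
        if chunk.delta:  # skip empty deltas defensively
            parts.append(chunk.delta)

    print("".join(parts))


asyncio.run(main())
```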
**Example output**
```
--- OpenAI streaming response ---
1. Peel and slice potatoes.
2. Par-cook potatoes briefly.
3. Whisk eggs with salt and pepper.
4. Sauté onions until translucent (optional).
5. Combine potatoes and eggs in a pan and cook until set.
6. Fold and serve.
--- Claude streaming response ---
1. Prepare potatoes by peeling and slicing.
2. Fry or boil until tender.
3. Beat eggs and season.
4. Mix potatoes with eggs and cook gently.
5. Serve warm.
```
## Development
### Setup
```bash
git clone https://github.com/sonic182/llm-async.git
cd llm-async
poetry install
```
### Running Tests
```bash
poetry run pytest
```
### Building
```bash
poetry build
```
## Roadmap
- Support for additional providers (e.g., Grok, Anthropic direct API)
- More advanced tool features
- Response caching and retry mechanisms
## Contributing
Contributions are welcome! Please open an issue or submit a pull request on [GitHub](https://github.com/sonic182/llm-async).
## License
MIT License - see the [LICENSE](LICENSE) file for details.
## Authors
- sonic182