llmring 1.2.0

- Summary: Unified Python interface for OpenAI, Anthropic, Google, and Ollama LLMs
- License: MIT
- Requires Python: >=3.10
- Keywords: ai, anthropic, api, claude, gemini, gpt, llm, ollama, openai
- Uploaded: 2025-10-26 16:31:32
# LLMRing

A Python library for LLM integration with a unified interface and MCP support. It covers OpenAI, Anthropic, Google Gemini, and Ollama through a consistent API.

## Features

- Unified Interface: Single API for all major LLM providers
- Streaming Support: Streamed responses from every provider
- Native Tool Calling: Provider-native function calling with consistent interface
- Unified Structured Output: JSON schema works across all providers with automatic adaptation
- Conversational Configuration: MCP chat interface for natural language lockfile setup
- Aliases: Semantic aliases (`deep`, `fast`, `balanced`) with registry-based recommendations
- Cost Tracking: Cost calculation with on-demand receipt generation
- Registry Integration: Centralized model capabilities and pricing
- Fallback Models: Automatic failover to alternative models
- Type Safety: Typed exceptions and error handling
- MCP Integration: Model Context Protocol support for tool ecosystems
- MCP Chat Client: Chat interface with persistent history for any MCP server

## Quick Start

### Installation

```bash
# With uv (recommended)
uv add llmring

# With pip
pip install llmring
```

**Including Lockfiles in Your Package:**

To ship your `llmring.lock` with your package (as llmring itself does), add this to your `pyproject.toml` (the example assumes the Hatch build backend):

```toml
[tool.hatch.build]
include = [
    "src/yourpackage/**/*.py",
    "src/yourpackage/**/*.lock",  # Include lockfiles
]
```
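If you build with setuptools instead of Hatch, the equivalent configuration is roughly the sketch below; `yourpackage` is a placeholder for your own package name, and the lockfile must live inside that package directory for this to apply.

```toml
[tool.setuptools.package-data]
# Ship any .lock files found inside the package (placeholder package name).
yourpackage = ["*.lock"]
```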

### Basic Usage

```python
from llmring.service import LLMRing
from llmring.schemas import LLMRequest, Message

# Initialize service with context manager (auto-closes resources)
async with LLMRing() as service:
    # Simple chat
    request = LLMRequest(
        model="fast",
        messages=[
            Message(role="system", content="You are a helpful assistant."),
            Message(role="user", content="Hello!")
        ]
    )

    response = await service.chat(request)
    print(response.content)
```

### Streaming

```python
async with LLMRing() as service:
    # Streaming for all providers
    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="Count to 10")]
    )

    accumulated_usage = None
    async for chunk in service.chat_stream(request):
        print(chunk.content, end="", flush=True)
        # Capture final usage stats
        if chunk.usage:
            accumulated_usage = chunk.usage

    print()  # Newline after streaming
    if accumulated_usage:
        print(f"Tokens used: {accumulated_usage.get('total_tokens', 0)}")
```

### Tool Calling

```python
async with LLMRing() as service:
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]

    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="What's the weather in NYC?")],
        tools=tools
    )

    response = await service.chat(request)
    if response.tool_calls:
        print("Function called:", response.tool_calls[0]["function"]["name"])
```
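A common follow-up is to execute the requested function yourself and send the result back for a final answer. The sketch below assumes OpenAI-style tool calls where `arguments` is a JSON string and simply feeds the result back as a user message; your tool schema and message format may differ.

```python
import json

from llmring.service import LLMRing
from llmring.schemas import LLMRequest, Message


async def answer_with_tool(service: LLMRing, request: LLMRequest) -> str:
    """One tool-call round trip: run the tool locally, then ask for a final answer."""
    response = await service.chat(request)
    if not response.tool_calls:
        return response.content

    call = response.tool_calls[0]["function"]
    # Assumption: arguments arrive as a JSON string, as in OpenAI-style tool calls.
    args = json.loads(call["arguments"])
    weather = f"Sunny, 22 C in {args['location']}"  # stand-in for a real weather lookup

    followup = LLMRequest(
        model=request.model,
        messages=list(request.messages)
        + [Message(role="user", content=f"Result of {call['name']}: {weather}")],
    )
    final = await service.chat(followup)
    return final.content
```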

## Resource Management

### Context Manager (Recommended)

```python
from llmring import LLMRing, LLMRequest, Message

# Automatic resource cleanup with context manager
async with LLMRing() as service:
    request = LLMRequest(
        model="fast",
        messages=[Message(role="user", content="Hello!")]
    )
    response = await service.chat(request)
    # Resources are automatically cleaned up when exiting the context
```

### Manual Cleanup

```python
# Manual resource management
service = LLMRing()
try:
    response = await service.chat(request)
finally:
    await service.close()  # Ensure resources are cleaned up
```

## Advanced Features

### Unified Structured Output

```python
# JSON schema API works across all providers
request = LLMRequest(
    model="balanced",  # Works with any provider
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": "string"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True  # Validates across all providers
    }
)

response = await service.chat(request)
print("JSON:", response.content)   # Valid JSON string
print("Data:", response.parsed)    # Python dict ready to use
```

### Provider-Specific Parameters

```python
# Anthropic: prompt caching (up to ~90% savings on cached input tokens)
request = LLMRequest(
    model="balanced",
    messages=[
        Message(
            role="system",
            content="Very long system prompt...",  # 1024+ tokens
            metadata={"cache_control": {"type": "ephemeral"}}
        ),
        Message(role="user", content="Hello")
    ]
)

# Extra parameters for provider-specific features
request = LLMRequest(
    model="fast",
    messages=[Message(role="user", content="Hello")],
    extra_params={
        "logprobs": True,
        "top_logprobs": 5,
        "presence_penalty": 0.1,
        "seed": 12345
    }
)
```

### Model Aliases and Lockfiles

LLMRing uses lockfiles to map semantic aliases to models, with support for fallback pools and environment-specific profiles:

```bash
# Initialize lockfile (explicit creation at current directory)
llmring lock init

# Conversational configuration with AI advisor (recommended)
llmring lock chat  # Natural language interface for lockfile management

# View current aliases
llmring aliases
```

**Lockfile Resolution Order:**
1. Explicit path via `lockfile_path` parameter (file must exist)
2. `LLMRING_LOCKFILE_PATH` environment variable (file must exist)
3. `./llmring.lock` in current directory (if exists)
4. Bundled lockfile at `src/llmring/llmring.lock` (minimal fallback with advisor alias)
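For example, to pin a specific lockfile explicitly (step 1 above, using the documented `lockfile_path` parameter; the path shown is illustrative):

```python
from llmring.service import LLMRing

# An explicit path wins over LLMRING_LOCKFILE_PATH and ./llmring.lock.
async with LLMRing(lockfile_path="config/llmring.lock") as service:
    ...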

**Packaging Your Own Lockfile:**
Libraries using LLMRing can ship with their own lockfiles. See [Lockfile Documentation](docs/lockfile.md) for details on:
- Including lockfiles in your package distribution
- Lockfile resolution order and precedence
- Creating lockfiles with fallback models
- Environment-specific profiles and configuration

**Conversational Configuration** via `llmring lock chat`:
- Describe your requirements in natural language
- Get AI-powered recommendations based on registry analysis
- Configure aliases with multiple fallback models
- Understand cost implications and tradeoffs
- Set up environment-specific profiles

```python
# Use semantic aliases (always current, with fallbacks)
request = LLMRequest(
    model="deep",      # → most capable reasoning model
    messages=[Message(role="user", content="Hello")]
)
# Or use other aliases:
# model="fast"      → cost-effective quick responses
# model="balanced"  → optimal all-around model
# model="advisor"   → Claude Opus 4.1 - powers conversational config
```

Key features:
- Registry-based recommendations
- Fallback models provide automatic failover
- Cost analysis and recommendations
- Environment-specific configurations for dev/staging/prod

### Profiles: Environment-Specific Configurations

LLMRing supports **profiles** to manage different model configurations for different environments (dev, staging, prod, etc.):

```python
# Use different models based on environment
# Development: Use cheaper/faster models
# Production: Use higher-quality models

# Set profile via environment variable (in your shell):
#   export LLMRING_PROFILE=dev   # or prod, staging, etc.

# Or specify profile in code
async with LLMRing() as service:
    # Uses 'dev' profile bindings
    response = await service.chat(request, profile="dev")
```

**Profile Configuration in Lockfiles:**

```toml
# llmring.lock (truncated for brevity)
version = "1.0"
default_profile = "default"

[profiles.default]
name = "default"
[[profiles.default.bindings]]
alias = "assistant"
models = ["anthropic:claude-3-5-sonnet"]  # Production quality

[profiles.dev]
name = "dev"
[[profiles.dev.bindings]]
alias = "assistant"
models = ["openai:gpt-4o-mini"]  # Cheaper for development

[profiles.test]
name = "test"
[[profiles.test.bindings]]
alias = "assistant"
models = ["ollama:llama3"]  # Local model for testing
```

**Using Profiles with CLI:**

```bash
# Bind aliases to specific profiles
llmring bind assistant "openai:gpt-4o-mini" --profile dev
llmring bind assistant "anthropic:claude-3-5-sonnet" --profile prod

# List aliases in a profile
llmring aliases --profile dev

# Use profile for chat
llmring chat "Hello" --profile dev

# Set default profile via environment
export LLMRING_PROFILE=dev
llmring chat "Hello"  # Now uses dev profile
```

**Profile Selection Priority:**
1. Explicit parameter: `profile="dev"` or `--profile dev` (highest priority)
2. Environment variable: `LLMRING_PROFILE=dev`
3. Default: `default` profile (if not specified)
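The same precedence is easy to mirror in your own wiring. This is a sketch of the documented order, not LLMRing internals:

```python
import os


def resolve_profile(explicit: str | None = None) -> str:
    """Documented precedence: explicit argument > LLMRING_PROFILE > 'default'."""
    return explicit or os.getenv("LLMRING_PROFILE") or "default"


# resolve_profile("dev") -> "dev"
# with LLMRING_PROFILE=staging set, resolve_profile() -> "staging"
```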

**Common Use Cases:**
- **Development**: Use cheaper models to reduce costs during development
- **Testing**: Use local models (Ollama) or mock responses
- **Staging**: Use production models but with different rate limits
- **Production**: Use highest quality models for best user experience
- **A/B Testing**: Test different models for the same alias

### Fallback Models

Aliases can specify multiple models for automatic failover:

```toml
# In llmring.lock
[profiles.default]
name = "default"
[[profiles.default.bindings]]
alias = "assistant"
models = [
    "anthropic:claude-3-5-sonnet",  # Primary
    "openai:gpt-4o",                # First fallback
    "google:gemini-1.5-pro"         # Second fallback
]
```

If the primary model fails (rate limit, availability, etc.), LLMRing automatically tries the fallbacks.

### Advanced: Direct Model References

While aliases are recommended, you can still use direct `provider:model` references when needed:

```python
# Direct model reference (escape hatch)
request = LLMRequest(
    model="anthropic:claude-3-5-sonnet",  # Direct provider:model reference
    messages=[Message(role="user", content="Hello")]
)

# Or specify exact model versions
request = LLMRequest(
    model="openai:gpt-4o",  # Specific model version when needed
    messages=[Message(role="user", content="Hello")]
)
```

**Terminology:**
- **Alias**: Semantic name like `fast`, `balanced`, `deep` (recommended)
- **Model Reference**: Full `provider:model` format like `openai:gpt-4o` (escape hatch)
- **Raw SDK Access**: Bypassing LLMRing entirely using provider clients directly (see [Provider Guide](docs/providers.md))

Recommendation: Use aliases for maintainability and cost optimization. Use direct model references only when you need a specific model version or provider-specific features.

### Raw SDK Access

When you need direct access to the underlying SDKs:

```python
# Access provider SDK clients directly
openai_client = service.get_provider("openai").client      # openai.AsyncOpenAI
anthropic_client = service.get_provider("anthropic").client # anthropic.AsyncAnthropic
google_client = service.get_provider("google").client       # google.genai.Client
ollama_client = service.get_provider("ollama").client       # ollama.AsyncClient

# Use SDK features not exposed by LLMRing
response = await openai_client.chat.completions.create(
    model="fast",  # Use alias or provider:model format when needed
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=10,
    parallel_tool_calls=False,
    # Any OpenAI parameter
)

# Anthropic with all SDK features
response = await anthropic_client.messages.create(
    model="balanced",  # Use alias or provider:model format when needed
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
    top_p=0.9,
    top_k=40,
    system=[{
        "type": "text",
        "text": "You are helpful",
        "cache_control": {"type": "ephemeral"}
    }]
)

# Google with native SDK features
response = google_client.models.generate_content(
    model="balanced",  # Use alias or provider:model format when needed
    contents="Hello",
    generation_config={
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 40,
        "candidate_count": 3
    },
    safety_settings=[{
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }]
)
```

When to use raw clients:
- SDK features not exposed by LLMRing
- Provider-specific optimizations
- Complex configurations
- Performance-critical applications

## Provider Support

| Provider | Models | Streaming | Tools | Special Features |
|----------|--------|-----------|-------|------------------|
| **OpenAI** | GPT-4o, GPT-4o-mini, o1 | Yes | Native | JSON schema, PDF processing |
| **Anthropic** | Claude 3.5 Sonnet/Haiku | Yes | Native | Prompt caching, large context |
| **Google** | Gemini 1.5/2.0 Pro/Flash | Yes | Native | Multimodal, 2M+ context |
| **Ollama** | Llama, Mistral, etc. | Yes | Prompt-based | Local models, custom options |

## Setup

### Environment Variables

```bash
# Add to your .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GEMINI_API_KEY=AIza...

# Optional
OLLAMA_BASE_URL=http://localhost:11434  # Default
```
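If your process does not load `.env` files on its own, python-dotenv is a common way to populate these variables before constructing the service. This is a sketch; it does not assume LLMRing reads `.env` automatically.

```python
from dotenv import load_dotenv  # pip install python-dotenv

from llmring.service import LLMRing

load_dotenv()  # reads OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. into the environment


async def main():
    async with LLMRing() as service:
        ...
```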

### Conversational Setup

```bash
# Create optimized configuration with AI advisor
llmring lock chat

# This opens an interactive chat where you can describe your needs
# and get personalized recommendations based on the registry
```

### Dependencies

```bash
# Provider SDKs (install only the ones you need)
pip install "openai>=1.0"      # OpenAI
pip install "anthropic>=0.67"  # Anthropic
pip install google-genai       # Google Gemini
pip install "ollama>=0.4"      # Ollama
```

## MCP Integration

```python
from llmring.mcp.client import create_enhanced_llm

# Create MCP-enabled LLM with tools
llm = await create_enhanced_llm(
    model="fast",
    mcp_server_path="path/to/mcp/server"
)

# Now has access to MCP tools
response = await llm.chat([
    Message(role="user", content="Use available tools to help me")
])
```

## Documentation

- **[Lockfile Documentation](docs/lockfile.md)** - Complete guide to lockfiles, aliases, and profiles
- **[Conversational Lockfile](docs/conversational-lockfile.md)** - Natural language lockfile management
- **[MCP Integration](docs/mcp.md)** - Model Context Protocol and chat client
- **[API Reference](docs/api-reference.md)** - Core API documentation
- **[Provider Guide](docs/providers.md)** - Provider-specific features
- **[Structured Output](docs/structured-output.md)** - Unified JSON schema support
- **[File Utilities](docs/file-utilities.md)** - Vision and multimodal file handling
- **[CLI Reference](docs/cli-reference.md)** - Command-line interface guide
- **[Receipts & Cost Tracking](docs/receipts.md)** - On-demand receipt generation and cost tracking
- **[Migration to On-Demand Receipts](docs/migration-to-on-demand-receipts.md)** - Upgrade guide from automatic to on-demand receipts
- **[Examples](examples/)** - Working code examples:
  - [Quick Start](examples/quick_start.py) - Basic usage patterns
  - [MCP Chat](examples/mcp_chat_example.py) - MCP integration
  - [Streaming](examples/mcp_streaming_example.py) - Streaming with tools

## Development

```bash
# Install for development
uv sync --group dev

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/
uv run ruff format src/
```

## Error Handling

LLMRing uses typed exceptions for better error handling:

```python
from llmring.exceptions import (
    ProviderAuthenticationError,
    ModelNotFoundError,
    ProviderRateLimitError,
    ProviderTimeoutError
)

try:
    response = await service.chat(request)
except ProviderAuthenticationError:
    print("Invalid API key")
except ModelNotFoundError:
    print("Model not supported")
except ProviderRateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
```
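Building on the `retry_after` hint, a simple retry loop might look like the sketch below. It is illustrative only: `retry_after` may be `None`, and a production setup would add jitter and an overall deadline.

```python
import asyncio

from llmring.exceptions import ProviderRateLimitError


async def chat_with_retry(service, request, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        try:
            return await service.chat(request)
        except ProviderRateLimitError as e:
            if attempt == max_attempts:
                raise
            # Fall back to a small exponential delay if the provider gave no hint.
            delay = e.retry_after or (2 ** attempt)
            await asyncio.sleep(delay)
```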

## Key Features Summary

- Unified Interface: Switch providers without code changes
- Performance: Streaming, prompt caching, optimized requests
- Reliability: Circuit breakers, retries, typed error handling
- Observability: Cost tracking, on-demand receipt generation, batch certification
- Flexibility: Provider-specific features and raw SDK access
- Standards: Type-safe, well-tested

## License

MIT License - see LICENSE file for details.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for your changes
4. Ensure all tests pass: `uv run pytest`
5. Submit a pull request

## Examples

See the `examples/` directory for complete working examples:
- Basic chat and streaming
- Tool calling and function execution
- Provider-specific features
- MCP integration
- On-demand receipt generation and cost tracking

            
