onellm

- **Name:** onellm
- **Version:** 0.1.1
- **Home page:** https://github.com/muxi-ai/onellm
- **Summary:** A unified interface for interacting with large language models from various providers - a complete drop-in replacement for OpenAI's client
- **Upload time:** 2025-07-14 16:29:56
- **Author:** Ran Aroussi
- **Requires Python:** >=3.10
- **License:** Apache-2.0
# OneLLM

[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.python.org/pypi/onellm)
[![Version](https://img.shields.io/pypi/v/onellm.svg?maxAge=60)](https://pypi.python.org/pypi/onellm)
[![Status](https://img.shields.io/pypi/status/onellm.svg?maxAge=60)](https://pypi.python.org/pypi/onellm)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Test Coverage](https://img.shields.io/badge/coverage-96%25-brightgreen)](https://github.com/muxi-ai/onellm)
 
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/muxi-ai/onellm)

### A "drop-in" replacement for OpenAI's client that offers a unified interface for interacting with large language models from various providers,  with support for hundreds of models, built-in fallback mechanisms, and enhanced reliability features.

---

> [!IMPORTANT]
> #### Support this project by starring this repo on GitHub!
>
> More stars β†’ more visibility β†’ more contributors β†’ better features β†’ more robust tool for everyone πŸŽ‰
>
> [🌟 Star this repo on GitHub β†’](https://github.com/muxi-ai/onellm)
>
> Thank you for your support! πŸ™

---

## πŸ“š Table of Contents

- [Overview](https://github.com/muxi-ai/onellm/blob/main/README.md#-overview)
- [Getting Started](https://github.com/muxi-ai/onellm/blob/main/README.md#-getting-started)
- [Key Features](https://github.com/muxi-ai/onellm/blob/main/README.md#-key-features)
- [Supported Providers](https://github.com/muxi-ai/onellm/blob/main/README.md#-supported-providers)
- [Architecture](https://github.com/muxi-ai/onellm/blob/main/README.md#-architecture)
- [API Design](https://github.com/muxi-ai/onellm/blob/main/README.md#-api-design)
- [Advanced Features](https://github.com/muxi-ai/onellm/blob/main/README.md#-advanced-features)
- [Migration from OpenAI](https://github.com/muxi-ai/onellm/blob/main/README.md#-migration-from-openai)
- [Model Naming Convention](https://github.com/muxi-ai/onellm/blob/main/README.md#-model-naming-convention)
- [Configuration](https://github.com/muxi-ai/onellm/blob/main/README.md#-configuration)
- [Test Coverage](https://github.com/muxi-ai/onellm/blob/main/README.md#-test-coverage)
- [Documentation](https://github.com/muxi-ai/onellm/blob/main/README.md#-documentation)
- [Call for Contributions](https://github.com/muxi-ai/onellm/blob/main/README.md#-call-for-contributions)
- [License](https://github.com/muxi-ai/onellm/blob/main/README.md#-license)
- [Acknowledgements](https://github.com/muxi-ai/onellm/blob/main/README.md#-acknowledgements)


## πŸ‘‰ Overview

**OneLLM** is a lightweight, provider-agnostic Python library that offers a unified interface for interacting with large language models (LLMs) from various providers. It simplifies the integration of LLMs into applications by providing a consistent API while abstracting away provider-specific implementation details.

The library follows the OpenAI client API design pattern, making it familiar to developers already using OpenAI and enabling easy migration for existing applications. **Simply change your import statements and instantly gain access to hundreds of models** across dozens of providers while maintaining your existing code structure.

With support for 19 implemented providers (and more planned), OneLLM gives you access to 300+ unique language models through a single, consistent interface - from the latest proprietary models to open-source alternatives, all accessible through familiar OpenAI-compatible patterns.

> [!NOTE]
> **Ready for Use**: OneLLM now supports 19 providers with 300+ models! From cloud APIs to local models, you can access them all through a single, unified interface. [Contributions are welcome](./CONTRIBUTING.md) to help add even more providers!

---

## πŸš€ Getting Started

### Installation

```bash
# Basic installation (includes OpenAI compatibility and download utility)
pip install onellm

# For all providers (includes dependencies for future provider support)
pip install "OneLLM[all]"
```

### Download Models for Local Use

OneLLM includes a built-in utility for downloading GGUF models:

```bash
# Download a model from HuggingFace (saves to ~/llama_models by default)
onellm download --repo-id "TheBloke/Llama-2-7B-GGUF" --filename "llama-2-7b.Q4_K_M.gguf"

# Download to a custom directory
onellm download -r "microsoft/Phi-3-mini-4k-instruct-gguf" -f "Phi-3-mini-4k-instruct-q4.gguf" -o /path/to/models
```
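
Once a GGUF file is on disk, it can be used through the `llama_cpp` provider with the same chat API. The snippet below is a minimal sketch; it assumes the file downloaded above sits in the default `~/llama_models` directory and uses the `llama_cpp/{model.gguf}` naming convention described later in this README.

```python
from onellm import ChatCompletion

response = ChatCompletion.create(
    # The filename matches the download example above (assumed location: ~/llama_models)
    model="llama_cpp/llama-2-7b.Q4_K_M.gguf",
    messages=[{"role": "user", "content": "Summarize what a GGUF file is."}]
)
print(response.choices[0].message["content"])
```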

### Quick Win: Your First LLM Call

```python
# Basic usage with OpenAI-compatible syntax
from onellm import ChatCompletion

response = ChatCompletion.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message["content"])
# Output: I'm doing well, thank you for asking! I'm here and ready to help you...
```

For more detailed examples, check out the [examples directory](./examples).

---

## ✨ Key Features


| Feature | Description |
|---------|-------------|
| **πŸ“¦ Drop-in replacement** | Use your existing OpenAI code with minimal changes |
| **πŸ”„ Provider-agnostic** | Support for 300+ models across 19 implemented providers |
| **πŸ” Automatic fallback** | Seamlessly switch to alternative models when needed |
| **πŸ”„ Auto-retry mechanism** | Retry the same model multiple times before failing |
| **🧩 OpenAI-compatible** | Familiar interface for developers used to OpenAI |
| **πŸ“Ί Streaming support** | Real-time streaming responses from supported providers |
| **πŸ–ΌοΈ Multi-modal capabilities** | Support for text, images, audio across compatible models |
| **🏠 Local LLM support** | Run models locally via Ollama and llama.cpp |
| **⬇️ Model downloads** | Built-in CLI to download GGUF models from HuggingFace |
| **🧹 Unicode artifact cleaning** | Automatic removal of invisible characters to prevent AI detection |
| **🏷️ Consistent naming** | Clear `provider/model-name` format for attribution |
| **πŸ§ͺ Comprehensive tests** | 96% test coverage ensuring reliability |
| **πŸ“„ Apache-2.0 license** | Open-source license that protects contributions |

---

## 🌐 Supported Providers

OneLLM currently supports **19 providers** with more on the way:

### Cloud API Providers (17)
- **Anthropic** - Claude family of models
- **Anyscale** - Configurable AI platform
- **AWS Bedrock** - Access to multiple model families
- **Azure OpenAI** - Microsoft-hosted OpenAI models
- **Cohere** - Command models with RAG
- **DeepSeek** - Chinese LLM provider
- **Fireworks** - Fast inference platform
- **Google AI Studio** - Gemini models via API key
- **Groq** - Ultra-fast inference for Llama, Mixtral
- **Mistral** - Mistral Large, Medium, Small
- **Moonshot** - Kimi models with long-context capabilities
- **OpenAI** - GPT-4o, o3-mini, DALL-E, Whisper, etc.
- **OpenRouter** - Gateway to 100+ models
- **Perplexity** - Search-augmented models
- **Together AI** - Open-source model hosting
- **Vertex AI** - Google Cloud's enterprise Gemini
- **X.AI** - Grok models

### Local Providers (2)
- **Ollama** - Run models locally with easy management
- **llama.cpp** - Direct GGUF model execution

### Notable Models Available

Through these providers, you gain access to hundreds of models, including:

<div align="center">

<!-- Model categories -->
<table>
  <tr>
    <th>Model Family</th>
    <th>Notable Models</th>
  </tr>
  <tr>
    <td><strong>OpenAI Family</strong></td>
    <td>GPT-4o, GPT-4 Turbo, o3</td>
  </tr>
  <tr>
    <td><strong>Claude Family</strong></td>
    <td>Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku</td>
  </tr>
  <tr>
    <td><strong>Llama Family</strong></td>
    <td>Llama 3 70B, Llama 3 8B, Code Llama</td>
  </tr>
  <tr>
    <td><strong>Mistral Family</strong></td>
    <td>Mistral Large, Mistral 7B, Mixtral</td>
  </tr>
  <tr>
    <td><strong>Gemini Family</strong></td>
    <td>Gemini Pro, Gemini Ultra, Gemini Flash</td>
  </tr>
  <tr>
    <td><strong>Embeddings</strong></td>
    <td>Ada-002, text-embedding-3-small/large, Cohere embeddings</td>
  </tr>
  <tr>
    <td><strong>Multimodal</strong></td>
    <td>GPT-4 Vision, Claude 3 Vision, Gemini Pro Vision</td>
  </tr>
</table>

</div>

---

## πŸ—οΈ Architecture

OneLLM follows a modular architecture with clear separation of concerns:

```mermaid
---
config:
  look: handDrawn
  theme: mc
  themeVariables:
    background: 'transparent'
    primaryColor: '#fff0'
    secondaryColor: 'transparent'
    tertiaryColor: 'transparent'
    mainBkg: 'transparent'

  flowchart:
    layout: fixed
---
flowchart TD
    %% User API Layer
    User(User Application) --> ChatCompletion
    User --> Completion
    User --> Embedding
    User --> OpenAIClient["OpenAI Client Interface"]

    subgraph API["Public API Layer"]
        ChatCompletion["ChatCompletion\n.create() / .acreate()"]
        Completion["Completion\n.create() / .acreate()"]
        Embedding["Embedding\n.create() / .acreate()"]
        OpenAIClient
    end

    %% Core logic
    subgraph Core["Core Logic"]
        Router["Provider Router"]
        Config["Configuration\nEnvironment Variables\nAPI Keys"]
        FallbackManager["Fallback Manager"]
        RetryManager["Retry Manager"]
    end

    %% Provider Layer
    BaseProvider["Provider Interface<br>(Base Class)"]

    subgraph Implementations["Provider Implementations"]
        OpenAI["OpenAI"]
        Anthropic["Anthropic"]
        GoogleProvider["Google"]
        Groq["Groq"]
        Ollama["Local LLMs"]
        OtherProviders["20+ Others"]
    end

    %% Utilities
    subgraph Utilities["Utilities"]
        Streaming["Streaming<br>Handlers"]
        TokenCounting["Token<br>Counter"]
        ErrorHandling["Error<br>Handling"]
        Types["Type<br>Definitions"]
        Models["Response<br>Models"]
    end

    %% External services
    OpenAIAPI["OpenAI API"]
    AnthropicAPI["Anthropic API"]
    GoogleAPI["Google API"]
    GroqAPI["Groq API"]
    LocalModels["Ollama/llama.cpp"]
    OtherAPIs["..."]

    %% Connections
    ChatCompletion --> Router
    Completion --> Router
    Embedding --> Router
    OpenAIClient --> Router

    Router --> Config
    Router --> FallbackManager
    FallbackManager --> RetryManager

    RetryManager --> BaseProvider

    BaseProvider --> OpenAI
    BaseProvider --> Anthropic
    BaseProvider --> GoogleProvider
    BaseProvider --> Groq
    BaseProvider --> Ollama
    BaseProvider --> OtherProviders

    BaseProvider --> Streaming
    BaseProvider --> TokenCounting
    BaseProvider --> ErrorHandling

    OpenAI --> OpenAIAPI
    Anthropic --> AnthropicAPI
    GoogleProvider --> GoogleAPI
    Groq --> GroqAPI
    Ollama --> LocalModels
    OtherProviders --> OtherAPIs
```

### Core Components

1. **Public Interface Classes**
   - `ChatCompletion` - For chat-based interactions
   - `Completion` - For text completion
   - `Embedding` - For generating embeddings

2. **Provider Interface**
   - Abstract base class defining required methods
   - Provider-specific implementations

3. **Configuration System**
   - Environment variable support
   - Runtime configuration options

4. **Error Handling**
   - Standardized error types
   - Provider-specific error mapping

5. **Fallback System**
   - Automatic retries with alternative models
   - Configurable fallback chains
   - Graceful degradation options

---

## πŸ”Œ API Design

OneLLM mirrors the OpenAI Python client library API for familiarity:

### Chat Completions

```python
from onellm import ChatCompletion

# Basic usage (identical to OpenAI's client)
response = ChatCompletion.create(
    model="openai/gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

# Streaming
for chunk in ChatCompletion.create(
    model="anthropic/claude-3-opus",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
):
    print(chunk.choices[0].delta.content, end="", flush=True)

# With fallback options
response = ChatCompletion.create(
    model=["openai/gpt-4", "anthropic/claude-3-opus", "mistral/mistral-large"],
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# With retries and fallbacks for enhanced reliability
response = ChatCompletion.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    retries=3,  # Try gpt-4 up to 3 additional times before failing or using fallbacks
    fallback_models=["anthropic/claude-3-opus", "mistral/mistral-large"]
)

# JSON mode for structured outputs
response = ChatCompletion.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "user", "content": "List the top 3 planets by size"}
    ],
    response_format={"type": "json_object"}  # Request structured JSON output
)
print(response.choices[0].message["content"])  # Outputs valid, parseable JSON

# Multi-modal with images
response = ChatCompletion.create(
    model="openai/gpt-4-vision",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]}
    ]
)

# Multi-modal with audio
response = ChatCompletion.create(
    model="anthropic/claude-3-sonnet",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Transcribe and analyze this audio clip"},
            {"type": "audio_url", "audio_url": {"url": "https://example.com/audio.mp3"}}
        ]}
    ]
)
```

### Completions

```python
from onellm import Completion

response = Completion.create(
    model="groq/llama3-70b",
    prompt="Once upon a time",
    max_tokens=100
)
```

### Embeddings

```python
from onellm import Embedding

response = Embedding.create(
    model="openai/text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)
```
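
Since responses follow the OpenAI-compatible shape, the embedding vector should be available under the response's `data` field. The snippet below is a sketch of that assumption; whether items are accessed as attributes or dictionary keys may differ:

```python
# Sketch: reading the vector, assuming the OpenAI-compatible response layout
vector = response.data[0]["embedding"]
print(len(vector))  # embedding dimensionality
```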

---

## πŸ› οΈ Advanced Features

### Fallback Chains for Enhanced Reliability

OneLLM includes built-in fallback support to handle API errors gracefully:

```python
response = ChatCompletion.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    fallback_models=[
        "anthropic/claude-3-haiku",  # Try Claude if GPT-4 fails
        "openai/gpt-3.5-turbo"       # Try GPT-3.5 if Claude fails
    ]
)
```

If the primary model fails due to service unavailability, rate limiting, or other retriable errors, OneLLM automatically tries the fallback models in sequence.

### Automatic Retries

For transient errors, you can configure OneLLM to retry the same model multiple times before falling back to alternatives:

```python
response = ChatCompletion.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    retries=3,  # Will try the same model up to 3 additional times if it fails
    fallback_models=["anthropic/claude-3-haiku", "openai/gpt-3.5-turbo"]
)
```

This is implemented using the fallback mechanism under the hood, making it both powerful and efficient.
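
Conceptually, a retry is just a fallback to the same model, so the two calls below should behave in roughly the same way. This is an illustrative sketch of the idea, not a statement about the exact internals:

```python
from onellm import ChatCompletion

messages = [{"role": "user", "content": "Hello, how are you?"}]

# Using the retries parameter
r1 = ChatCompletion.create(model="openai/gpt-4", messages=messages, retries=2)

# Roughly equivalent: fall back to the same model twice before giving up
r2 = ChatCompletion.create(
    model="openai/gpt-4",
    messages=messages,
    fallback_models=["openai/gpt-4", "openai/gpt-4"]
)
```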

### Fallback Chains + Automatic Retries Architecture

```mermaid
---
config:
  look: handDrawn
  theme: mc
  themeVariables:
    background: 'transparent'
    primaryColor: '#fff0'
    secondaryColor: 'transparent'
    tertiaryColor: 'transparent'
    mainBkg: 'transparent'

  flowchart:
    layout: fixed
---
flowchart TD
    START(["Client Request"]) --> REQUEST["Chat/Completion Request"]
    REQUEST --> PRIMARY["Primary Model<br>e.g., openai/gpt-4"]

    PRIMARY --> API_CHECK{"API<br>Available?"}
    API_CHECK -->|Yes| MODEL_CHECK{"Model<br>Available?"}
    MODEL_CHECK -->|Yes| QUOTA_CHECK{"Quota/Rate<br>Limits OK?"}
    QUOTA_CHECK -->|Yes| SUCCESS["Successful Response"]
    SUCCESS --> RESPONSE(["Return to Client"])

    API_CHECK -->|No| RETRY_DECISION{"Retry<br>Count < Max?"}
    MODEL_CHECK -->|No| RETRY_DECISION
    QUOTA_CHECK -->|No| RETRY_DECISION

    RETRY_DECISION -->|Yes| RETRY["Retry with Delay<br>(Same Model)"]
    RETRY --> PRIMARY

    RETRY_DECISION -->|No| FALLBACK_CHECK{"Fallbacks<br>Available?"}

    FALLBACK_CHECK -->|Yes| FALLBACK_MODEL["Next Fallback Model<br>e.g., anthropic/claude-3-haiku"]
    FALLBACK_MODEL --> FALLBACK_TRY["Try Fallback"]
    FALLBACK_TRY --> FALLBACK_API_CHECK{"API<br>Available?"}

    FALLBACK_API_CHECK -->|Yes| FALLBACK_SUCCESS["Successful Response"]
    FALLBACK_SUCCESS --> RESPONSE

    FALLBACK_API_CHECK -->|No| NEXT_FALLBACK{"More<br>Fallbacks?"}
    NEXT_FALLBACK -->|Yes| FALLBACK_MODEL
    NEXT_FALLBACK -->|No| ERROR["Error Response"]

    FALLBACK_CHECK -->|No| ERROR
    ERROR --> RESPONSE
```



### JSON Mode for Structured Outputs

For applications that require structured data, OneLLM supports JSON mode with compatible providers:

```python
response = ChatCompletion.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "user", "content": "List the top 3 programming languages with their key features"}
    ],
    response_format={"type": "json_object"}  # Request JSON output
)

# The response contains valid, parseable JSON
json_response = response.choices[0].message["content"]
print(json_response)  # Structured JSON data

# Parse it with standard libraries
import json
structured_data = json.loads(json_response)
```

When using providers that don't natively support JSON mode, OneLLM automatically adds system instructions requesting JSON-formatted responses.
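
Because those providers are only *instructed* to emit JSON, a defensive parse is a sensible precaution. The helper below is a generic sketch for illustration and is not part of OneLLM:

```python
import json

def parse_json_content(content: str) -> dict:
    """Best-effort JSON parse for providers without native JSON mode."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} block if extra text surrounds it
        start, end = content.find("{"), content.rfind("}")
        if start != -1 and end > start:
            return json.loads(content[start:end + 1])
        raise

structured_data = parse_json_content(response.choices[0].message["content"])
```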

### Asynchronous Support

Both synchronous and asynchronous APIs are available:

```python
# Synchronous
response = ChatCompletion.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Asynchronous with fallbacks and retries
response = await ChatCompletion.acreate(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    retries=2,
    fallback_models=["anthropic/claude-3-opus"]
)
```
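
Because `acreate()` is a coroutine, multiple requests can be fanned out concurrently with standard `asyncio` tooling. A minimal sketch (the model choice here is arbitrary):

```python
import asyncio
from onellm import ChatCompletion

async def ask(question: str) -> str:
    response = await ChatCompletion.acreate(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message["content"]

async def main():
    # Issue several requests concurrently and wait for all of them
    answers = await asyncio.gather(
        ask("What is an LLM?"),
        ask("Name three uses for embeddings.")
    )
    for answer in answers:
        print(answer)

asyncio.run(main())
```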

---

## πŸ”„ Migration from OpenAI

OneLLM provides multiple ways to migrate from the OpenAI client, including a fully compatible client interface:

### Option 1: Complete Drop-in Replacement (Identical Interface)

OneLLM is a complete drop-in replacement for the OpenAI client, and the OpenAI library is included as a default dependency:

```python
# Before
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello world"}]
)

# After - 100% identical client interface
from onellm import OpenAI  # or Client
client = OpenAI()            # completely compatible with OpenAI's client
response = client.chat.completions.create(
    model="gpt-4",  # automatically adds "openai/" prefix when needed
    messages=[{"role": "user", "content": "Hello world"}]
)
```

### Option 2: Streamlined Direct API (Fewer Lines)

```python
# Before
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello world"}]
)

# After - more concise
from onellm import ChatCompletion
response = ChatCompletion.create(
    model="openai/gpt-4",  # explicitly using provider prefix
    messages=[{"role": "user", "content": "Hello world"}]
)
```

### Option 3: Model Fallback (Enhanced Reliability)

```python
# Adding fallback options with ChatCompletion
from onellm import ChatCompletion
response = ChatCompletion.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello world"}],
    fallback_models=[
        "anthropic/claude-3-haiku",
        "openai/gpt-3.5-turbo"
    ],
    # optional config
    fallback_config={
        "log_fallbacks": True
    }
)

# Using retries with fallbacks for enhanced reliability
from onellm import ChatCompletion
response = ChatCompletion.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello world"}],
    retries=3,  # Will retry gpt-4 up to 3 additional times before using fallbacks
    fallback_models=[
        "anthropic/claude-3-haiku",
        "openai/gpt-3.5-turbo"
    ]
)

# Using fallback with the client interface
from onellm import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello world"}],
    fallback_models=[
        "anthropic/claude-3-opus",
        "groq/llama3-70b"
    ]
)
```

---

## 🏷️ Model Naming Convention

Models are specified using a provider prefix to clearly identify the source:

<!-- Model naming examples -->
<table>
  <tr>
    <th>Provider</th>
    <th>Format</th>
    <th>Example</th>
  </tr>
  <tr>
    <td>OpenAI</td>
    <td><code>openai/{model}</code></td>
    <td><code>openai/gpt-4</code></td>
  </tr>
  <tr>
    <td>Google</td>
    <td><code>google/{model}</code></td>
    <td><code>google/gemini-pro</code></td>
  </tr>
  <tr>
    <td>Anthropic</td>
    <td><code>anthropic/{model}</code></td>
    <td><code>anthropic/claude-3-opus</code></td>
  </tr>
  <tr>
    <td>Groq</td>
    <td><code>groq/{model}</code></td>
    <td><code>groq/llama3-70b</code></td>
  </tr>
  <tr>
    <td>Mistral</td>
    <td><code>mistral/{model}</code></td>
    <td><code>mistral/mistral-large</code></td>
  </tr>
  <tr>
    <td>Ollama</td>
    <td><code>ollama/{model}@host:port</code></td>
    <td><code>ollama/llama3:8b@localhost:11434</code></td>
  </tr>
  <tr>
    <td>llama.cpp</td>
    <td><code>llama_cpp/{model.gguf}</code></td>
    <td><code>llama_cpp/llama-3-8b-q4_K_M.gguf</code></td>
  </tr>
  <tr>
    <td>XAI (Grok)</td>
    <td><code>xai/{model}</code></td>
    <td><code>xai/grok-beta</code></td>
  </tr>
  <tr>
    <td>Cohere</td>
    <td><code>cohere/{model}</code></td>
    <td><code>cohere/command-r-plus</code></td>
  </tr>
  <tr>
    <td>AWS Bedrock</td>
    <td><code>bedrock/{model}</code></td>
    <td><code>bedrock/claude-3-5-sonnet</code></td>
  </tr>
</table>
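
The local-provider formats above plug straight into the same API. For example (a sketch that assumes an Ollama server on the default port and a GGUF file downloaded as shown in Getting Started):

```python
from onellm import ChatCompletion

# Ollama model served locally; host and port are part of the model name
response = ChatCompletion.create(
    model="ollama/llama3:8b@localhost:11434",
    messages=[{"role": "user", "content": "Hello from a local model!"}]
)

# GGUF file run directly through llama.cpp
response = ChatCompletion.create(
    model="llama_cpp/llama-3-8b-q4_K_M.gguf",
    messages=[{"role": "user", "content": "Hello from a GGUF model!"}]
)
```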

---

## βš™οΈ Configuration

OneLLM can be configured through environment variables or at runtime:

```python
# Environment variables
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-...

# Runtime configuration
import onellm

onellm.openai_api_key = "sk-..."  # OpenAI API key
onellm.anthropic_api_key = "sk-..."  # Anthropic API key

# Configure fallback behavior
onellm.config.fallback = {
    "enabled": True,
    "default_chains": {
        "chat": ["openai/gpt-4", "anthropic/claude-3-opus", "groq/llama3-70b"],
        "embedding": ["openai/text-embedding-3-small", "cohere/embed-english"]
    },
    "retry_delay": 1.0,
    "max_retries": 3
}
```
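
The environment variables can also be set programmatically before any calls are made, which is handy in notebooks or tests. A small sketch using the variable names shown above:

```python
import os

# Same variable names as in the comments above; set them before making requests
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-..."

import onellm  # OneLLM reads these through its environment-variable support
```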

---

## πŸ§ͺ Test Coverage

OneLLM maintains comprehensive test coverage to ensure reliability and compatibility:

<!-- Test coverage visualization -->
<table>
  <tr>
    <th>Coverage Metric</th>
    <th>Value</th>
  </tr>
  <tr>
    <td>Overall Package Coverage</td>
    <td>96% (357 passing tests)</td>
  </tr>
  <tr>
    <td>Modules with 100% Coverage</td>
    <td>16 modules</td>
  </tr>
  <tr>
    <td>Minimum Module Coverage</td>
    <td>90%</td>
  </tr>
  <tr>
    <td>Provider Implementations</td>
    <td>93-95% coverage</td>
  </tr>
</table>


This extensive test coverage ensures reliable operation across all supported providers and models.

---

## πŸ“– Documentation

OneLLM uses a code-first documentation approach:

1. **Examples Directory**: The `examples/` directory contains well-documented example scripts that demonstrate all key features of the library:
   - Each example includes detailed frontmatter explaining its purpose and relationship to the codebase
   - Examples range from basic usage to advanced features like fallbacks, retries, and multi-modal capabilities
   - Running the examples is the fastest way to understand the library's capabilities

2. **Code Docstrings**: All public APIs, classes, and methods have comprehensive docstrings:
   - Detailed parameter descriptions
   - Return value documentation
   - Exception information
   - Usage examples

3. **Type Annotations**: The codebase uses Python type annotations throughout:
   - Provides IDE autocompletion support
   - Makes argument requirements clear
   - Enables static type checking

This approach keeps documentation tightly coupled with code, ensuring it stays up-to-date as the library evolves. To get started, we recommend examining the examples that match your use case.

---

## 🀝 Call for Contributions

We're building something amazing with OneLLM, and we'd love your help to make it even better! There are many ways to contribute:

- **Code contributions**: Add new providers, enhance existing ones, or improve core functionality
- **Bug reports**: Help us identify and fix issues
- **Documentation**: Improve examples, clarify API usage, or fix typos
- **Feature requests**: Share your ideas for making OneLLM more powerful
- **Testing**: Help ensure reliability across different environments and use cases

**Getting started is easy:**

1. Check out our [open issues](https://github.com/muxi-ai/onellm/issues) for good first contributions
2. Fork the repository and create a feature branch
3. Make your improvements and run tests
4. Submit a pull request with a clear description of your changes

For complete details on contribution guidelines, code style, provider development, testing standards, and our contributor license agreement, please read our [CONTRIBUTING.md](./CONTRIBUTING.md).

---

## πŸ“„ License

OneLLM is licensed under the [Apache-2.0 license](./LICENSE).

### Why Apache 2.0?

I chose the Apache 2.0 license to make OneLLM easy to adopt, integrate, and build on. This license:

- Allows you to freely use, modify, and distribute the library in both open-source and proprietary software
- Encourages wide adoption by individuals, startups, and enterprises alike
- Includes a clear patent grant for legal peace of mind
- Enables flexible usage without the complexity of copyleft restrictions

Whether you’re building internal tools or commercial applications, Apache 2.0 gives you the freedom to use OneLLM however you need – no strings attached.

---

## 🌟 Acknowledgements

This project stands on the shoulders of many excellent open-source projects and would not be possible without the collaborative spirit of the developer community.

Special thanks to all the LLM providers whose APIs this library integrates with, and to the early adopters who tested the library and provided crucial feedback.

## πŸ™ Thank You

Thank you for trying out OneLLM! Your interest and support mean a lot to this project. Whether you're using it in your applications, experimenting with different LLM providers, or just exploring the capabilities, your participation helps drive this project forward.

If you find OneLLM useful in your work:

- Consider starring the repository on GitHub
- Share your experiences or use cases with the community
- Let us know how we can make it better for your needs

Thank you for your support!

~ **Ran Aroussi**<br>
𝕏 / [@aroussi](https://x.com/aroussi)

            
