# Multiplexer LLM (Python)
**Unlock the Power of Distributed AI** 🚀
A lightweight Python library that combines the quotas of multiple open source LLM providers behind a single unified API. It distributes your requests across the providers hosting open source models, maximizing throughput and reliability.
## The Problem: Limited AI Resources
- ❌ **Rate Limit Errors**: "Rate limit exceeded" errors hinder your application's performance
- ❌ **Limited Throughput**: Single provider constraints limit your AI capabilities
- ❌ **Unpredictable Failures**: Rate limits can occur at critical moments
- ❌ **Manual Intervention**: Switching providers requires code changes
## The Solution: Unified Access to Multiple Providers
- ✅ **Increased Throughput**: Combine quotas from multiple open source LLM providers
- ✅ **Error Resilience**: Automatic failover when one provider hits rate limits
- ✅ **Seamless Integration**: Compatible with OpenAI SDK for easy adoption
- ✅ **Smart Load Balancing**: Weight-based distribution across providers for optimal performance
## Key Benefits
- 🚀 **Scalable AI**: Combine resources from multiple providers for enhanced capabilities
- 🛡️ **Error Prevention**: Automatic failover minimizes rate limit failures
- ⚡ **High Availability**: Seamless switching between providers ensures continuous operation
- 🔌 **OpenAI SDK Compatibility**: Works with existing OpenAI SDK code
- 📊 **Usage Analytics**: Track provider performance and rate limits
## How It Works
```
Single Model:       [Model A: 10K RPM]                                         ❌ rate limit error at request 10,001
Multiple Providers: [Provider 1: 10K] + [Provider 2: 15K] + [Provider 3: 20K]  = 45K RPM ✅
Multiple Models:    [Model A: 10K]    + [Model B: 50K]    + [Model C: 15K]     = 75K RPM ✅✅
```
## Installation
```bash
pip install multiplexer-llm
```
The package requires Python 3.8+ and automatically installs its dependencies (`openai>=1.0.0` and `typing-extensions>=4.0.0`).
## Quick Start
```python
import asyncio
import os
from multiplexer_llm import Multiplexer
from openai import AsyncOpenAI

async def main():
    # Create client instances for a few open source models
    model1 = AsyncOpenAI(
        api_key=os.getenv("MODEL1_API_KEY"),
        base_url="https://api.model1.com/v1/",
    )

    model2 = AsyncOpenAI(
        api_key=os.getenv("MODEL2_API_KEY"),
        base_url="https://api.model2.org/v1",
    )

    # Initialize the multiplexer
    async with Multiplexer() as multiplexer:
        # Add models with weights
        multiplexer.add_model(model1, 5, "model1-large")
        multiplexer.add_model(model2, 3, "model2-base")

        # Use like a regular OpenAI client
        completion = await multiplexer.chat.completions.create(
            model="placeholder",  # Will be overridden by the selected model's name
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of France?"},
            ],
        )

        print(completion.choices[0].message.content)
        print("Model usage stats:", multiplexer.get_stats())

# Run the async function
asyncio.run(main())
```
### How Primary and Fallback Models Work
The multiplexer operates with a **two-tier system**:
#### **Primary Models** (`add_model`)
- **First choice**: Used when available
- **Weight-based selection**: Higher weights = higher probability of selection
#### **Fallback Models** (`add_fallback_model`)
- **Backup safety net**: Activated only when every primary model has hit its rate limit (see the sketch below)
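
This README doesn't document the selection internals, so here is a minimal sketch of the two-tier behavior under stated assumptions: the pools, the helper names (`weighted_choice`, `select_model`), and the idea of tracking rate-limited models in a set are illustrative, not the library's actual implementation.

```python
import random
from typing import List, Set, Tuple

# Hypothetical pools mirroring the Quick Start weights.
PRIMARIES: List[Tuple[str, int]] = [("model1-large", 5), ("model2-base", 3)]
FALLBACKS: List[Tuple[str, int]] = [("backup-model", 1)]

def weighted_choice(pool: List[Tuple[str, int]]) -> str:
    """Pick a model name with probability proportional to its weight."""
    names = [name for name, _ in pool]
    weights = [weight for _, weight in pool]
    return random.choices(names, weights=weights, k=1)[0]

def select_model(rate_limited: Set[str]) -> str:
    """Two-tier selection: primaries first, fallbacks only as a last resort."""
    available = [(n, w) for n, w in PRIMARIES if n not in rate_limited]
    if available:
        return weighted_choice(available)
    # Every primary is rate limited: route to the fallback tier.
    return weighted_choice(FALLBACKS)

# With both primaries healthy, "model1-large" wins ~5/8 of the time.
print(select_model(set()))
# With all primaries rate limited, the fallback serves the request.
print(select_model({"model1-large", "model2-base"}))
```

The key property is that fallback models carry no traffic at all until the entire primary pool is unavailable.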
## API Examples
### Creating a Multiplexer
```python
from multiplexer_llm import Multiplexer
# Create multiplexer instance
multiplexer = Multiplexer()
# Or use as async context manager (recommended)
async with Multiplexer() as multiplexer:
    # Your code here
    pass
```
### Adding Models
Both methods take an `AsyncOpenAI` client, an integer `weight`, and the `model_name` to request from that provider:
```python
# Add a primary model
multiplexer.add_model(client, weight, model_name)

# Add a fallback model
multiplexer.add_fallback_model(client, weight, model_name)
```
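
For example, extending the Quick Start setup with a backup provider (the `backup` client, its base URL, the `BACKUP_API_KEY` variable, and the "backup-model" name are placeholders for illustration):

```python
# Reuses `model1`, `model2`, and `multiplexer` from the Quick Start.
backup = AsyncOpenAI(
    api_key=os.getenv("BACKUP_API_KEY"),
    base_url="https://api.backup-provider.com/v1",  # placeholder URL
)

# Primary pool: model1 serves ~5/8 of requests, model2 ~3/8.
multiplexer.add_model(model1, 5, "model1-large")
multiplexer.add_model(model2, 3, "model2-base")

# Fallback tier: consulted only after every primary model is rate limited.
multiplexer.add_fallback_model(backup, 1, "backup-model")
```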
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## About Haven Network
[Haven Network](https://github.com/haven-hvn) builds open-source tools to help online communities produce high-quality data for multi-modal AI, with a strong focus on local inference and data privacy.