multiplexer-llm

Name: multiplexer-llm
Version: 0.1.5
Home page: https://github.com/Haven-hvn/multiplexer-llm
Summary: A multiplexer for Large Language Model APIs built on the OpenAI SDK. It combines quotas from multiple models and automatically uses fallback models when the primary models are rate limited.
Upload time: 2025-07-17 16:55:31
Author: Havencto
Requires Python: >=3.8
License: MIT
Keywords: openai, multiplexer, llm, rate-limit, api, ai, machine-learning, chatgpt, claude, gemini
Requirements: openai (>=1.0.0), typing-extensions (>=4.0.0)

# Multiplexer LLM (Python)

**Unlock the Power of Distributed AI** 🚀

A lightweight Python library that combines the quotas of multiple open-source LLM providers behind a single unified API. It seamlessly distributes your requests across providers hosting open-source models, ensuring maximum throughput and reliability.

## The Problem: Limited AI Resources

- ❌ **Rate Limit Errors**: "Rate limit exceeded" errors hinder your application's performance
- ❌ **Limited Throughput**: Single provider constraints limit your AI capabilities
- ❌ **Unpredictable Failures**: Rate limits can occur at critical moments
- ❌ **Manual Intervention**: Switching providers requires code changes

## The Solution: Unified Access to Multiple Providers

- ✅ **Increased Throughput**: Combine quotas from multiple open source LLM providers
- ✅ **Error Resilience**: Automatic failover when one provider hits rate limits
- ✅ **Seamless Integration**: Compatible with OpenAI SDK for easy adoption
- ✅ **Smart Load Balancing**: Weight-based distribution across providers for optimal performance

## Key Benefits

- 🚀 **Scalable AI**: Combine resources from multiple providers for enhanced capabilities
- 🛡️ **Error Prevention**: Automatic failover minimizes rate limit failures
- ⚡ **High Availability**: Seamless switching between providers ensures continuous operation
- 🔌 **OpenAI SDK Compatibility**: Works with existing OpenAI SDK code
- 📊 **Usage Analytics**: Track provider performance and rate limits

## How It Works

```
Single Model:        [Model A: 10K RPM] ❌ Rate Limit Error at 10,001 requests
Multiple Providers:  [Provider 1: 10K] + [Provider 2: 15K] + [Provider 3: 20K] = 45,000 RPM ✅
Multiple Models:     [Model A: 10K] + [Model B: 50K] + [Model C: 15K] = 75,000 RPM ✅✅
```
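
The weighted distribution shown above can be sketched in a few lines. This is only an illustration built on Python's standard library, not the library's actual selection logic; the model names and weights mirror the Quick Start below.

```python
import random

# Illustrative (model_name, weight) pairs, mirroring the weights
# passed to add_model() in the Quick Start.
models = [("model1-large", 5), ("model2-base", 3)]

def pick_model(models):
    """Pick a model with probability proportional to its weight."""
    names = [name for name, _ in models]
    weights = [weight for _, weight in models]
    return random.choices(names, weights=weights, k=1)[0]

# With weights 5 and 3, "model1-large" is chosen about 5/8 of the time.
print(pick_model(models))
```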

## Installation

```bash
pip install multiplexer-llm
```

The package requires Python 3.8+ and automatically installs its dependencies, the OpenAI Python SDK (>=1.0.0) and typing-extensions (>=4.0.0).
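
To confirm the installation, you can query the installed version with the standard library. This check is generic Python packaging tooling, not part of this package's API:

```python
from importlib import metadata

# Prints the installed version of the package, e.g. "0.1.5".
print(metadata.version("multiplexer-llm"))
```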

## Quick Start

```python
import asyncio
import os
from multiplexer_llm import Multiplexer
from openai import AsyncOpenAI

async def main():
    # Create client instances for a few open source models
    model1 = AsyncOpenAI(
        api_key=os.getenv("MODEL1_API_KEY"),
        base_url="https://api.model1.com/v1/",
    )

    model2 = AsyncOpenAI(
        api_key=os.getenv("MODEL2_API_KEY"),
        base_url="https://api.model2.org/v1",
    )

    # Initialize multiplexer
    async with Multiplexer() as multiplexer:
        # Add models with weights
        multiplexer.add_model(model1, 5, "model1-large")
        multiplexer.add_model(model2, 3, "model2-base")

        # Use like a regular OpenAI client
        completion = await multiplexer.chat.completions.create(
            model="placeholder",  # Will be overridden by selected model
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of France?"},
            ],
        )

        print(completion.choices[0].message.content)
        print("Model usage stats:", multiplexer.get_stats())

# Run the async function
asyncio.run(main())
```

### How Primary and Fallback Models Work

The multiplexer operates with a **two-tier system**:

#### **Primary Models** (`add_model`)

- **First choice**: Used when available
- **Weight-based selection**: Higher weights = higher probability of selection

#### **Fallback Models** (`add_fallback_model`)

- **Backup safety net**: Activated when all primary models hit rate limits (see the sketch below)
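
Putting the two tiers together, here is a minimal sketch that wires up one primary and one fallback model. Only `Multiplexer`, `add_model`, and `add_fallback_model` come from this package; the environment variable names, endpoints, and model names are placeholders.

```python
import os
from multiplexer_llm import Multiplexer
from openai import AsyncOpenAI

def build_multiplexer() -> Multiplexer:
    # Placeholder endpoints and env vars; substitute your real providers.
    primary = AsyncOpenAI(
        api_key=os.getenv("PRIMARY_API_KEY"),
        base_url="https://api.primary.example/v1",
    )
    backup = AsyncOpenAI(
        api_key=os.getenv("BACKUP_API_KEY"),
        base_url="https://api.backup.example/v1",
    )

    multiplexer = Multiplexer()
    # Primary tier: selected by weight while quota remains.
    multiplexer.add_model(primary, 5, "primary-model")
    # Fallback tier: used only after every primary model is rate limited.
    multiplexer.add_fallback_model(backup, 1, "backup-model")
    return multiplexer
```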

## API Examples

### Creating a Multiplexer

```python
from multiplexer_llm import Multiplexer

# Create multiplexer instance
multiplexer = Multiplexer()

# Or use as async context manager (recommended)
async with Multiplexer() as multiplexer:
    # Your code here
    pass
```

### Adding Models

```python
# Add a primary model.
# Signature: add_model(client: AsyncOpenAI, weight: int, model_name: str)
multiplexer.add_model(model1, 5, "model1-large")

# Add a fallback model.
# Signature: add_fallback_model(client: AsyncOpenAI, weight: int, model_name: str)
multiplexer.add_fallback_model(model2, 1, "model2-base")
```
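
### Checking Usage Stats

The Quick Start calls `multiplexer.get_stats()` after a completion. The exact shape of the returned value is not documented in this README, so printing it raw is the safe way to inspect it:

```python
# Reports per-model usage; structure not documented here, so just print it.
stats = multiplexer.get_stats()
print("Model usage stats:", stats)
```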

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## About Haven Network

[Haven Network](https://github.com/haven-hvn) builds open-source tools to help online communities produce high-quality data for multi-modal AI, with a strong focus on local inference and data privacy.

            
