# Multiplexer LLM (Python)
**Unlock the Power of Distributed AI** 🚀
A lightweight Python library that combines the quotas of multiple open source LLM providers behind a single unified API. It distributes your requests across the providers hosting open source models, maximizing throughput and reliability.
## The Problem: Limited AI Resources
- ❌ **Rate Limit Errors**: "Rate limit exceeded" errors hinder your application's performance
- ❌ **Limited Throughput**: Single provider constraints limit your AI capabilities
- ❌ **Unpredictable Failures**: Rate limits can occur at critical moments
- ❌ **Manual Intervention**: Switching providers requires code changes
## The Solution: Unified Access to Multiple Providers
- ✅ **Increased Throughput**: Combine quotas from multiple open source LLM providers
- ✅ **Error Resilience**: Automatic failover when one provider hits rate limits
- ✅ **Seamless Integration**: Compatible with OpenAI SDK for easy adoption
- ✅ **Smart Load Balancing**: Weight-based distribution across providers for optimal performance
## Key Benefits
- 🚀 **Scalable AI**: Combine resources from multiple providers for enhanced capabilities
- 🛡️ **Error Prevention**: Automatic failover minimizes rate limit failures
- ⚡ **High Availability**: Seamless switching between providers ensures continuous operation
- 🔌 **OpenAI SDK Compatibility**: Works with existing OpenAI SDK code
- 📊 **Usage Analytics**: Track provider performance and rate limits
## How It Works
```
Single Model:       [Model A: 10K RPM]                                         ❌ rate limit error at request 10,001
Multiple Providers: [Provider 1: 10K] + [Provider 2: 15K] + [Provider 3: 20K]  = 45K RPM ✅
Multiple Models:    [Model A: 10K]    + [Model B: 50K]    + [Model C: 15K]     = 75K RPM ✅✅
```
## Installation
```bash
pip install multiplexer-llm
```
The package requires Python 3.8+ and automatically installs its dependencies (`openai>=1.0.0` and `typing-extensions>=4.0.0`).
## Quick Start
```python
import asyncio
import os
from multiplexer_llm import Multiplexer
from openai import AsyncOpenAI

async def main():
    # Create client instances for a few open source models
    model1 = AsyncOpenAI(
        api_key=os.getenv("MODEL1_API_KEY"),
        base_url="https://api.model1.com/v1/",
    )

    model2 = AsyncOpenAI(
        api_key=os.getenv("MODEL2_API_KEY"),
        base_url="https://api.model2.org/v1",
    )

    # Initialize the multiplexer
    async with Multiplexer() as multiplexer:
        # Add models with weights
        multiplexer.add_model(model1, 5, "model1-large")
        multiplexer.add_model(model2, 3, "model2-base")

        # Use like a regular OpenAI client
        completion = await multiplexer.chat.completions.create(
            model="placeholder",  # Will be overridden by the selected model's name
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of France?"},
            ],
        )

        print(completion.choices[0].message.content)
        print("Model usage stats:", multiplexer.get_stats())

# Run the async function
asyncio.run(main())
```
### How Primary and Fallback Models Work
The multiplexer operates with a **two-tier system**:
#### **Primary Models** (`add_model`)
- **First choice**: Used when available
- **Weight-based selection**: Higher weights = higher probability of selection
#### **Fallback Models** (`add_fallback_model`)
- **Backup safety net**: Activated only when every primary model has hit its rate limit (see the sketch below)
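
This README doesn't document the selection internals, so here is a minimal sketch of the two-tier behavior under stated assumptions: the pools, the helper names (`weighted_choice`, `select_model`), and the idea of tracking rate-limited models in a set are illustrative, not the library's actual implementation.

```python
import random
from typing import List, Set, Tuple

# Hypothetical pools mirroring the Quick Start weights.
PRIMARIES: List[Tuple[str, int]] = [("model1-large", 5), ("model2-base", 3)]
FALLBACKS: List[Tuple[str, int]] = [("backup-model", 1)]

def weighted_choice(pool: List[Tuple[str, int]]) -> str:
    """Pick a model name with probability proportional to its weight."""
    names = [name for name, _ in pool]
    weights = [weight for _, weight in pool]
    return random.choices(names, weights=weights, k=1)[0]

def select_model(rate_limited: Set[str]) -> str:
    """Two-tier selection: primaries first, fallbacks only as a last resort."""
    available = [(n, w) for n, w in PRIMARIES if n not in rate_limited]
    if available:
        return weighted_choice(available)
    # Every primary is rate limited: route to the fallback tier.
    return weighted_choice(FALLBACKS)

# With both primaries healthy, "model1-large" wins ~5/8 of the time.
print(select_model(set()))
# With all primaries rate limited, the fallback serves the request.
print(select_model({"model1-large", "model2-base"}))
```

The key property is that fallback models carry no traffic at all until the entire primary pool is unavailable.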
## API Examples
### Creating a Multiplexer
```python
from multiplexer_llm import Multiplexer
# Create multiplexer instance
multiplexer = Multiplexer()
# Or use as async context manager (recommended)
async with Multiplexer() as multiplexer:
    # Your code here
    pass
```
### Adding Models
Both methods take an `AsyncOpenAI` client, an integer `weight`, and the `model_name` to request from that provider:
```python
# Add a primary model
multiplexer.add_model(client, weight, model_name)

# Add a fallback model
multiplexer.add_fallback_model(client, weight, model_name)
```
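
For example, extending the Quick Start setup with a backup provider (the `backup` client, its base URL, the `BACKUP_API_KEY` variable, and the "backup-model" name are placeholders for illustration):

```python
# Reuses `model1`, `model2`, and `multiplexer` from the Quick Start.
backup = AsyncOpenAI(
    api_key=os.getenv("BACKUP_API_KEY"),
    base_url="https://api.backup-provider.com/v1",  # placeholder URL
)

# Primary pool: model1 serves ~5/8 of requests, model2 ~3/8.
multiplexer.add_model(model1, 5, "model1-large")
multiplexer.add_model(model2, 3, "model2-base")

# Fallback tier: consulted only after every primary model is rate limited.
multiplexer.add_fallback_model(backup, 1, "backup-model")
```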
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## About Haven Network
[Haven Network](https://github.com/haven-hvn) builds open-source tools to help online communities produce high-quality data for multi-modal AI, with a strong focus on local inference and data privacy.