# LLMux-Optimizer
[PyPI version](https://badge.fury.io/py/llmux-optimizer)
[Python 3.8+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[Downloads](https://pepy.tech/project/llmux-optimizer)
Automatically find cheaper LLM alternatives while maintaining performance.
## Quick Start
```python
import llmux

# Find the cheapest model that maintains your accuracy requirements
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="your_data.jsonl",
    min_accuracy=0.9
)

print(f"Best model: {result['model']}")
print(f"Cost savings: {result['cost_savings']:.1%}")
print(f"Accuracy: {result['accuracy']:.1%}")
```
## Installation
```bash
pip install llmux-optimizer
```
## Why LLMux?
- **One-liner optimization** - Just specify baseline and dataset
- **Real cost savings** - Average 73% reduction in LLM costs
- **Multiple providers** - Tests 18+ models across OpenAI, Anthropic, Google, Meta, Mistral, and more
- **Smart stopping** - Skips smaller models when larger ones fail (saves API calls)
- **Production ready** - Used by companies processing millions of requests
## Features
### Simple API
```python
# Basic usage
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="data.jsonl"
)

# With custom parameters
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="data.jsonl",
    prompt="Classify the sentiment as positive, negative, or neutral",
    task="classification",
    min_accuracy=0.85,
    sample_size=0.2  # Test on 20% of the data for speed
)
```
### Supported Tasks
- **Classification** - Sentiment analysis, intent detection, categorization
- **Extraction** - Named entity recognition, information extraction
- **Generation** - Text completion, summarization, translation
- **Binary** - Yes/no, true/false decisions
### Model Universe
Tests models from a curated universe including:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3 Haiku, Sonnet)
- Google (Gemini Pro, Flash)
- Meta (Llama 3.1 8B, 70B)
- Mistral (7B, Mixtral, Large)
- And more...
## Examples
### Classification Task
```python
import llmux

# Sentiment analysis
examples = [
    {"input": "This product is amazing!", "ground_truth": "positive"},
    {"input": "Terrible service", "ground_truth": "negative"},
    {"input": "It's okay", "ground_truth": "neutral"}
]

result = llmux.optimize_cost(
    baseline="gpt-4",
    examples=examples,
    task="classification",
    options=["positive", "negative", "neutral"]
)
```
### Banking Intent Classification
```python
# Prepare dataset (one-time)
from prepare_banking77 import prepare_banking77_dataset
prepare_banking77_dataset()

# Find the optimal model
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="data/banking77_test.jsonl",
    prompt="Classify the banking customer query into one of 77 intent categories",
    task="classification",
    min_accuracy=0.8
)
```
### Cost Comparison
Typical savings on standard benchmarks:
| Dataset | Baseline | Best Alternative | Cost Savings | Accuracy |
|---------|----------|------------------|--------------|----------|
| IMDB | GPT-4 | Llama-3.1-8B | 96.3% | 95.2% |
| AG News | GPT-4 | Mistral-7B | 94.7% | 93.8% |
| Banking77 | GPT-4 | GPT-3.5-turbo | 89.2% | 91.4% |
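The savings figures above compare per-token prices. Assuming "cost savings" means the relative reduction in per-million-token cost versus the baseline (a plausible definition, not confirmed against the llmux source), the arithmetic is:

```python
def cost_savings(baseline_cost_per_m: float, alt_cost_per_m: float) -> float:
    """Relative reduction in per-million-token cost vs. the baseline.

    Hypothetical helper illustrating the likely definition behind the
    table above; e.g. a $30/M baseline vs. a $1/M alternative yields
    roughly 96.7% savings.
    """
    return 1.0 - alt_cost_per_m / baseline_cost_per_m
```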
## Advanced Usage
### Custom Evaluation
```python
from llmux import Evaluator, Provider

# Use a specific provider
provider = Provider.get_provider("openrouter", model="meta-llama/llama-3.1-8b")
evaluator = Evaluator(provider)

# Run evaluation
accuracy, results = evaluator.evaluate(
    dataset="test_data.jsonl",
    system_prompt="You are a helpful assistant"
)
```
### Smart Stopping
LLMux implements smart stopping: if a larger model in a family (e.g., Llama-3.1-70B) fails to meet the accuracy requirement, the smaller models in the same family (e.g., Llama-3.1-8B) are skipped to save API calls.
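A minimal sketch of this family-based pruning, using made-up model names and a hypothetical `plan_evaluations` helper (this is not LLMux's actual internals):

```python
# Models grouped by family, largest first; if the largest misses the
# accuracy bar, its smaller siblings are assumed to miss it too.
MODEL_FAMILIES = {
    "llama": ["llama-3.1-70b", "llama-3.1-8b"],
    "mistral": ["mistral-large", "mixtral-8x7b", "mistral-7b"],
}

def plan_evaluations(accuracies: dict, min_accuracy: float) -> list:
    """Return the models actually evaluated, skipping the rest of a
    family once a larger member falls below min_accuracy."""
    evaluated = []
    for family, models in MODEL_FAMILIES.items():
        for model in models:
            evaluated.append(model)
            if accuracies.get(model, 0.0) < min_accuracy:
                break  # larger model failed: skip smaller siblings
    return evaluated
```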
## Dataset Format
LLMux expects JSONL format with `input` and `label` fields:
```json
{"input": "Example text", "label": "category"}
{"input": "Another example", "label": "other_category"}
```
Or use the `examples` parameter directly:
```python
examples = [
    {"input": "text", "ground_truth": "label"},
    ...
]
```
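If you want to sanity-check a dataset file before spending API calls, a small loader like the following works. This is a hypothetical helper, not part of the llmux API; it assumes the `input`/`label` JSONL schema shown above.

```python
import json

def load_jsonl(path: str) -> list:
    """Load an LLMux-style JSONL dataset, failing fast on malformed rows."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = {"input", "label"} - row.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing fields {missing}")
            rows.append(row)
    return rows
```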
## API Reference
### optimize_cost()
Main function to find the best cost-optimized model.
**Parameters:**
- `baseline` (str): Reference model to beat (e.g., "gpt-4")
- `dataset` (str): Path to JSONL dataset file
- `prompt` (str, optional): System prompt for the task
- `task` (str, optional): Task type ("classification", "extraction", "generation", "binary")
- `min_accuracy` (float): Minimum acceptable accuracy (default: 0.9)
- `sample_size` (float, optional): Fraction of the dataset to evaluate, between 0.0 and 1.0
- `options` (list, optional): Valid output options for classification
- `examples` (list, optional): Direct examples instead of dataset file
**Returns:**
- Dictionary with:
- `model`: Best model found
- `accuracy`: Achieved accuracy
- `cost_savings`: Percentage saved vs baseline
- `cost_per_million`: Cost per million tokens
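Putting the documented return keys together, a caller might consume the result like this (the dict values below are illustrative, and `summarize` is a hypothetical helper, not part of llmux):

```python
# Example result shaped like the documented return dict
result = {
    "model": "meta-llama/llama-3.1-8b",
    "accuracy": 0.93,
    "cost_savings": 0.95,
    "cost_per_million": 0.18,
}

def summarize(result: dict, min_accuracy: float = 0.9) -> str:
    """Turn an optimize_cost()-style result into a one-line recommendation."""
    if result["accuracy"] < min_accuracy:
        return f"No model met the {min_accuracy:.0%} accuracy bar"
    return (f"Switch to {result['model']}: {result['cost_savings']:.1%} cheaper "
            f"at {result['accuracy']:.1%} accuracy "
            f"(${result['cost_per_million']:.2f}/M tokens)")
```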
## Requirements
- Python 3.8+
- OpenRouter API key (set as `OPENROUTER_API_KEY` environment variable)
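LLMux reads the key from the environment, so a typical shell setup looks like the following (the key value is a placeholder):

```shell
# Set your OpenRouter API key (placeholder value shown)
export OPENROUTER_API_KEY="sk-or-..."

# Fail fast if the key is missing before launching a run
: "${OPENROUTER_API_KEY:?OPENROUTER_API_KEY is not set}"
```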
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details.
## Citation
If you use LLMux in your research, please cite:
```bibtex
@software{llmux2024,
  title = {LLMux: Automatic LLM Cost Optimization},
  author = {Ahuja, Mihir},
  year = {2024},
  url = {https://github.com/mihirahuja/llmux}
}
```
## Support
- Issues: [GitHub Issues](https://github.com/mihirahuja/llmux/issues)
- Discussions: [GitHub Discussions](https://github.com/mihirahuja/llmux/discussions)
- Email: your@email.com