# LLMux-Optimizer
[PyPI version](https://badge.fury.io/py/llmux-optimizer)
[Python 3.8+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[Downloads](https://pepy.tech/project/llmux-optimizer)
Automatically find cheaper LLM alternatives while maintaining performance.
## Quick Start
```python
import llmux

# Find the cheapest model that maintains your accuracy requirements
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="your_data.jsonl",
    min_accuracy=0.9
)

print(f"Best model: {result['model']}")
print(f"Cost savings: {result['cost_savings']:.1%}")
print(f"Accuracy: {result['accuracy']:.1%}")
```
## Installation
```bash
pip install llmux-optimizer
```
## Why LLMux?
- **One-liner optimization** - Just specify baseline and dataset
- **Real cost savings** - Average 73% reduction in LLM costs
- **Multiple providers** - Tests 18+ models across OpenAI, Anthropic, Google, Meta, Mistral, and more
- **Smart stopping** - Skips smaller models when larger ones fail (saves API calls)
- **Production ready** - Used by companies processing millions of requests
## Features
### Simple API
```python
# Basic usage
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="data.jsonl"
)

# With custom parameters
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="data.jsonl",
    prompt="Classify the sentiment as positive, negative, or neutral",
    task="classification",
    min_accuracy=0.85,
    sample_size=0.2  # Test on 20% of the data for speed
)
```
### Supported Tasks
- **Classification** - Sentiment analysis, intent detection, categorization
- **Extraction** - Named entity recognition, information extraction
- **Generation** - Text completion, summarization, translation
- **Binary** - Yes/no, true/false decisions
### Model Universe
Tests models from a curated universe including:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3 Haiku, Sonnet)
- Google (Gemini Pro, Flash)
- Meta (Llama 3.1 8B, 70B)
- Mistral (7B, Mixtral, Large)
- And more...
## Examples
### Classification Task
```python
import llmux

# Sentiment analysis
examples = [
    {"input": "This product is amazing!", "ground_truth": "positive"},
    {"input": "Terrible service", "ground_truth": "negative"},
    {"input": "It's okay", "ground_truth": "neutral"}
]

result = llmux.optimize_cost(
    baseline="gpt-4",
    examples=examples,
    task="classification",
    options=["positive", "negative", "neutral"]
)
```
### Banking Intent Classification
```python
# Prepare dataset (one-time)
from prepare_banking77 import prepare_banking77_dataset
prepare_banking77_dataset()

# Find the optimal model
result = llmux.optimize_cost(
    baseline="gpt-4",
    dataset="data/banking77_test.jsonl",
    prompt="Classify the banking customer query into one of 77 intent categories",
    task="classification",
    min_accuracy=0.8
)
```
### Cost Comparison
Typical savings on standard benchmarks:
| Dataset | Baseline | Best Alternative | Cost Savings | Accuracy |
|---------|----------|------------------|--------------|----------|
| IMDB | GPT-4 | Llama-3.1-8B | 96.3% | 95.2% |
| AG News | GPT-4 | Mistral-7B | 94.7% | 93.8% |
| Banking77 | GPT-4 | GPT-3.5-turbo | 89.2% | 91.4% |
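The savings figures above compare per-token prices. Assuming "cost savings" means the relative reduction in per-million-token cost versus the baseline (a plausible definition, not confirmed against the llmux source), the arithmetic is:

```python
def cost_savings(baseline_cost_per_m: float, alt_cost_per_m: float) -> float:
    """Relative reduction in per-million-token cost vs. the baseline.

    Hypothetical helper illustrating the likely definition behind the
    table above; e.g. a $30/M baseline vs. a $1/M alternative yields
    roughly 96.7% savings.
    """
    return 1.0 - alt_cost_per_m / baseline_cost_per_m
```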
## Advanced Usage
### Custom Evaluation
```python
from llmux import Evaluator, Provider

# Use a specific provider
provider = Provider.get_provider("openrouter", model="meta-llama/llama-3.1-8b")
evaluator = Evaluator(provider)

# Run evaluation
accuracy, results = evaluator.evaluate(
    dataset="test_data.jsonl",
    system_prompt="You are a helpful assistant"
)
```
### Smart Stopping
LLMux implements smart stopping: if a larger model in a family (e.g., Llama-3.1-70B) fails to meet the accuracy requirement, the smaller models in the same family (e.g., Llama-3.1-8B) are skipped to save API calls.
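A minimal sketch of this family-based pruning, using made-up model names and a hypothetical `plan_evaluations` helper (this is not LLMux's actual internals):

```python
# Models grouped by family, largest first; if the largest misses the
# accuracy bar, its smaller siblings are assumed to miss it too.
MODEL_FAMILIES = {
    "llama": ["llama-3.1-70b", "llama-3.1-8b"],
    "mistral": ["mistral-large", "mixtral-8x7b", "mistral-7b"],
}

def plan_evaluations(accuracies: dict, min_accuracy: float) -> list:
    """Return the models actually evaluated, skipping the rest of a
    family once a larger member falls below min_accuracy."""
    evaluated = []
    for family, models in MODEL_FAMILIES.items():
        for model in models:
            evaluated.append(model)
            if accuracies.get(model, 0.0) < min_accuracy:
                break  # larger model failed: skip smaller siblings
    return evaluated
```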
## Dataset Format
LLMux expects JSONL format with `input` and `label` fields:
```json
{"input": "Example text", "label": "category"}
{"input": "Another example", "label": "other_category"}
```
Or use the `examples` parameter directly:
```python
examples = [
    {"input": "text", "ground_truth": "label"},
    ...
]
```
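If you want to sanity-check a dataset file before spending API calls, a small loader like the following works. This is a hypothetical helper, not part of the llmux API; it assumes the `input`/`label` JSONL schema shown above.

```python
import json

def load_jsonl(path: str) -> list:
    """Load an LLMux-style JSONL dataset, failing fast on malformed rows."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = {"input", "label"} - row.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing fields {missing}")
            rows.append(row)
    return rows
```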
## API Reference
### optimize_cost()
Main function to find the best cost-optimized model.
**Parameters:**
- `baseline` (str): Reference model to beat (e.g., "gpt-4")
- `dataset` (str): Path to JSONL dataset file
- `prompt` (str, optional): System prompt for the task
- `task` (str, optional): Task type ("classification", "extraction", "generation", "binary")
- `min_accuracy` (float): Minimum acceptable accuracy (default: 0.9)
- `sample_size` (float, optional): Fraction of the dataset to evaluate, between 0.0 and 1.0
- `options` (list, optional): Valid output options for classification
- `examples` (list, optional): Direct examples instead of dataset file
**Returns:**
- Dictionary with:
- `model`: Best model found
- `accuracy`: Achieved accuracy
- `cost_savings`: Percentage saved vs baseline
- `cost_per_million`: Cost per million tokens
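Putting the documented return keys together, a caller might consume the result like this (the dict values below are illustrative, and `summarize` is a hypothetical helper, not part of llmux):

```python
# Example result shaped like the documented return dict
result = {
    "model": "meta-llama/llama-3.1-8b",
    "accuracy": 0.93,
    "cost_savings": 0.95,
    "cost_per_million": 0.18,
}

def summarize(result: dict, min_accuracy: float = 0.9) -> str:
    """Turn an optimize_cost()-style result into a one-line recommendation."""
    if result["accuracy"] < min_accuracy:
        return f"No model met the {min_accuracy:.0%} accuracy bar"
    return (f"Switch to {result['model']}: {result['cost_savings']:.1%} cheaper "
            f"at {result['accuracy']:.1%} accuracy "
            f"(${result['cost_per_million']:.2f}/M tokens)")
```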
## Requirements
- Python 3.8+
- OpenRouter API key (set as `OPENROUTER_API_KEY` environment variable)
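LLMux reads the key from the environment, so a typical shell setup looks like the following (the key value is a placeholder):

```shell
# Set your OpenRouter API key (placeholder value shown)
export OPENROUTER_API_KEY="sk-or-..."

# Fail fast if the key is missing before launching a run
: "${OPENROUTER_API_KEY:?OPENROUTER_API_KEY is not set}"
```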
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details.
## Citation
If you use LLMux in your research, please cite:
```bibtex
@software{llmux2024,
  title = {LLMux: Automatic LLM Cost Optimization},
  author = {Ahuja, Mihir},
  year = {2024},
  url = {https://github.com/mihirahuja/llmux}
}
```
## Support
- Issues: [GitHub Issues](https://github.com/mihirahuja/llmux/issues)
- Discussions: [GitHub Discussions](https://github.com/mihirahuja/llmux/discussions)
- Email: your@email.com