**vllm-semantic-router-bench** (PyPI package metadata)

- **Version**: 1.0.0
- **Summary**: Comprehensive benchmark suite for semantic router vs direct vLLM evaluation across multiple reasoning datasets
- **Home page**: https://github.com/vllm-project/semantic-router
- **Author**: vLLM Semantic Router Team
- **License**: Apache-2.0
- **Requires Python**: >=3.8
- **Upload time**: 2025-09-12 21:10:15
- **Keywords**: vllm-semantic-router, benchmark, vllm, llm, evaluation, reasoning, multiple-choice, mmlu, arc, gpqa, commonsense, hellaswag, truthfulqa
- **Requirements**: torch, accelerate, sentence-transformers, transformers, datasets, scikit-learn, numpy, pandas, requests, huggingface-hub, psutil, matplotlib, seaborn, openai

# vLLM Semantic Router Benchmark Suite

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A comprehensive benchmark suite for evaluating **semantic router** performance against **direct vLLM** across multiple reasoning datasets. Perfect for researchers and developers working on LLM routing, evaluation, and performance optimization.

## 🎯 Key Features

- **6 Major Reasoning Datasets**: MMLU-Pro, ARC, GPQA, TruthfulQA, CommonsenseQA, HellaSwag
- **Router vs vLLM Comparison**: Side-by-side performance evaluation
- **Multiple Evaluation Modes**: NR (neutral), XC (explicit CoT), NR_REASONING (auto-reasoning)
- **Research-Ready Output**: CSV files and publication-quality plots
- **Dataset-Agnostic Architecture**: Easy to extend with new datasets
- **CLI Tools**: Simple command-line interface for common operations

## 🚀 Quick Start

### Installation

```bash
pip install vllm-semantic-router-bench
```

### Basic Usage

```bash
# Quick test on MMLU dataset
vllm-semantic-router-bench test --dataset mmlu --samples 5

# Full comparison between router and vLLM
vllm-semantic-router-bench compare --dataset arc --samples 10

# List available datasets
vllm-semantic-router-bench list-datasets

# Run comprehensive multi-dataset benchmark
vllm-semantic-router-bench comprehensive
```

### Python API

```python
from vllm_semantic_router_bench import DatasetFactory, list_available_datasets

# Load a dataset
factory = DatasetFactory()
dataset = factory.create_dataset("mmlu")
questions, info = dataset.load_dataset(samples_per_category=10)

print(f"Loaded {len(questions)} questions from {info.name}")
print(f"Categories: {info.categories}")
```
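The `list_available_datasets` helper imported above can be used to discover which names the factory accepts. A minimal sketch, assuming it takes no arguments, returns an iterable of name strings understood by `DatasetFactory.create_dataset`, and that `info.categories` is a sized collection:

```python
from vllm_semantic_router_bench import DatasetFactory, list_available_datasets

factory = DatasetFactory()

# Load a tiny sample from every registered dataset and report its size
for name in list_available_datasets():
    dataset = factory.create_dataset(name)
    questions, info = dataset.load_dataset(samples_per_category=2)
    print(f"{name}: {len(questions)} questions across {len(info.categories)} categories")
```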

## 📊 Supported Datasets

| Dataset | Domain | Categories | Difficulty | CoT Support |
|---------|--------|------------|------------|-------------|
| **MMLU-Pro** | Academic Knowledge | 57 subjects | Undergraduate | ✅ |
| **ARC** | Scientific Reasoning | Science | Grade School | ❌ |
| **GPQA** | Graduate Q&A | Graduate-level | Graduate | ❌ |
| **TruthfulQA** | Truthfulness | Truthfulness | Hard | ❌ |
| **CommonsenseQA** | Common Sense | Common Sense | Hard | ❌ |
| **HellaSwag** | Commonsense NLI | ~50 activities | Moderate | ❌ |

## 🔧 Advanced Usage

### Custom Evaluation Script

```python
import subprocess

# Run a detailed benchmark with custom parameters via the router-bench CLI
cmd = [
    "router-bench",  # Main benchmark script
    "--dataset", "mmlu",
    "--samples-per-category", "20",
    "--run-router", "--router-models", "auto",
    "--run-vllm", "--vllm-models", "openai/gpt-oss-20b",
    "--vllm-exec-modes", "NR", "NR_REASONING",
    "--output-dir", "results/custom_test",
]

subprocess.run(cmd, check=True)  # raise CalledProcessError if the benchmark fails
```
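The same run can be launched directly from the shell. The snippet below is just the argument list above joined into one command, assuming `router-bench` is available on your `PATH` (the `XC` mode listed under Key Features would presumably be passed to `--vllm-exec-modes` the same way, though that is not shown in the original example):

```bash
# Shell equivalent of the subprocess example above
router-bench \
  --dataset mmlu \
  --samples-per-category 20 \
  --run-router --router-models auto \
  --run-vllm --vllm-models openai/gpt-oss-20b \
  --vllm-exec-modes NR NR_REASONING \
  --output-dir results/custom_test
```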

### Plotting Results

```bash
# Generate plots from benchmark results
bench-plot --router-dir results/router_mmlu \
           --vllm-dir results/vllm_mmlu \
           --output-dir results/plots \
           --dataset-name "MMLU-Pro"
```

## 📈 Research Output

The benchmark generates research-ready outputs:

- **CSV Files**: Detailed per-question results and aggregated metrics
- **Master CSV**: Combined results across all test runs
- **Plots**: Accuracy and token usage comparisons
- **Summary Reports**: Markdown reports with key findings

### Example Output Structure

```
results/
├── research_results_master.csv          # Main research data
├── comparison_20250115_143022/
│   ├── router_mmlu/
│   │   └── detailed_results.csv
│   ├── vllm_mmlu/  
│   │   └── detailed_results.csv
│   ├── plots/
│   │   ├── accuracy_comparison.png
│   │   └── token_usage_comparison.png
│   └── RESEARCH_SUMMARY.md
```
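For downstream analysis, the master CSV can be inspected with pandas. A minimal sketch, assuming the file sits at the path shown above; the column names depend on which runs you executed, so print them before aggregating anything:

```python
import pandas as pd

# Load the combined results written by the benchmark runs
df = pd.read_csv("results/research_results_master.csv")

# Inspect the available columns and a few rows before any aggregation
print(df.shape)
print(df.columns.tolist())
print(df.head())
```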

## 🛠️ Development

### Local Installation

```bash
git clone https://github.com/vllm-project/semantic-router
cd semantic-router/bench
pip install -e ".[dev]"
```

### Adding New Datasets

1. Create a new dataset implementation in `dataset_implementations/`
2. Inherit from `DatasetInterface`
3. Register in `dataset_factory.py`
4. Add tests and documentation

```python
from vllm_semantic_router_bench import DatasetInterface, Question, DatasetInfo

class MyDataset(DatasetInterface):
    def load_dataset(self, **kwargs):
        # Return a (questions, info) pair: a list of Question objects plus
        # a DatasetInfo describing the dataset, mirroring the Python API
        # example above.
        pass

    def format_prompt(self, question, style="plain"):
        # Build the prompt string for a single Question; `style` controls
        # the prompt format.
        pass
```

## 📋 Requirements

- Python 3.8+
- OpenAI API access (for model evaluation)
- Hugging Face account (for dataset access)
- 4GB+ RAM (for larger datasets)

### Dependencies

- `openai>=1.0.0` - OpenAI API client
- `datasets>=2.14.0` - Hugging Face datasets
- `pandas>=1.5.0` - Data manipulation
- `matplotlib>=3.5.0` - Plotting
- `seaborn>=0.11.0` - Advanced plotting
- `tqdm>=4.64.0` - Progress bars

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Common Contributions

- Adding new datasets
- Improving evaluation metrics
- Enhancing visualization
- Performance optimizations
- Documentation improvements

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **Documentation**: https://vllm-semantic-router.com
- **GitHub**: https://github.com/vllm-project/semantic-router
- **Issues**: https://github.com/vllm-project/semantic-router/issues
- **PyPI**: https://pypi.org/project/vllm-semantic-router-bench/

## 📞 Support

- **GitHub Issues**: Bug reports and feature requests
- **Documentation**: Comprehensive guides and API reference
- **Community**: Join our discussions and get help from other users

---

**Made with ❤️ by the vLLM Semantic Router Team**

            
