| Field | Value |
| --- | --- |
| Name | robobench |
| Version | 0.0.2 |
| Summary | A benchmarking tool for AI models and Hardware. |
| upload_time | 2025-02-09 07:59:11 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.12 |
| license | None |
| keywords | benchmark, cortex, llm, machine-learning |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# robobench
<p align="center">
<img width="1280" alt="robobench Banner" src="./images/robo-bench1.jpg">
</p>
<div align="center">
<table>
<tr>
<td align="center">
<strong>🚧 EARLY DEVELOPMENT WARNING 🚧</strong>
<br />
<br />
<span>
This tool is currently about as stable as a house of cards in a wind tunnel.
<br />
Very early alpha. Bugs aren't just expected - they've signed a lease.
<br />
<br />
<code>Status: Proceed with optimism ☕</code>
</span>
</td>
</tr>
</table>
</div>
A benchmarking tool for local LLMs. It currently keeps an eye on [Cortex.cpp](https://github.com/janhq/cortex.cpp),
with plans to judge other frameworks equally in the future.
## What is this?
`robobench` measures performance metrics, resource utilization, and stability characteristics of your LLM deployments. Rather comprehensive, really.
## Features
- Model initialization metrics
- Runtime performance
- Resource utilization
- Advanced processing scenarios
- Workload-specific benchmarks
- System integration metrics
- Stability analysis
## Installation
Using `uv`:
```bash
# Installs the CLI into an isolated environment and puts it on PATH
uv tool install robobench
```
Using pip:
```bash
pip install robobench
```
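Either way, a quick sanity check that the executable landed on your `PATH` (assuming the CLI exposes the conventional `--help` flag):
```bash
robobench --help
```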
## Usage
### Basic Benchmarking
```bash
# Standard benchmark
robobench "llama3.2:3b-gguf-q2-k"
# With detailed metrics
robobench "llama3.2:3b-gguf-q2-k" --verbose
```
### Specific Benchmarks
```bash
# Initialization only
robobench "llama3.2:3b-gguf-q2-k" --type init
# Runtime metrics
robobench "llama3.2:3b-gguf-q2-k" --type runtime
# Long-running stability test
robobench "llama3.2:3b-gguf-q2-k" --type stability --stability-duration 24
```
### Advanced Usage
```bash
# Custom benchmark prompts
robobench "llama3.2:3b-gguf-q2-k" --type workload --prompts my_prompts.json
# Multi-model benchmarking
robobench "llama3.2:3b-gguf-q2-k" --type advanced \
--secondary-models "tinyllama:1b-gguf-q4" "phi2:3b-gguf-q4"
# Export results
robobench "llama3.2:3b-gguf-q2-k" --json results.json
```
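The layout of the prompts file isn't spelled out here; as a rough sketch, assuming it is simply a JSON array of prompt strings (the schema is illustrative only, not confirmed by the project):
```bash
# Hypothetical my_prompts.json: a plain JSON array of prompt strings (schema assumed)
cat > my_prompts.json <<'EOF'
[
  "Summarise the plot of a short story in two sentences.",
  "Explain the difference between a process and a thread."
]
EOF
```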
## Status
Under active development. Support for additional frameworks is planned.
## Roadmap
- Framework-agnostic benchmarking
- Additional performance metrics
- Enhanced visualizations
- Extended stability testing
- Local server and UI
- CI/CD management
## Development
### Setup
1. Clone the repository:
```bash
git clone https://github.com/janhq/robobench.git
cd robobench
```
2. Create and activate a virtual environment:
```bash
# Using uv (recommended)
uv venv .venv --python 3.12
source .venv/bin/activate
```
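If `uv` isn't available, the standard-library `venv` module produces an equivalent environment (a plain-Python alternative, not part of the documented setup):
```bash
# Plain-Python alternative to uv venv
python3.12 -m venv .venv
source .venv/bin/activate
```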
3. Install development dependencies:
```bash
# Install project in editable mode with test dependencies
uv pip install -e ".[test]"
# Install development tools
uv add --dev ruff pytest pytest-cov pytest-asyncio hypothesis
```
### Code Quality
#### Linting and Formatting
Run Ruff linter:
```bash
# Check code
ruff check .
# Auto-fix issues
ruff check --fix .
# Format code
ruff format .
# Check formatting without changes
ruff format --check .
```
#### Testing
Run tests:
```bash
# All tests
pytest
# With coverage
pytest --cov=robobench --cov-report=html
# Specific test file
pytest src/tests/test_utils.py
# Verbose test output
pytest -v src/tests/test_utils.py
```
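pytest's `-v` only raises pytest's own reporting verbosity; if per-example output from the property-based tests is wanted, hypothesis's pytest plugin exposes its own switch:
```bash
# Show hypothesis's intermediate examples while the property-based tests run
pytest --hypothesis-verbosity=verbose src/tests/test_utils.py
```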
### Pre-commit Checks
Before submitting a PR:
```bash
# Format code
ruff format .
# Run linter
ruff check .
# Run tests with coverage
pytest --cov=robobench --cov-report=term-missing
# Show coverage report in browser (optional)
python -m http.server -d htmlcov
```
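None of this is enforced by the repository yet; as a local convenience, the same commands can be dropped into a plain git pre-commit hook (a sketch, not part of the project's tooling):
```bash
# Minimal local pre-commit hook; runs the same checks as above
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
set -e
ruff format --check .
ruff check .
pytest --cov=robobench --cov-report=term-missing
EOF
chmod +x .git/hooks/pre-commit
```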
### Code Style
The project uses:
- Type hints
- Docstrings for public functions and classes (partial coverage so far)
### Project Structure
```
src/
├── robobench/
│   ├── core/
│   │   ├── initialization.py   # Model initialization metrics
│   │   ├── runtime.py          # Runtime performance metrics
│   │   ├── resources.py        # Resource utilization metrics
│   │   ├── integration.py      # System integration metrics
│   │   ├── workloads.py        # Workload-specific metrics
│   │   ├── stability.py        # Stability metrics
│   │   └── utils.py            # Shared utilities
│   ├── cli.py                  # Command-line interface
│   └── __init__.py
└── tests/
    ├── conftest.py             # Shared test fixtures
    ├── test_initialization.py
    ├── test_runtime.py
    ├── test_resources.py
    ├── test_integration.py
    └── test_utils.py
```
### PR Checklist
Before submitting a PR:
1. Run all tests
2. Check test coverage
3. Verify type hints with mypy (coming soon)
4. Ensure docstrings are up to date
## Contributing
Issues and pull requests welcome. Do have a look at the existing ones first, though.
Raw data
{
"_id": null,
"home_page": null,
"name": "robobench",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "benchmark, cortex, llm, machine-learning",
"author": null,
"author_email": "Ramon Perez <ramon@menlo.ai>, Minh Nguyen <minh@menlo.ai>",
"download_url": "https://files.pythonhosted.org/packages/bd/13/a12a397439222c13d1837200f14e6d1ddec2274173d006426dadbc4a66c6/robobench-0.0.2.tar.gz",
"platform": null,
"description": "# robobench\n\n<p align=\"center\">\n <img width=\"1280\" alt=\"robobench Banner\" src=\"./images/robo-bench1.jpg\">\n</p>\n\n<div align=\"center\">\n <table>\n <tr>\n <td align=\"center\">\n <strong>\ud83d\udea7 EARLY DEVELOPMENT WARNING \ud83d\udea7</strong>\n <br />\n <br />\n <span>\n This tool is currently about as stable as a house of cards in a wind tunnel.\n <br />\n Very early alpha. Bugs aren't just expected - they've signed a lease.\n <br />\n <br />\n <code>Status: Proceed with optimism \u2615</code>\n </span>\n </td>\n </tr>\n </table>\n</div>\n\nA benchmarking tool for Local LLMs. Currently keeping an eye on [Cortex.cpp](https://github.com/janhq/cortex.cpp)\nbut with plans to judge other frameworks equally in the future.\n\n## What is this?\n\n`robobench` measures performance metrics, resource utilization, and stability characteristics of your LLM deployments. Rather comprehensive, really.\n\n## Features\n\n- Model initialization metrics\n- Runtime performance\n- Resource utilization\n- Advanced processing scenarios\n- Workload-specific benchmarks\n- System integration metrics\n- Stability analysis\n\n## Installation\n\nUsing `uvx`:\n```bash\nuvx install robobench\n```\n\nUsing pip:\n```bash\npip install robobench\n```\n\n## Usage\n\n### Basic Benchmarking\n```bash\n# Standard benchmark\nrobobench \"llama3.2:3b-gguf-q2-k\"\n\n# With detailed metrics\nrobobench \"llama3.2:3b-gguf-q2-k\" --verbose\n```\n\n### Specific Benchmarks\n```bash\n# Initialization only\nrobobench \"llama3.2:3b-gguf-q2-k\" --type init\n\n# Runtime metrics\nrobobench \"llama3.2:3b-gguf-q2-k\" --type runtime\n\n# Long-running stability test\nrobobench \"llama3.2:3b-gguf-q2-k\" --type stability --stability-duration 24\n```\n\n### Advanced Usage\n```bash\n# Custom benchmark prompts\nrobobench \"llama3.2:3b-gguf-q2-k\" --type workload --prompts my_prompts.json\n\n# Multi-model benchmarking\nrobobench \"llama3.2:3b-gguf-q2-k\" --type advanced \\\n --secondary-models \"tinyllama:1b-gguf-q4\" \"phi2:3b-gguf-q4\"\n\n# Export results\nrobobench \"llama3.2:3b-gguf-q2-k\" --json results.json\n```\n\n## Status\n\nUnder active development. Support for additional frameworks is planned.\n\n## Roadmap\n\n- Framework-agnostic benchmarking\n- Additional performance metrics\n- Enhanced visualizations\n- Extended stability testing\n- local server and UI\n- CI/CD management\n\n## Development\n\n### Setup\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/jan.ai/robobench.git\ncd robobench\n```\n\n2. Create and activate a virtual environment:\n```bash\n# Using uv (recommended)\nuv venv .venv --python 3.12\nsource .venv/bin/activate\n```\n\n3. 
Install development dependencies:\n```bash\n# Install project in editable mode with test dependencies\nuv pip install -e \".[test]\"\n\n# Install development tools\nuv add --dev ruff pytest pytest-cov pytest-asyncio hypothesis\n```\n\n### Code Quality\n\n#### Linting and Formatting\n\nRun Ruff linter:\n```bash\n# Check code\nruff check .\n\n# Auto-fix issues\nruff check --fix .\n\n# Format code\nruff format .\n\n# Check formatting without changes\nruff format --check .\n```\n\n#### Testing\n\nRun tests:\n```bash\n# All tests\npytest\n\n# With coverage\npytest --cov=robobench --cov-report=html\n\n# Specific test file\npytest src/tests/test_utils.py\n\n# With hypothesis verbose output\npytest -v src/tests/test_utils.py\n```\n\n### Pre-commit Checks\n\nBefore submitting a PR:\n```bash\n# Format code\nruff format .\n\n# Run linter\nruff check .\n\n# Run tests with coverage\npytest --cov=robobench --cov-report=term-missing\n\n# Show coverage report in browser (optional)\npython -m http.server -d htmlcov\n```\n\n### Code Style\n\nThe project uses:\n- Type hints\n- Some docstrings for public functions and classes\n\n### Project Structure\n```\nsrc/\n\u251c\u2500\u2500 robobench/\n\u2502 \u251c\u2500\u2500 core/\n\u2502 \u2502 \u251c\u2500\u2500 initialization.py # Model initialization metrics\n\u2502 \u2502 \u251c\u2500\u2500 runtime.py # Runtime performance metrics\n\u2502 \u2502 \u251c\u2500\u2500 resources.py # Resource utilization metrics\n\u2502 \u2502 \u251c\u2500\u2500 integration.py # System integration metrics\n\u2502 \u2502 \u251c\u2500\u2500 workloads.py # Workload-specific metrics\n\u2502 \u2502 \u251c\u2500\u2500 stability.py # Stability metrics\n\u2502 \u2502 \u2514\u2500\u2500 utils.py # Shared utilities\n\u2502 \u251c\u2500\u2500 cli.py # Command-line interface\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2514\u2500\u2500 tests/\n \u251c\u2500\u2500 conftest.py # Shared test fixtures\n \u251c\u2500\u2500 test_initialization.py\n \u251c\u2500\u2500 test_runtime.py\n \u251c\u2500\u2500 test_resources.py\n \u251c\u2500\u2500 test_integration.py\n \u2514\u2500\u2500 test_utils.py\n```\n\n### Pre-commit Checks\n\nBefore submitting a PR:\n1. Run all tests\n2. Check test coverage\n3. Verify type hints with mypy (coming soon)\n4. Ensure docstrings are up to date\n\n\n## Contributing\n\nIssues and pull requests welcome. Do have a look at the existing ones first, though.\n",
"bugtrack_url": null,
"license": null,
"summary": "A benchmarking tool for AI models and Hardware.",
"version": "0.0.2",
"project_urls": null,
"split_keywords": [
"benchmark",
" cortex",
" llm",
" machine-learning"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "8666dbe4d68577bea0d66cf19707872787007f19564c1d19a7d22bbb05ef182b",
"md5": "a5902080f21189587d52ccb74f3350ee",
"sha256": "5adb4cd5ecde38ad4c42fb5b5c1de2cef00384dc750148122e83a44db9931c1a"
},
"downloads": -1,
"filename": "robobench-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a5902080f21189587d52ccb74f3350ee",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 30602,
"upload_time": "2025-02-09T07:59:08",
"upload_time_iso_8601": "2025-02-09T07:59:08.756545Z",
"url": "https://files.pythonhosted.org/packages/86/66/dbe4d68577bea0d66cf19707872787007f19564c1d19a7d22bbb05ef182b/robobench-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "bd13a12a397439222c13d1837200f14e6d1ddec2274173d006426dadbc4a66c6",
"md5": "dfba4875deeffc50ebf7c51db075cf55",
"sha256": "fc2f472dddf2f7ccfa75cce64fd12d7576d192ced9fde9b5f0aafa297695a80e"
},
"downloads": -1,
"filename": "robobench-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "dfba4875deeffc50ebf7c51db075cf55",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 223558,
"upload_time": "2025-02-09T07:59:11",
"upload_time_iso_8601": "2025-02-09T07:59:11.605155Z",
"url": "https://files.pythonhosted.org/packages/bd/13/a12a397439222c13d1837200f14e6d1ddec2274173d006426dadbc4a66c6/robobench-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-09 07:59:11",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "robobench"
}