robobench

- Name: robobench
- Version: 0.0.2
- Summary: A benchmarking tool for AI models and hardware.
- Authors: Ramon Perez <ramon@menlo.ai>, Minh Nguyen <minh@menlo.ai>
- Requires Python: >=3.12
- Keywords: benchmark, cortex, llm, machine-learning
- Uploaded: 2025-02-09 07:59:11
            # robobench

<p align="center">
  <img width="1280" alt="robobench Banner" src="./images/robo-bench1.jpg">
</p>

<div align="center">
  <table>
    <tr>
      <td align="center">
        <strong>🚧 EARLY DEVELOPMENT WARNING 🚧</strong>
        <br />
        <br />
        <span>
          This tool is currently about as stable as a house of cards in a wind tunnel.
          <br />
          Very early alpha. Bugs aren't just expected - they've signed a lease.
          <br />
          <br />
          <code>Status: Proceed with optimism ☕</code>
        </span>
      </td>
    </tr>
  </table>
</div>

A benchmarking tool for Local LLMs. Currently keeping an eye on [Cortex.cpp](https://github.com/janhq/cortex.cpp)
but with plans to judge other frameworks equally in the future.

## What is this?

`robobench` measures performance metrics, resource utilization, and stability characteristics of your LLM deployments. Rather comprehensive, really.

## Features

- Model initialization metrics
- Runtime performance
- Resource utilization
- Advanced processing scenarios
- Workload-specific benchmarks
- System integration metrics
- Stability analysis
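
To give a feel for what a runtime-performance metric involves, here is a minimal sketch of measuring tokens-per-second throughput. This is illustrative only, not robobench's implementation; `tokens_per_second` and `fake_model` are hypothetical names:

```python
import time

def tokens_per_second(generate, prompt: str) -> float:
    """Throughput of a generation callable, in tokens per second.

    `generate` is any callable returning a list of tokens; this helper is
    an illustrative stand-in, not part of robobench's API.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    # Guard against timer resolution returning a zero elapsed time.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(tokens) / elapsed

# Toy stand-in "model" that emits one token per word.
def fake_model(prompt: str) -> list[str]:
    return prompt.split()

rate = tokens_per_second(fake_model, "the quick brown fox")
assert rate > 0
```

A real harness would of course call into the framework under test and average over many prompts, but the shape of the measurement is the same.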

## Installation

Using `uv`:
```bash
# Install as a tool
uv tool install robobench

# Or run without installing
uvx robobench --help
```

Using pip:
```bash
pip install robobench
```

## Usage

### Basic Benchmarking
```bash
# Standard benchmark
robobench "llama3.2:3b-gguf-q2-k"

# With detailed metrics
robobench "llama3.2:3b-gguf-q2-k" --verbose
```

### Specific Benchmarks
```bash
# Initialization only
robobench "llama3.2:3b-gguf-q2-k" --type init

# Runtime metrics
robobench "llama3.2:3b-gguf-q2-k" --type runtime

# Long-running stability test
robobench "llama3.2:3b-gguf-q2-k" --type stability --stability-duration 24
```
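
A stability test comes down to watching how latency behaves over many iterations. As a rough sketch of the kind of summary such a test produces (not robobench's code; `stability_summary` is a hypothetical name), the standard library's `statistics` module is enough:

```python
import statistics

def stability_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize a series of per-request latencies (illustrative only)."""
    qs = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": qs[49],   # median
        "p99_ms": qs[98],   # tail latency
        "stdev_ms": statistics.stdev(latencies_ms),
    }

# Synthetic latencies: mostly steady, with occasional slow outliers.
samples = [20.0] * 95 + [80.0] * 5
summary = stability_summary(samples)
print(summary["p50_ms"], summary["p99_ms"])  # 20.0 80.0
```

A large gap between p50 and p99, or a growing mean over hours, is exactly what a long-running stability benchmark is there to catch.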

### Advanced Usage
```bash
# Custom benchmark prompts
robobench "llama3.2:3b-gguf-q2-k" --type workload --prompts my_prompts.json

# Multi-model benchmarking
robobench "llama3.2:3b-gguf-q2-k" --type advanced \
    --secondary-models "tinyllama:1b-gguf-q4" "phi2:3b-gguf-q4"

# Export results
robobench "llama3.2:3b-gguf-q2-k" --json results.json
```
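
The README doesn't document the schema expected by `--prompts`, so the following is purely a guess at what such a file could look like; check the source before relying on it. The sketch writes and reads back a hypothetical `my_prompts.json` with the standard library:

```python
import json
from pathlib import Path

# Hypothetical schema: a list of named prompts. The real format expected
# by robobench's --prompts flag may differ.
prompts = [
    {"name": "short-answer", "prompt": "Explain caching in one sentence."},
    {"name": "summarization", "prompt": "Summarize: LLM benchmarking measures speed and stability."},
]

path = Path("my_prompts.json")
path.write_text(json.dumps(prompts, indent=2))
loaded = json.loads(path.read_text())
assert len(loaded) == 2
```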

## Status

Under active development. Support for additional frameworks is planned.

## Roadmap

- Framework-agnostic benchmarking
- Additional performance metrics
- Enhanced visualizations
- Extended stability testing
- Local server and UI
- CI/CD management

## Development

### Setup

1. Clone the repository:
```bash
git clone https://github.com/janhq/robobench.git
cd robobench
```

2. Create and activate a virtual environment:
```bash
# Using uv (recommended)
uv venv .venv --python 3.12
source .venv/bin/activate
```

3. Install development dependencies:
```bash
# Install project in editable mode with test dependencies
uv pip install -e ".[test]"

# Install development tools
uv add --dev ruff pytest pytest-cov pytest-asyncio hypothesis
```

### Code Quality

#### Linting and Formatting

Run Ruff linter:
```bash
# Check code
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .

# Check formatting without changes
ruff format --check .
```

#### Testing

Run tests:
```bash
# All tests
pytest

# With coverage
pytest --cov=robobench --cov-report=html

# Specific test file
pytest src/tests/test_utils.py

# Verbose output (useful when debugging hypothesis failures)
pytest -v src/tests/test_utils.py
```

### Pre-commit Checks

Before submitting a PR:
```bash
# Format code
ruff format .

# Run linter
ruff check .

# Run tests with coverage
pytest --cov=robobench --cov-report=term-missing

# Show coverage report in browser (optional)
python -m http.server -d htmlcov
```

### Code Style

The project uses:
- Type hints
- Docstrings for public functions and classes

### Project Structure
```
src/
├── robobench/
│   ├── core/
│   │   ├── initialization.py  # Model initialization metrics
│   │   ├── runtime.py         # Runtime performance metrics
│   │   ├── resources.py       # Resource utilization metrics
│   │   ├── integration.py     # System integration metrics
│   │   ├── workloads.py       # Workload-specific metrics
│   │   ├── stability.py       # Stability metrics
│   │   └── utils.py           # Shared utilities
│   ├── cli.py                 # Command-line interface
│   └── __init__.py
└── tests/
    ├── conftest.py            # Shared test fixtures
    ├── test_initialization.py
    ├── test_runtime.py
    ├── test_resources.py
    ├── test_integration.py
    └── test_utils.py
```

### PR Checklist

Before submitting a PR:
1. Run all tests
2. Check test coverage
3. Verify type hints with mypy (coming soon)
4. Ensure docstrings are up to date


## Contributing

Issues and pull requests welcome. Do have a look at the existing ones first, though.

            
