tacho

Name: tacho
Version: 0.8.6
Summary: CLI tool for measuring and comparing LLM inference speeds
Author email: Paul-Louis Pröve <mail@plpp.de>
Upload time: 2025-10-07 08:28:58
Requires Python: >=3.10
Keywords: benchmark, cli, inference, llm, performance
# ⚡ tacho - LLM Speed Test

A fast CLI tool for benchmarking LLM inference speed across multiple models and providers. Get tokens/second metrics to compare model performance.


## Quick Start

Set up your API keys:

```bash
export OPENAI_API_KEY=<your-key-here>
export GEMINI_API_KEY=<your-key-here>
```

Run a benchmark (requires `uv`):

```bash
uvx tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514
✓ gpt-4.1
✓ vertex_ai/claude-sonnet-4@20250514
✓ gemini/gemini-2.5-pro
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ Model                              ┃ Avg t/s ┃ Min t/s ┃ Max t/s ┃  Time ┃ Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ gemini/gemini-2.5-pro              │    80.0 │    56.7 │   128.4 │ 13.5s │    998 │
│ vertex_ai/claude-sonnet-4@20250514 │    48.9 │    44.9 │    51.6 │ 10.2s │    500 │
│ gpt-4.1                            │    41.5 │    35.1 │    49.9 │ 12.3s │    500 │
└────────────────────────────────────┴─────────┴─────────┴─────────┴───────┴────────┘
```

> With its default settings, tacho performs 5 runs of up to 500 tokens each per model, which incurs some inference costs.


## Features

- **Parallel benchmarking** - All models and runs execute concurrently for faster results
- **Token-based metrics** - Measures actual tokens/second, not just response time
- **Multi-provider support** - Works with any provider supported by LiteLLM (OpenAI, Anthropic, Google, Cohere, etc.)
- **Configurable token limits** - Control response length for consistent comparisons
- **Pre-flight validation** - Checks model availability and authentication before benchmarking
- **Graceful error handling** - Clear error messages for authentication, rate limits, and connection issues

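The parallel, token-based approach above can be sketched in a few lines of asyncio. This is a minimal illustration, not tacho's actual implementation: `fake_completion` is a hypothetical stand-in for the LiteLLM call, and all names here are invented for the example.

```python
import asyncio
import time
from statistics import mean

# Hypothetical stand-in for a model call; the real tool drives providers via LiteLLM.
async def fake_completion(model: str, max_tokens: int) -> int:
    await asyncio.sleep(0.01)   # simulated network/inference latency
    return max_tokens           # pretend the model produced max_tokens tokens

async def timed_run(model: str, max_tokens: int) -> float:
    """Tokens per second for a single inference run."""
    start = time.perf_counter()
    tokens = await fake_completion(model, max_tokens)
    return tokens / (time.perf_counter() - start)

async def benchmark(models: list[str], runs: int = 5, max_tokens: int = 500):
    # Launch every (model, run) pair concurrently, mirroring the parallel design.
    labels = [m for m in models for _ in range(runs)]
    speeds = await asyncio.gather(*(timed_run(m, max_tokens) for m in labels))
    by_model: dict[str, list[float]] = {m: [] for m in models}
    for label, speed in zip(labels, speeds):
        by_model[label].append(speed)
    # Sort fastest-first by average tokens/second, like the output table.
    return sorted(
        ((m, mean(s), min(s), max(s)) for m, s in by_model.items()),
        key=lambda row: row[1],
        reverse=True,
    )
```

Because all runs are awaited through a single `asyncio.gather`, total wall-clock time is bounded by the slowest run rather than the sum of all runs.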

## Installation

For regular use, install with `uv`:

```bash
uv tool install tacho
```

Or with pip:

```bash
pip install tacho
```

## Usage

### Basic benchmark

```bash
# Compare models with default settings (5 runs, 500 token limit)
tacho gpt-4.1-nano gemini/gemini-2.0-flash

# Custom settings (options must come before model names)
tacho --runs 3 --tokens 1000 gpt-4.1-nano gemini/gemini-2.0-flash
tacho -r 3 -t 1000 gpt-4.1-nano gemini/gemini-2.0-flash
```

### Command options

- `--runs, -r`: Number of inference runs per model (default: 5)
- `--tokens, -t`: Maximum tokens to generate per response (default: 500)
- `--prompt, -p`: Custom prompt for benchmarking

**Note:** When using the shorthand syntax (without the `bench` subcommand), options must be placed before model names. For example:
- ✅ `tacho -t 2000 gpt-4.1-mini`
- ❌ `tacho gpt-4.1-mini -t 2000`

## Output

Tacho displays a clean comparison table showing:
- **Avg/Min/Max t/s** - Tokens per second, the primary performance metrics
- **Time** - Average wall-clock time per inference run
- **Tokens** - Average number of tokens generated per run

Models are sorted by performance (highest tokens/second first).

## Supported Providers

Tacho works with any provider supported by LiteLLM.

## Development

To contribute to Tacho, clone the repository and install development dependencies:

```bash
git clone https://github.com/pietz/tacho.git
cd tacho
uv sync
```

### Running Tests

Tacho includes a comprehensive test suite with full mocking of external API calls:

```bash
# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest --cov=tacho
```

The test suite includes:
- Unit tests for all core modules
- Mocked LiteLLM API calls (no API keys required)
- CLI command testing
- Async function testing
- Edge case coverage
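The "no API keys required" pattern can be sketched with an `AsyncMock` standing in for the network call. This example is illustrative and not taken from tacho's test suite: `measure`, `FakeResponse`, and the field names are hypothetical, though `completion_tokens` follows the OpenAI-compatible usage schema that LiteLLM responses expose.

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical response shape; invented here for the example.
class FakeUsage:
    completion_tokens = 500

class FakeResponse:
    usage = FakeUsage()

async def measure(completion_fn, model: str) -> int:
    """Toy function under test: returns the completion token count."""
    response = await completion_fn(model=model, messages=[], max_tokens=500)
    return response.usage.completion_tokens

def test_measure_uses_mock():
    # No API key needed: the network call is replaced by an AsyncMock.
    mock_completion = AsyncMock(return_value=FakeResponse())
    tokens = asyncio.run(measure(mock_completion, "gpt-4.1-nano"))
    assert tokens == 500
    mock_completion.assert_awaited_once()

test_measure_uses_mock()
```

Injecting the completion callable (or patching it with `unittest.mock.patch`) keeps the async benchmarking logic testable without touching any provider.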
