| Name | tacho |
| --- | --- |
| Version | 0.8.6 |
| Summary | CLI tool for measuring and comparing LLM inference speeds |
| upload_time | 2025-10-07 08:28:58 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | benchmark, cli, inference, llm, performance |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# ⚡ tacho - LLM Speed Test
A fast CLI tool for benchmarking LLM inference speed across multiple models and providers. Get tokens/second metrics to compare model performance.
## Quick Start
Set up your API keys:
```bash
export OPENAI_API_KEY=<your-key-here>
export GEMINI_API_KEY=<your-key-here>
```
Run a benchmark (requires `uv`):
```bash
uvx tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514
✓ gpt-4.1
✓ vertex_ai/claude-sonnet-4@20250514
✓ gemini/gemini-2.5-pro
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ Model ┃ Avg t/s ┃ Min t/s ┃ Max t/s ┃ Time ┃ Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ gemini/gemini-2.5-pro │ 80.0 │ 56.7 │ 128.4 │ 13.5s │ 998 │
│ vertex_ai/claude-sonnet-4@20250514 │ 48.9 │ 44.9 │ 51.6 │ 10.2s │ 500 │
│ gpt-4.1 │ 41.5 │ 35.1 │ 49.9 │ 12.3s │ 500 │
└────────────────────────────────────┴─────────┴─────────┴─────────┴───────┴────────┘
```
> With its default settings, tacho performs 5 runs of up to 500 tokens each per model (up to 2,500 completion tokens per model), which incurs some inference cost.
## Features
- **Parallel benchmarking** - All models and runs execute concurrently for faster results (see the sketch after this list)
- **Token-based metrics** - Measures actual tokens/second, not just response time
- **Multi-provider support** - Works with any provider supported by LiteLLM (OpenAI, Anthropic, Google, Cohere, etc.)
- **Configurable token limits** - Control response length for consistent comparisons
- **Pre-flight validation** - Checks model availability and authentication before benchmarking
- **Graceful error handling** - Clear error messages for authentication, rate limits, and connection issues
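
To make the first two features concrete, here is a minimal sketch of how concurrent, token-based benchmarking can be done with LiteLLM's async `acompletion` API. This is an illustration only, not tacho's actual source: the helper `timed_run` and the prompt are hypothetical; only the overall shape (one asyncio task per model/run pair, tokens/second derived from the reported usage) reflects the behavior described above.

```python
# Illustrative sketch -- not tacho's actual implementation.
import asyncio
import time

import litellm  # works with any LiteLLM-supported provider

PROMPT = "Write a short story about a robot."  # hypothetical default prompt

async def timed_run(model: str, max_tokens: int) -> float:
    """One inference run; returns tokens/second for that run."""
    start = time.perf_counter()
    resp = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    # Token-based metric: actual completion tokens, not just response time.
    return resp.usage.completion_tokens / elapsed

async def main(models: list[str], runs: int = 5, tokens: int = 500) -> None:
    # Parallel benchmarking: every (model, run) pair is one concurrent task.
    tasks = [timed_run(m, tokens) for m in models for _ in range(runs)]
    rates = await asyncio.gather(*tasks)
    for i, model in enumerate(models):
        per_model = rates[i * runs : (i + 1) * runs]
        print(f"{model}: avg {sum(per_model) / runs:.1f} t/s")

asyncio.run(main(["gpt-4.1-nano", "gemini/gemini-2.0-flash"]))
```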
## Installation
For regular use, install with `uv`:
```bash
uv tool install tacho
```
Or with pip:
```bash
pip install tacho
```
## Usage
### Basic benchmark
```bash
# Compare models with default settings (5 runs, 500 token limit)
tacho gpt-4.1-nano gemini/gemini-2.0-flash
# Custom settings (options must come before model names)
tacho --runs 3 --tokens 1000 gpt-4.1-nano gemini/gemini-2.0-flash
tacho -r 3 -t 1000 gpt-4.1-nano gemini/gemini-2.0-flash
```
### Command options
- `--runs, -r`: Number of inference runs per model (default: 5)
- `--tokens, -t`: Maximum tokens to generate per response (default: 500)
- `--prompt, -p`: Custom prompt for benchmarking
**Note:** When using the shorthand syntax (without the `bench` subcommand), options must be placed before model names. For example:
- ✅ `tacho -t 2000 gpt-4.1-mini`
- ❌ `tacho gpt-4.1-mini -t 2000`
## Output
Tacho displays a clean comparison table showing:
- **Avg/Min/Max tokens per second** - Primary performance metrics
- **Time** - Average time per inference run
Models are sorted by performance (highest tokens/second first).
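
As an illustration of how the aggregates and ordering could be derived from per-run measurements, here is a short sketch; the rate samples below are hypothetical stand-ins, not real benchmark results.

```python
# Hypothetical per-run tokens/second samples (not real measurements).
rates = {
    "gpt-4.1": [35.1, 49.9, 40.2, 43.8, 38.5],
    "gemini/gemini-2.5-pro": [56.7, 128.4, 75.2, 61.3, 78.4],
}

# One row per model: (name, avg, min, max), sorted fastest-first
# to match the output table's ordering.
rows = sorted(
    ((m, sum(r) / len(r), min(r), max(r)) for m, r in rates.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, avg, lo, hi in rows:
    print(f"{name:<25} avg {avg:6.1f}  min {lo:6.1f}  max {hi:6.1f}")
```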
## Supported Providers
Tacho works with any provider supported by LiteLLM.
## Development
To contribute to Tacho, clone the repository and install development dependencies:
```bash
git clone https://github.com/pietz/tacho.git
cd tacho
uv sync
```
### Running Tests
Tacho includes a comprehensive test suite with full mocking of external API calls:
```bash
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run specific test file
uv run pytest tests/test_cli.py
# Run with coverage
uv run pytest --cov=tacho
```
The test suite includes:
- Unit tests for all core modules
- Mocked LiteLLM API calls (no API keys required; see the sketch below)
- CLI command testing
- Async function testing
- Edge case coverage
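
For example, a LiteLLM call can be mocked along these lines. This is a sketch, not one of tacho's actual tests: the response stub is hypothetical and merely mimics the `usage` field read during benchmarking.

```python
# Sketch of testing against a mocked litellm.acompletion -- no API keys needed.
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock, patch

def test_mocked_completion():
    # Hypothetical response stub exposing usage.completion_tokens.
    fake = SimpleNamespace(usage=SimpleNamespace(completion_tokens=500))
    with patch("litellm.acompletion", new=AsyncMock(return_value=fake)):
        import litellm

        resp = asyncio.run(
            litellm.acompletion(
                model="gpt-4.1",
                messages=[{"role": "user", "content": "hi"}],
            )
        )
        assert resp.usage.completion_tokens == 500
```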
## Raw data

```json
{
  "_id": null,
  "home_page": null,
  "name": "tacho",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.10",
  "maintainer_email": null,
  "keywords": "benchmark, cli, inference, llm, performance",
  "author": null,
  "author_email": "Paul-Louis Pr\u00f6ve <mail@plpp.de>",
  "download_url": "https://files.pythonhosted.org/packages/29/3e/9b331edf20e309a72ac39559ec966d2114727d660e5506c1367e692d1cca/tacho-0.8.6.tar.gz",
  "platform": null,
"description": "# \u26a1 tacho - LLM Speed Test\n\nA fast CLI tool for benchmarking LLM inference speed across multiple models and providers. Get tokens/second metrics to compare model performance.\n\n\n## Quick Start\n\nSet up your API keys:\n\n```bash\nexport OPENAI_API_KEY=<your-key-here>\nexport GEMINI_API_KEY=<your-key-here>\n```\n\nRun a benchmark (requires `uv`):\n\n```bash\nuvx tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514\n\u2713 gpt-4.1\n\u2713 vertex_ai/claude-sonnet-4@20250514\n\u2713 gemini/gemini-2.5-pro\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 Model \u2503 Avg t/s \u2503 Min t/s \u2503 Max t/s \u2503 Time \u2503 Tokens \u2503\n\u2521\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2529\n\u2502 gemini/gemini-2.5-pro \u2502 80.0 \u2502 56.7 \u2502 128.4 \u2502 13.5s \u2502 998 \u2502\n\u2502 vertex_ai/claude-sonnet-4@20250514 \u2502 48.9 \u2502 44.9 \u2502 51.6 \u2502 10.2s \u2502 500 \u2502\n\u2502 gpt-4.1 \u2502 41.5 \u2502 35.1 \u2502 49.9 \u2502 12.3s \u2502 500 \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n> With its default settings, tacho generates 5 runs of 500 tokens each per model producing some inference costs.\n\n\n## Features\n\n- **Parallel benchmarking** - All models and runs execute concurrently for faster results\n- **Token-based metrics** - Measures actual tokens/second, not just response time\n- **Multi-provider support** - Works with any provider supported by LiteLLM (OpenAI, Anthropic, Google, Cohere, etc.)\n- **Configurable token limits** - Control response length for consistent comparisons\n- **Pre-flight validation** - Checks model availability and authentication before benchmarking\n- **Graceful error handling** - Clear error messages for authentication, rate limits, and connection issues\n\n\n## Installation\n\nFor regular use, install with `uv`:\n\n```bash\nuv tool install tacho\n```\n\nOr with pip:\n\n```bash\npip install tacho\n```\n\n## Usage\n\n### Basic benchmark\n\n```bash\n# Compare models with default settings (5 runs, 500 token limit)\ntacho gpt-4.1-nano gemini/gemini-2.0-flash\n\n# Custom settings (options must come before model 
names)\ntacho --runs 3 --tokens 1000 gpt-4.1-nano gemini/gemini-2.0-flash\ntacho -r 3 -t 1000 gpt-4.1-nano gemini/gemini-2.0-flash\n```\n\n### Command options\n\n- `--runs, -r`: Number of inference runs per model (default: 5)\n- `--tokens, -t`: Maximum tokens to generate per response (default: 500)\n- `--prompt, -p`: Custom prompt for benchmarking\n\n**Note:** When using the shorthand syntax (without the `bench` subcommand), options must be placed before model names. For example:\n- \u2705 `tacho -t 2000 gpt-4.1-mini`\n- \u274c `tacho gpt-4.1-mini -t 2000`\n\n## Output\n\nTacho displays a clean comparison table showing:\n- **Avg/Min/Max tokens per second** - Primary performance metrics\n- **Average time** - Average time per inference run\n\nModels are sorted by performance (highest tokens/second first).\n\n## Supported Providers\n\nTacho works with any provider supported by LiteLLM.\n\n## Development\n\nTo contribute to Tacho, clone the repository and install development dependencies:\n\n```bash\ngit clone https://github.com/pietz/tacho.git\ncd tacho\nuv sync\n```\n\n### Running Tests\n\nTacho includes a comprehensive test suite with full mocking of external API calls:\n\n```bash\n# Run all tests\nuv run pytest\n\n# Run with verbose output\nuv run pytest -v\n\n# Run specific test file\nuv run pytest tests/test_cli.py\n\n# Run with coverage\nuv run pytest --cov=tacho\n```\n\nThe test suite includes:\n- Unit tests for all core modules\n- Mocked LiteLLM API calls (no API keys required)\n- CLI command testing\n- Async function testing\n- Edge case coverage\n",
"bugtrack_url": null,
"license": null,
"summary": "CLI tool for measuring and comparing LLM inference speeds",
"version": "0.8.6",
"project_urls": {
"Homepage": "https://github.com/pietz/tacho",
"Issues": "https://github.com/pietz/tacho/issues",
"Repository": "https://github.com/pietz/tacho"
},
"split_keywords": [
"benchmark",
" cli",
" inference",
" llm",
" performance"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "7016d775990f26d1bd7c1fd612fec46d7319efb33e1a637b70d467893d04855d",
"md5": "891465aa56baa5c31b6820c156fdefdc",
"sha256": "ccf9fb955b6dc8b832b0b7e562b8ad30a7c7bc4ab6540c78ccaab154323814fc"
},
"downloads": -1,
"filename": "tacho-0.8.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "891465aa56baa5c31b6820c156fdefdc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 7501,
"upload_time": "2025-10-07T08:28:56",
"upload_time_iso_8601": "2025-10-07T08:28:56.879779Z",
"url": "https://files.pythonhosted.org/packages/70/16/d775990f26d1bd7c1fd612fec46d7319efb33e1a637b70d467893d04855d/tacho-0.8.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "293e9b331edf20e309a72ac39559ec966d2114727d660e5506c1367e692d1cca",
"md5": "267d7c17b5728ef0a05a8c8211830005",
"sha256": "b2a6d1396096dfc5249f787108cd5c77ad6daeafdfedd300d3f3260bd5ac3cf9"
},
"downloads": -1,
"filename": "tacho-0.8.6.tar.gz",
"has_sig": false,
"md5_digest": "267d7c17b5728ef0a05a8c8211830005",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 161029,
"upload_time": "2025-10-07T08:28:58",
"upload_time_iso_8601": "2025-10-07T08:28:58.020350Z",
"url": "https://files.pythonhosted.org/packages/29/3e/9b331edf20e309a72ac39559ec966d2114727d660e5506c1367e692d1cca/tacho-0.8.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-07 08:28:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "pietz",
"github_project": "tacho",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "tacho"
}