# Tokker CLI
A fast, simple CLI tool for tokenizing text using OpenAI's `tiktoken` library. Get accurate token counts for GPT models with a single command.
---
## Features
- **Simple Usage**: Just `tok "your text"` - that's it!
- **Multiple Tokenizers**: Support for `cl100k_base` (GPT-4) and `o200k_base` (GPT-4o) tokenizers
- **Flexible Output**: JSON, plain text, and summary output formats
- **Configuration**: Persistent configuration for default tokenizer settings
- **Text Analysis**: Token count, word count, character count, and token frequency analysis
- **Cross-platform**: Works on Windows, macOS, and Linux
---
## Installation
Install from PyPI with pip:
```bash
pip install tokker
```
That's it! The `tok` command is now available in your terminal.
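To check the install, print the built-in help (the full option list is reproduced under Command Options below):
```bash
tok --help
```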
---
## Quick Start
```bash
# Basic usage
tok "Hello world"
# Get plain token output
tok "Hello world" --format plain
# Get summary stats
tok "Hello world" --format summary
# Use a different tokenizer
tok "Hello world" --tokenizer o200k_base
```
---
## Usage Examples
### Basic Tokenization
```bash
$ tok 'Hello world'
{
"converted": "Hello⎮ world",
"token_strings": ["Hello", " world"],
"token_ids": [15339, 1917],
"token_count": 2,
"word_count": 2,
"char_count": 11,
"pivot": {
"Hello": 1,
" world": 1
},
"tokenizer": "cl100k_base"
}
```
### Plain Text Output
```bash
$ tok 'Hello world' --format plain
Hello⎮ world
```
### Summary Statistics
```bash
$ tok 'Hello world' --format summary
{
"token_count": 2,
"word_count": 2,
"char_count": 11,
"tokenizer": "cl100k_base"
}
```
### Using Different Tokenizers
```bash
$ tok 'Hello world' --tokenizer o200k_base
```
### Configuration
Set your default tokenizer to avoid specifying it every time:
```bash
$ tok --set-default-tokenizer o200k_base
✓ Default tokenizer set to: o200k_base
Configuration saved to: ~/.config/tokker/tokenizer_config.json
```
---
## Command Options
```
usage: tok [-h] [--tokenizer {cl100k_base,o200k_base}]
           [--format {json,plain,summary}]
           [--set-default-tokenizer {cl100k_base,o200k_base}]
           [text]

positional arguments:
  text                      Text to tokenize

options:
  --tokenizer               Tokenizer to use (cl100k_base, o200k_base)
  --format                  Output format (json, plain, summary)
  --set-default-tokenizer   Set default tokenizer
  -h, --help                Show help message
```
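The flags can be combined; for example, to get just the summary stats from the o200k_base tokenizer:
```bash
tok 'Hello world' --tokenizer o200k_base --format summary
```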
---
## Tokenizers
### cl100k_base (Default)
- **Used by**: GPT-4, GPT-3.5-turbo
- **Description**: OpenAI's standard tokenizer for GPT-4 models
- **Vocabulary size**: ~100,000 tokens
### o200k_base
- **Used by**: GPT-4o, GPT-4o-mini
- **Description**: Newer tokenizer with improved efficiency
- **Vocabulary size**: ~200,000 tokens
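Tokker's counts come from `tiktoken`, so the difference between the two vocabularies is easy to reproduce. A minimal comparison sketch using `tiktoken` directly (not tokker's API):
```python
import tiktoken

text = "Tokenizers split the same text differently depending on their vocabulary."

# Encode the same text with both supported encodings and compare token counts.
for name in ("cl100k_base", "o200k_base"):
    encoding = tiktoken.get_encoding(name)
    print(f"{name}: {len(encoding.encode(text))} tokens")
```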
---
## Configuration
Tokker stores your preferences in `~/.config/tokker/tokenizer_config.json`:
```json
{
"default_tokenizer": "cl100k_base",
"delimiter": "⎮"
}
```
- `default_tokenizer`: Default tokenizer to use
- `delimiter`: Character used to separate tokens in plain text output
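You can edit this file by hand or script the change with the standard library. A minimal sketch, assuming the file already exists at the path above (`tok --set-default-tokenizer` remains the supported way to do this):
```python
import json
from pathlib import Path

# Documented location of tokker's config file.
config_path = Path.home() / ".config" / "tokker" / "tokenizer_config.json"

# Read the existing config, switch the default tokenizer, and write it back.
config = json.loads(config_path.read_text(encoding="utf-8"))
config["default_tokenizer"] = "o200k_base"
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
```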
---
## Programmatic Usage
You can also use tokker in your Python code:
```python
import tokker
# Count tokens
count = tokker.count_tokens("Hello world")
print(f"Token count: {count}")
# Full tokenization
result = tokker.tokenize_text("Hello world", "cl100k_base")
print(result["token_count"])
```
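For example, assuming the functions shown above, counting the tokens in a file takes only a few lines:
```python
from pathlib import Path

import tokker

# Read a file and count its tokens with the default tokenizer.
text = Path("README.md").read_text(encoding="utf-8")
print(f"README.md: {tokker.count_tokens(text)} tokens")
```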
---
## Tips
- Use single quotes to avoid shell interpretation: `tok 'Hello world!'`
- Pipe text from other commands: `echo "Hello world" | xargs -I {} tok {}` (`-I` passes each whole line as a single argument); for batch jobs over many files, see the sketch after this list
- Set your preferred tokenizer once: `tok --set-default-tokenizer o200k_base`
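As flagged in the tips above, a small shell loop handles batch jobs. A sketch that prints summary stats for every Markdown file in the current directory (assuming each file is small enough to pass as a single command-line argument):
```bash
# Tokenize each Markdown file and print its summary stats.
for f in *.md; do
  echo "== $f =="
  tok "$(cat "$f")" --format summary
done
```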
---
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## Contributing
Issues and pull requests are welcome! Visit the [GitHub repository](https://github.com/igoakulov/tokker).
---
## Acknowledgments
- OpenAI for the [`tiktoken`](https://github.com/openai/tiktoken) library