tokker 0.1.1 (PyPI)

- Summary: A fast, simple CLI tool for tokenizing text using OpenAI's tiktoken library
- Uploaded: 2025-07-28 20:19:10
- Requires Python: >=3.8
- Keywords: tokenization, tokens, tiktoken, openai, cli, text-analysis
- Requirements: tiktoken

# Tokker CLI

A fast, simple CLI tool for tokenizing text using OpenAI's `tiktoken` library. Get accurate token counts for GPT models with a single command.

---

## Features

- **Simple Usage**: Just `tok "your text"` - that's it!
- **Multiple Tokenizers**: Support for `cl100k_base` (GPT-4) and `o200k_base` (GPT-4o) tokenizers
- **Flexible Output**: JSON, plain text, and summary output formats
- **Configuration**: Persistent configuration for default tokenizer settings
- **Text Analysis**: Token count, word count, character count, and token frequency analysis
- **Cross-platform**: Works on Windows, macOS, and Linux

---

## Installation

Install from PyPI with pip:

```bash
pip install tokker
```

That's it! The `tok` command is now available in your terminal.

---

## Quick Start

```bash
# Basic usage
tok "Hello world"

# Get plain token output
tok "Hello world" --format plain

# Get summary stats
tok "Hello world" --format summary

# Use a different tokenizer
tok "Hello world" --tokenizer o200k_base
```

---

## Usage Examples

### Basic Tokenization

```bash
$ tok 'Hello world'
{
  "converted": "Hello⎮ world",
  "token_strings": ["Hello", " world"],
  "token_ids": [15339, 1917],
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "pivot": {
    "Hello": 1,
    " world": 1
  },
  "tokenizer": "cl100k_base"
}
```
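
The `pivot` field is a per-token frequency map. As an illustration (not tokker's actual internals), an equivalent map can be built from the token strings with `collections.Counter`; the sample token list here is made up to show a repeated token:

```python
from collections import Counter

# Illustrative token strings, shaped like tokker's "token_strings" output
token_strings = ["Hello", " world", " world"]

# Frequency map equivalent in shape to the "pivot" field
pivot = dict(Counter(token_strings))
print(pivot)  # {'Hello': 1, ' world': 2}
```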

### Plain Text Output

```bash
$ tok 'Hello world' --format plain
Hello⎮ world
```
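
Plain output is simply the token strings joined with the delimiter character (`⎮` by default, configurable as shown below). A one-line sketch:

```python
# Join token strings with the default delimiter U+23AE ("⎮")
token_strings = ["Hello", " world"]
delimiter = "\u23ae"
plain = delimiter.join(token_strings)
print(plain)  # Hello⎮ world
```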

### Summary Statistics

```bash
$ tok 'Hello world' --format summary
{
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "tokenizer": "cl100k_base"
}
```
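
For reference, the summary numbers above match straightforward whitespace-split and character-length counts (an illustrative assumption about how tokker defines them):

```python
text = "Hello world"
word_count = len(text.split())  # split on whitespace -> 2
char_count = len(text)          # code points, including the space -> 11
print(word_count, char_count)
```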

### Using Different Tokenizers

```bash
$ tok 'Hello world' --tokenizer o200k_base
```

### Configuration

Set your default tokenizer to avoid specifying it every time:

```bash
$ tok --set-default-tokenizer o200k_base
✓ Default tokenizer set to: o200k_base
Configuration saved to: ~/.config/tokker/tokenizer_config.json
```
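
If you want to read the saved settings from your own scripts, the file is ordinary JSON at the path shown above. A minimal sketch (the `load_config` helper is illustrative, not part of tokker's API):

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "tokker" / "tokenizer_config.json"

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return saved settings, falling back to defaults when no file exists."""
    defaults = {"default_tokenizer": "cl100k_base", "delimiter": "\u23ae"}
    if path.exists():
        defaults.update(json.loads(path.read_text(encoding="utf-8")))
    return defaults
```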

---

## Command Options

```
usage: tok [-h] [--tokenizer {cl100k_base,o200k_base}]
           [--format {json,plain,summary}]
           [--set-default-tokenizer {cl100k_base,o200k_base}]
           [text]

positional arguments:
  text                  Text to tokenize

options:
  --tokenizer           Tokenizer to use (cl100k_base, o200k_base)
  --format              Output format (json, plain, summary)
  --set-default-tokenizer  Set default tokenizer
  -h, --help           Show help message
```

---

## Tokenizers

### cl100k_base (Default)
- **Used by**: GPT-4, GPT-3.5-turbo
- **Description**: OpenAI's standard tokenizer for GPT-4 models
- **Vocabulary size**: ~100,000 tokens

### o200k_base
- **Used by**: GPT-4o, GPT-4o-mini
- **Description**: Newer tokenizer with improved efficiency
- **Vocabulary size**: ~200,000 tokens

---

## Configuration

Tokker stores your preferences in `~/.config/tokker/tokenizer_config.json`:

```json
{
  "default_tokenizer": "cl100k_base",
  "delimiter": "⎮"
}
```

- `default_tokenizer`: Default tokenizer to use
- `delimiter`: Character used to separate tokens in plain text output

---

## Programmatic Usage

You can also use tokker in your Python code:

```python
import tokker

# Count tokens
count = tokker.count_tokens("Hello world")
print(f"Token count: {count}")

# Full tokenization
result = tokker.tokenize_text("Hello world", "cl100k_base")
print(result["token_count"])
```
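
If you only need raw token IDs, you can also call the underlying `tiktoken` library directly; this sketch guards the import in case tiktoken is not installed in your environment:

```python
try:
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("Hello world")
    # encode/decode round-trips the original text
    assert enc.decode(ids) == "Hello world"
except ImportError:
    ids = None  # tiktoken not available in this environment
```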

---

## Tips

- Use single quotes to avoid shell interpretation: `tok 'Hello world!'`
- Pipe text from other commands: `echo "Hello world" | xargs -I{} tok {}` (plain `xargs` would split the line into separate arguments)
- Set your preferred tokenizer once: `tok --set-default-tokenizer o200k_base`

---

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## Contributing

Issues and pull requests are welcome! Visit the [GitHub repository](https://github.com/igoakulov/tokker).

---

## Acknowledgments

- OpenAI for the tiktoken library

            
