llmvision

Name: llmvision
Version: 0.1.1
Summary: Visualize how LLMs tokenize text
Author email: Alok Singh <alokbeniwal@gmail.com>
Project homepage: https://github.com/alokbeniwal/llmvision
Upload time: 2025-07-26 07:34:54
Requires Python: >=3.10
License: MIT
Keywords: gpt, llm, nlp, tokenization, tokenizer, visualization
Requirements: No requirements were recorded.
# LLMVision

Visualize how LLMs tokenize text.

```python
from llmvision import tokenize_and_visualize, GPT4Tokenizer

text = "Hello world! πŸ‘‹πŸŒ"
print(tokenize_and_visualize(text, GPT4Tokenizer()))
# Output: Helloβ”‚ worldβ”‚!β”‚<bytes:20f09f>β”‚<bytes:91>β”‚<bytes:8b>β”‚<bytes:f09f>β”‚<bytes:8c>β”‚<bytes:8d>
```
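The `<bytes:...>` pieces appear because the GPT-4 BPE vocabulary has no single token covering this emoji sequence, so the tokenizer falls back to fragments of the raw UTF-8 bytes. A stdlib-only check (no llmvision needed) shows where those hex values come from:

```python
# " πŸ‘‹πŸŒ" encodes to 9 UTF-8 bytes; the <bytes:...> fragments above cover exactly these.
print(" πŸ‘‹πŸŒ".encode("utf-8").hex(" "))
# 20 f0 9f 91 8b f0 9f 8c 8d
# 0x20 is the leading space, f0 9f 91 8b is πŸ‘‹, f0 9f 8c 8d is 🌍
```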

## Features

- Multiple tokenizers: GPT-2, GPT-4, byte-level, character-level, and more
- Visual token boundaries
- Unicode and emoji handling
- The actual tokenization used by OpenAI models (via tiktoken)

## Installation

```bash
pip install llmvision
```
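The GPT-2 and GPT-4 tokenizers are built on tiktoken (see the Tokenizers section below). Since no requirements are recorded for this release, tiktoken may need to be installed separately with `pip install tiktoken`.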

## Usage

```bash
llmvision "Hello world!"
llmvision "Hello world!" --tokenizer gpt4
llmvision "Hello world!" --indices
```

```python
from llmvision import tokenize_and_visualize, GPT4Tokenizer

# Default tokenizer
print(tokenize_and_visualize("Hello world!"))

# Specific tokenizer
print(tokenize_and_visualize("Hello world!", GPT4Tokenizer()))
```

## Examples

```python
from llmvision import GPT4Tokenizer

tokenizer = GPT4Tokenizer()
tokens = tokenizer.tokenize("Hello world!")
print(tokens)  # ['Hello', ' world', '!']
```

## Tokenizers

- `SimpleTokenizer` - splits on words, punctuation, and spaces
- `WordTokenizer` - whitespace-based
- `CharTokenizer` - one token per character
- `GraphemeTokenizer` - Unicode grapheme clusters
- `ByteLevelTokenizer` - one token per UTF-8 byte
- `GPT2Tokenizer` - GPT-2 encoding (via tiktoken)
- `GPT4Tokenizer` - GPT-4 encoding (via tiktoken)
- `SubwordTokenizer` - basic subword splitting
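
A quick way to compare these side by side; a minimal sketch, assuming each class is importable from the top-level `llmvision` package and exposes the same `tokenize()` method that `GPT4Tokenizer` does in the example above:

```python
from llmvision import ByteLevelTokenizer, CharTokenizer, GPT4Tokenizer, WordTokenizer

text = "Hello δΈ–η•Œ!"
for tok in (WordTokenizer(), CharTokenizer(), ByteLevelTokenizer(), GPT4Tokenizer()):
    # assumes every tokenizer shares the tokenize() interface shown in Examples
    tokens = tok.tokenize(text)
    print(f"{type(tok).__name__:<20} {len(tokens):>3} tokens")
```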

## Token Costs

```python
tokenizer = GPT4Tokenizer()
examples = [
    "Hello world!",    # 3 tokens
    "Hello δΈ–η•Œ!",     # 5 tokens  
    "Hello πŸ‘‹πŸŒ!",     # 8 tokens
    "πŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦",            # 18 tokens
]
for text in examples:
    print(f"{text:15} β†’ {len(tokenizer.tokenize(text))} tokens")
```
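The growing counts track the UTF-8 byte length of the non-ASCII parts: anything the BPE vocabulary cannot cover with whole tokens falls back to byte fragments. A stdlib-only look at the raw sizes (independent of llmvision):

```python
for text in ["Hello world!", "Hello δΈ–η•Œ!", "Hello πŸ‘‹πŸŒ!", "πŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦"]:
    print(f"{text!r:20} {len(text):2} code points, {len(text.encode('utf-8')):2} UTF-8 bytes")
# The family emoji alone is 7 code points (four person emoji joined by three
# zero-width joiners) and 25 UTF-8 bytes, which is why it splits into so many tokens.
```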

## License

MIT
            
