# TokenLens
A lightweight Python library for accurate token counting and limit validation across LLM providers. TokenLens helps developers avoid costly API failures by validating content length before a request is ever sent.
## Features
- **Modular Design**: Provider dependencies are only imported when needed
- **Accurate Token Counting**: Provider-specific tokenizer implementations
- **Proactive Limit Checking**: Validate content length before API calls
- **Smart Token Management**: Encoding and decoding support
- **Provider Independence**: Each provider can be installed separately
## Installation
Install the base package:
```bash
pip install tokenlens
```
Install with specific provider dependencies:
```bash
# For OpenAI support
pip install "tokenlens[openai]"

# For Anthropic support
pip install "tokenlens[anthropic]"

# For all providers
pip install "tokenlens[all]"
```
## Quick Start
### Basic Token Counting with OpenAI
```python
from tokenlens import OpenAITokenizer
from dotenv import load_dotenv
# Load API key from .env file
load_dotenv()
# Initialize tokenizer
tokenizer = OpenAITokenizer(model_name="gpt-4")
# Count tokens
text = "Hello, world!"
token_count = tokenizer.count_tokens(text)
print(f"Token count: {token_count}")
# Encode text to tokens
tokens = tokenizer.encode(text)
print(f"Tokens: {tokens}")
# Decode tokens back to text
decoded_text = tokenizer.decode(tokens)
print(f"Decoded text: {decoded_text}")
```
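As a quick sanity check, encoding and decoding should round-trip, and `count_tokens` should agree with the length of the encoded sequence. These are assumed invariants of the tokenizer API rather than documented guarantees, so verify them against your installed version:

```python
# Assumed invariants (not documented guarantees) of the tokenizer API
assert tokenizer.decode(tokenizer.encode(text)) == text
assert tokenizer.count_tokens(text) == len(tokenizer.encode(text))
```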
### Using Other Providers
TokenLens uses a modular design where provider dependencies are only imported when needed:
```python
# For Anthropic
from tokenlens.tokenizers import AnthropicTokenizer
tokenizer = AnthropicTokenizer()
# For Mistral
from tokenlens.tokenizers import MistralTokenizer
tokenizer = MistralTokenizer()
```
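Under the hood, this kind of modularity is typically implemented by importing the provider SDK on first use rather than at package import time. A minimal sketch of that pattern, assuming nothing about TokenLens's actual internals:

```python
import importlib

def load_provider_sdk(module_name: str, extra: str):
    """Hypothetical helper: import a provider SDK on demand,
    raising a helpful install hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Missing optional dependency '{module_name}'. "
            f"Install it with: pip install 'tokenlens[{extra}]'"
        ) from exc

# The import cost (and the hard dependency) is only paid when a
# provider-specific tokenizer is actually constructed.
anthropic_sdk = load_provider_sdk("anthropic", "anthropic")
```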
## Provider-Specific Dependencies
TokenLens uses a modular dependency system. Each provider's dependencies are optional and must be installed separately:
```bash
# For Mistral support
pip install "tokenlens[mistral]"

# For OpenAI support
pip install "tokenlens[openai]"

# For Anthropic support
pip install "tokenlens[anthropic]"

# For all providers
pip install "tokenlens[all]"
```
If you try to use a provider without installing its dependencies, TokenLens will:
1. Attempt to use a fallback implementation if available
2. Raise an informative error message if the provider's functionality is required
Example with Mistral:
```python
from tokenlens import MistralTokenizer
# This requires the mistralai package
tokenizer = MistralTokenizer(api_key="your-api-key")
# Count tokens (will use fallback if mistralai not installed)
text = "Hello, world!"
token_count = tokenizer.count_tokens(text)
# Get model limits
model_limits = tokenizer.get_model_limits("mistral-medium")
print(f"Model limits: {model_limits}")
```
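If you prefer to degrade gracefully in your own code rather than rely on the built-in fallback, you can also catch the error yourself. A hedged sketch (the ~4 characters-per-token figure is a common rule of thumb for English text, not an exact count):

```python
text = "Hello, world!"

try:
    from tokenlens import MistralTokenizer
    tokenizer = MistralTokenizer()
    token_count = tokenizer.count_tokens(text)
except ImportError:
    # tokenlens[mistral] is not installed; fall back to a
    # crude character-based approximation
    token_count = len(text) // 4

print(f"Estimated token count: {token_count}")
```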
## The Problem TokenLens Solves
Traditional AI application development often faces these challenges:
1. **Reactive Error Handling**: Without TokenLens, you only discover a limit violation after the API has already rejected the request:
   ```python
   import openai
   from openai import OpenAI

   client = OpenAI()
   try:
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[{"role": "user", "content": very_long_text}]
       )
   except openai.BadRequestError:
       # Now you have to handle the API failure after it occurs
       pass
   ```
2. **Manual Content Splitting**: Without proper token counting, splitting falls back to fragile heuristics:
   ```python
   # Naive character-based splitting that might break context
   # and gives no guarantee about the resulting token counts
   chunk_size = 1000
   chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
   ```
## The TokenLens Solution
TokenLens provides accurate token counting and limit validation:
```python
import openai
from openai import OpenAI

from tokenlens import OpenAITokenizer

tokenizer = OpenAITokenizer(model_name="gpt-4")
client = OpenAI()

# Check the token count before making the API call
token_count = tokenizer.count_tokens(very_long_text)

if token_count <= tokenizer.model_max_tokens:
    # Safe to make the API call
    response = client.chat.completions.create(...)
else:
    # Handle content that exceeds the token limit
    print(f"Content too long: {token_count} tokens")
```
## Token Limit Validation
TokenLens makes it easy to check token counts against model limits:
```python
from tokenlens import OpenAITokenizer
# Initialize tokenizer
tokenizer = OpenAITokenizer(model_name="gpt-4")
# Get model limits
model_limits = tokenizer.get_model_limits()
print(f"Model limits: {model_limits}")
# Output: {'token_limit': 8192, 'max_response_tokens': 4096}
# Check if text is within limits
text = "Your long text here..."
token_count = tokenizer.count_tokens(text)
if token_count <= model_limits['token_limit']:
    print(f"Text is within limits ({token_count} tokens)")
else:
    print(f"Text exceeds model limit: {token_count} > {model_limits['token_limit']}")

# For chat completions, account for response tokens
available_tokens = model_limits['token_limit'] - model_limits['max_response_tokens']
if token_count <= available_tokens:
    print("Safe to send to chat completion API")
else:
    print("Need to reduce input tokens to leave room for response")
```
This proactive validation helps you:
- Avoid costly API failures due to token limits
- Ensure enough tokens are reserved for model responses
- Make informed decisions about content splitting (a token-aware splitter is sketched below)
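When content does exceed the limit, splitting on token boundaries keeps each chunk's count exact, unlike the character-based approach shown earlier. A minimal sketch built on the `encode`/`decode` API from the Quick Start (the chunk size is illustrative, and `split_by_tokens` is a hypothetical helper, not part of TokenLens):

```python
from tokenlens import OpenAITokenizer

def split_by_tokens(tokenizer, text, max_tokens=1000):
    """Split text into chunks of at most max_tokens tokens each."""
    tokens = tokenizer.encode(text)
    return [
        tokenizer.decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

tokenizer = OpenAITokenizer(model_name="gpt-4")
chunks = split_by_tokens(tokenizer, "Your long text here...", max_tokens=1000)
print(f"Produced {len(chunks)} chunks")
```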
## Why Use TokenLens?
1. **Accurate Token Counting**: Get exact token counts before making API calls
2. **Provider-Specific Tokenization**: Each provider uses its own tokenization rules
3. **Modular Dependencies**: Only install the provider packages you need
4. **Simple Interface**: Consistent API across all providers
5. **Proactive Validation**: Check limits before making expensive API calls
## Development
1. Clone the repository:
```bash
git clone https://github.com/vgiri2015/tokenlens.git
cd tokenlens
```
2. Install development dependencies:
```bash
pip install -e ".[dev]"
```
3. Run tests:
```bash
# Run all tests
pytest
# Run specific provider tests
pytest tests/test_openai_limits.py
```
## Contributing
Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.