# TokenLens
A lightweight Python library for accurate token counting and limit validation across LLM providers. TokenLens helps developers avoid costly API failures by validating content length before a request is ever sent.
## Features
- **Modular Design**: Provider dependencies are only imported when needed
- **Accurate Token Counting**: Provider-specific tokenizer implementations
- **Proactive Limit Checking**: Validate content length before API calls
- **Smart Token Management**: Encoding and decoding support
- **Provider Independence**: Each provider can be installed separately
## Installation
Install the base package:
```bash
pip install tokenlens
```
Install with specific provider dependencies:
```bash
# For OpenAI support
pip install "tokenlens[openai]"

# For Anthropic support
pip install "tokenlens[anthropic]"

# For all providers
pip install "tokenlens[all]"
```
## Quick Start
### Basic Token Counting with OpenAI
```python
from tokenlens import OpenAITokenizer
from dotenv import load_dotenv
# Load API key from .env file
load_dotenv()
# Initialize tokenizer
tokenizer = OpenAITokenizer(model_name="gpt-4")
# Count tokens
text = "Hello, world!"
token_count = tokenizer.count_tokens(text)
print(f"Token count: {token_count}")
# Encode text to tokens
tokens = tokenizer.encode(text)
print(f"Tokens: {tokens}")
# Decode tokens back to text
decoded_text = tokenizer.decode(tokens)
print(f"Decoded text: {decoded_text}")
```
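As a quick sanity check, encoding and decoding should round-trip, and `count_tokens` should agree with the length of the encoded sequence. These are assumed invariants of the tokenizer API rather than documented guarantees, so verify them against your installed version:

```python
# Assumed invariants (not documented guarantees) of the tokenizer API
assert tokenizer.decode(tokenizer.encode(text)) == text
assert tokenizer.count_tokens(text) == len(tokenizer.encode(text))
```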
### Using Other Providers
TokenLens uses a modular design where provider dependencies are only imported when needed:
```python
# For Anthropic
from tokenlens.tokenizers import AnthropicTokenizer
tokenizer = AnthropicTokenizer()
# For Mistral
from tokenlens.tokenizers import MistralTokenizer
tokenizer = MistralTokenizer()
```
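Under the hood, this kind of modularity is typically implemented by importing the provider SDK on first use rather than at package import time. A minimal sketch of that pattern, assuming nothing about TokenLens's actual internals:

```python
import importlib

def load_provider_sdk(module_name: str, extra: str):
    """Hypothetical helper: import a provider SDK on demand,
    raising a helpful install hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Missing optional dependency '{module_name}'. "
            f"Install it with: pip install 'tokenlens[{extra}]'"
        ) from exc

# The import cost (and the hard dependency) is only paid when a
# provider-specific tokenizer is actually constructed.
anthropic_sdk = load_provider_sdk("anthropic", "anthropic")
```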
## Provider-Specific Dependencies
TokenLens uses a modular dependency system. Each provider's dependencies are optional and must be installed separately:
```bash
# For Mistral support
pip install "tokenlens[mistral]"

# For OpenAI support
pip install "tokenlens[openai]"

# For Anthropic support
pip install "tokenlens[anthropic]"

# For all providers
pip install "tokenlens[all]"
```
If you try to use a provider without installing its dependencies, TokenLens will:
1. Attempt to use a fallback implementation if available
2. Raise an informative error message if the provider's functionality is required
Example with Mistral:
```python
from tokenlens import MistralTokenizer
# This requires the mistralai package
tokenizer = MistralTokenizer(api_key="your-api-key")
# Count tokens (will use fallback if mistralai not installed)
text = "Hello, world!"
token_count = tokenizer.count_tokens(text)
# Get model limits
model_limits = tokenizer.get_model_limits("mistral-medium")
print(f"Model limits: {model_limits}")
```
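If you prefer to degrade gracefully in your own code rather than rely on the built-in fallback, you can also catch the error yourself. A hedged sketch (the ~4 characters-per-token figure is a common rule of thumb for English text, not an exact count):

```python
text = "Hello, world!"

try:
    from tokenlens import MistralTokenizer
    tokenizer = MistralTokenizer()
    token_count = tokenizer.count_tokens(text)
except ImportError:
    # tokenlens[mistral] is not installed; fall back to a
    # crude character-based approximation
    token_count = len(text) // 4

print(f"Estimated token count: {token_count}")
```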
## The Problem TokenLens Solves
Traditional AI application development often faces these challenges:
1. **Reactive Error Handling**: Without TokenLens, you only discover a limit violation after the API has already rejected the request:
   ```python
   import openai
   from openai import OpenAI

   client = OpenAI()
   try:
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[{"role": "user", "content": very_long_text}]
       )
   except openai.BadRequestError:
       # Now you have to handle the API failure after it occurs
       pass
   ```
2. **Manual Content Splitting**: Without proper token counting, splitting falls back to fragile heuristics:
   ```python
   # Naive character-based splitting that might break context
   # and gives no guarantee about the resulting token counts
   chunk_size = 1000
   chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
   ```
## The TokenLens Solution
TokenLens provides accurate token counting and limit validation:
```python
import openai
from openai import OpenAI

from tokenlens import OpenAITokenizer

tokenizer = OpenAITokenizer(model_name="gpt-4")
client = OpenAI()

# Check the token count before making the API call
token_count = tokenizer.count_tokens(very_long_text)

if token_count <= tokenizer.model_max_tokens:
    # Safe to make the API call
    response = client.chat.completions.create(...)
else:
    # Handle content that exceeds the token limit
    print(f"Content too long: {token_count} tokens")
```
## Token Limit Validation
TokenLens makes it easy to check token counts against model limits:
```python
from tokenlens import OpenAITokenizer
# Initialize tokenizer
tokenizer = OpenAITokenizer(model_name="gpt-4")
# Get model limits
model_limits = tokenizer.get_model_limits()
print(f"Model limits: {model_limits}")
# Output: {'token_limit': 8192, 'max_response_tokens': 4096}
# Check if text is within limits
text = "Your long text here..."
token_count = tokenizer.count_tokens(text)
if token_count <= model_limits['token_limit']:
    print(f"Text is within limits ({token_count} tokens)")
else:
    print(f"Text exceeds model limit: {token_count} > {model_limits['token_limit']}")

# For chat completions, account for response tokens
available_tokens = model_limits['token_limit'] - model_limits['max_response_tokens']
if token_count <= available_tokens:
    print("Safe to send to chat completion API")
else:
    print("Need to reduce input tokens to leave room for response")
```
This proactive validation helps you:
- Avoid costly API failures due to token limits
- Ensure enough tokens are reserved for model responses
- Make informed decisions about content splitting (a token-aware splitter is sketched below)
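When content does exceed the limit, splitting on token boundaries keeps each chunk's count exact, unlike the character-based approach shown earlier. A minimal sketch built on the `encode`/`decode` API from the Quick Start (the chunk size is illustrative, and `split_by_tokens` is a hypothetical helper, not part of TokenLens):

```python
from tokenlens import OpenAITokenizer

def split_by_tokens(tokenizer, text, max_tokens=1000):
    """Split text into chunks of at most max_tokens tokens each."""
    tokens = tokenizer.encode(text)
    return [
        tokenizer.decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

tokenizer = OpenAITokenizer(model_name="gpt-4")
chunks = split_by_tokens(tokenizer, "Your long text here...", max_tokens=1000)
print(f"Produced {len(chunks)} chunks")
```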
## Why Use TokenLens?
1. **Accurate Token Counting**: Get exact token counts before making API calls
2. **Provider-Specific Tokenization**: Each provider uses its own tokenization rules
3. **Modular Dependencies**: Only install the provider packages you need
4. **Simple Interface**: Consistent API across all providers
5. **Proactive Validation**: Check limits before making expensive API calls
## Development
1. Clone the repository:
```bash
git clone https://github.com/vgiri2015/tokenlens.git
cd tokenlens
```
2. Install development dependencies:
```bash
pip install -e ".[dev]"
```
3. Run tests:
```bash
# Run all tests
pytest
# Run specific provider tests
pytest tests/test_openai_limits.py
```
## Contributing
Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.