Name | tokenx-core |
Version | 0.2.7 |
home_page | None |
Summary | Cross-provider LLM token tracking and cost calculation |
upload_time | 2025-08-02 09:09:56 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | MIT License
Copyright (c) 2025 YOUR_NAME
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the “Software”), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
|
keywords |
openai
pricing
observability
llm
anthropic
gemini
|
VCS | https://github.com/dvlshah/tokenx |
bugtrack_url | None |
requirements |
pyyaml
tiktoken
pytest
pytest-cov
pytest-mock
pytest-asyncio
|
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<p align="center">
<img src="https://raw.githubusercontent.com/dvlshah/tokenx/main/docs/assets/logo.png"
alt="tokenX logo" width="280">
</p>
<p align="center"><em>Track cost and latency of your LLM calls </em></p>
<p align="center">
<a href="https://pypi.org/project/tokenx-core/"><img src="https://img.shields.io/pypi/v/tokenx-core?logo=pypi&label=pypi" alt="PyPI version"></a>
<a href="https://github.com/dvlshah/tokenx/actions/workflows/test.yml"><img src="https://github.com/dvlshah/tokenx/actions/workflows/test.yml/badge.svg?branch=main" alt="CI status"></a>
<!-- <a href="https://codecov.io/gh/dvlshah/tokenx"><img src="https://img.shields.io/codecov/c/github/dvlshah/tokenx?logo=codecov" alt="Coverage"></a> -->
<a href="https://pypi.org/project/tokenx-core/"><img src="https://img.shields.io/pypi/pyversions/tokenx-core?logo=python&label=python" alt="Python versions"></a>
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT license"></a>
</p>
<p align="center"><strong>👉 Like what you see? <a href="https://github.com/dvlshah/tokenx/stargazers">Star the repo</a> and <a href="https://github.com/dvlshah">follow @dvlshah</a> for updates!</strong></p>
> **Decorator in → Metrics out.**
> Monitor cost & latency of any LLM function without touching its body.
```bash
pip install tokenx-core # 1️⃣ install
```
```python
from tokenx.metrics import measure_cost, measure_latency  # 2️⃣ decorate
from openai import OpenAI

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini")
def ask(prompt: str):
    return OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )

resp, m = ask("Hello!")  # 3️⃣ run
print(m["cost_usd"], "USD |", m["latency_ms"], "ms")
```
---
## 🤔 Why tokenx?
Integrating with LLM APIs often involves hidden costs and variable performance. Manually tracking token usage and calculating costs across different models and providers is tedious and error-prone. `tokenx` simplifies this by:
* **Decorator-based monitoring:** Add cost and latency tracking with simple decorators.
* **Accurate cost calculation:** Uses configurable pricing with caching discounts.
* **Latency measurement:** Measures API call response times.
* **Multi-provider support:** Supports OpenAI and Anthropic APIs.
---
## 🏗️ Architecture (1‑min overview)
```mermaid
flowchart LR
    %% --- subgraph: user's code ---
    subgraph User_Code
        A["API call"]
    end

    %% --- main pipeline ---
    A -- decorators --> B["tokenx wrapper"]
    B -- cost --> C["Cost Calculator"]
    B -- latency --> D["Latency Timer"]
    C -- lookup --> E["model_prices.yaml"]
    B -- metrics --> F["Structured JSON (stdout / exporter)"]
```
*No vendor lock‑in:* pure‑Python wrapper emits plain dicts—pipe them to Prometheus, Datadog, or stdout.
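Because the decorated call just returns a plain dict alongside the response, forwarding metrics is a one-liner. Here is a minimal sketch, assuming the `ask` function from the snippet above; `emit_metrics` is a hypothetical helper, not part of the tokenx API:

```python
import json
import sys

def emit_metrics(metrics: dict) -> None:
    # Hypothetical helper: write one JSON object per line to stdout,
    # ready to be tailed by a log shipper or piped into an exporter.
    json.dump(metrics, sys.stdout)
    sys.stdout.write("\n")

resp, m = ask("Hello!")  # decorated function from the snippet above
emit_metrics(m)          # e.g. {"provider": "openai", "cost_usd": ..., "latency_ms": ...}
```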
---
## 💡 Features at a glance
* **Cost tracking** – USD costing with cached‑token discounts (if applicable)
* **No estimates policy** – only real usage data; the numbers match what you see in the provider's billing portal
* **Latency measurement** – measures API response times
* **Decorator interface** – works with sync and async functions
* **Provider support** – OpenAI, Anthropic, adding more providers soon!
* **Minimal deps** – tiktoken and pyyaml only
---
## 📦 Installation
```bash
pip install tokenx-core # stable
pip install tokenx-core[openai] # with OpenAI provider extras
pip install tokenx-core[anthropic] # with Anthropic provider extras
pip install tokenx-core[all] # with all provider extras
```
---
## 🚀 Quick Start
Here's how to monitor your OpenAI API calls with just two lines of code:
```python
from tokenx.metrics import measure_cost, measure_latency
from openai import OpenAI

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini")  # Always specify provider and model
def call_openai():
    client = OpenAI()
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, world!"}]
    )

response, metrics = call_openai()

# Access your metrics
print(f"Cost: ${metrics['cost_usd']:.6f}")
print(f"Latency: {metrics['latency_ms']:.2f}ms")
print(f"Tokens: {metrics['input_tokens']} in, {metrics['output_tokens']} out")
print(f"Cached tokens: {metrics['cached_tokens']}")  # New in v0.2.0
```
Here's how to monitor your Anthropic API calls with just two lines of code:
```python
from tokenx.metrics import measure_cost, measure_latency
from anthropic import Anthropic  # Or AsyncAnthropic

@measure_latency
@measure_cost(provider="anthropic", model="claude-3-haiku-20240307")
def call_anthropic(prompt: str):
    # For Anthropic prompt caching metrics, initialize the client with .beta
    # and use 'cache_control' in your messages.
    # Example: client = Anthropic(api_key="YOUR_KEY").beta
    client = Anthropic()  # Standard client
    return client.messages.create(
        model="claude-3-haiku-20240307",  # Ensure model matches decorator
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}]
    )

response_claude, metrics_claude = call_anthropic("Why is the sky blue?")
print(f"Cost: ${metrics_claude['cost_usd']:.6f} USD")
print(f"Latency: {metrics_claude['latency_ms']:.2f} ms")
print(f"Input Tokens: {metrics_claude['input_tokens']}")
print(f"Output Tokens: {metrics_claude['output_tokens']}")

# For Anthropic, 'cached_tokens' reflects 'cache_read_input_tokens'.
# 'cache_creation_input_tokens' will also be in metrics if the caching beta is active.
print(f"Cached (read) tokens: {metrics_claude.get('cached_tokens', 0)}")
print(f"Cache creation tokens: {metrics_claude.get('cache_creation_input_tokens', 'N/A (requires caching beta)')}")
```
Here's how to monitor OpenAI's new audio models with dual token pricing:
```python
from tokenx.metrics import measure_cost, measure_latency
from openai import OpenAI

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini-transcribe")
def transcribe_audio(audio_file_path: str):
    client = OpenAI()
    with open(audio_file_path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",
            file=audio_file,
            response_format="verbose_json"
        )

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini-tts")
def generate_speech(text: str):
    client = OpenAI()
    return client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="alloy",
        input=text
    )

# Real usage data with separate audio/text token costs
response, metrics = transcribe_audio("path/to/audio.mp3")  # Replace with actual audio file path
print(f"Transcription cost: ${metrics['cost_usd']:.6f}")  # Combines audio + text tokens
print(f"Audio tokens: {metrics.get('audio_input_tokens', 0)}")
print(f"Text tokens: {metrics['input_tokens']}")
```
## 🔍 Detailed Usage
### Cost Tracking
The `measure_cost` decorator requires explicit provider and model specification:
```python
@measure_cost(provider="openai", model="gpt-4o") # Explicit specification required
def my_function(): ...
@measure_cost(provider="openai", model="gpt-4o", tier="flex") # Optional tier
def my_function(): ...
```
### Latency Measurement
The `measure_latency` decorator works with both sync and async functions:
```python
@measure_latency
def sync_function(): ...
@measure_latency
async def async_function(): ...
```
### Combining Decorators
Decorators can be combined in any order:
```python
@measure_latency
@measure_cost(provider="openai", model="gpt-4o")
def my_function(): ...
# Equivalent to:
@measure_cost(provider="openai", model="gpt-4o")
@measure_latency
def my_function(): ...
```
### Async Usage
Both decorators work seamlessly with `async` functions:
```python
import asyncio
from tokenx.metrics import measure_cost, measure_latency
from openai import AsyncOpenAI  # Use the async client

@measure_latency
@measure_cost(provider="openai", model="gpt-4o-mini")
async def call_openai_async():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me an async joke!"}]
    )
    return response

async def main():
    response, metrics = await call_openai_async()
    print(metrics)

# asyncio.run(main())  # Example of how to run it
```
---
## 🔄 Provider Compatibility
tokenx is designed to work with multiple LLM providers. Here's the current compatibility overview:
| Provider | Status | SDK Version | Supported APIs | Cache Support |
|----------|--------|-------------|----------------|---------------|
| OpenAI | ✅ | >= 1.0.0 | Chat, Responses, Embeddings, Audio (Transcription/TTS), Moderation, Images | ✅ |
| Anthropic | ✅ | >= 0.20.0 | Messages | ✅ |
📊 **[View Complete Coverage Matrix](docs/COVERAGE_MATRIX.md)** - Detailed breakdown of all supported APIs, models, and features
### OpenAI Support Details
- **SDK Versions**: Compatible with OpenAI Python SDK v1.0.0 and newer.
- **Response Formats**: Supports both dictionary responses from older SDK versions and Pydantic model responses from newer versions; cached tokens are extracted from `prompt_tokens_details.cached_tokens` (see the sketch after this list).
- **API Types**:
- ✅ Chat Completions API
- ✅ Responses API (advanced interface)
- ✅ Embeddings API
- ✅ Audio Transcription API (token-based models)
- ✅ Text-to-Speech API (token-based models)
- ✅ Audio Preview API
- ✅ Realtime Audio API
- ✅ Moderation API
- ✅ Image Generation API (gpt-image-1 with hybrid pricing)
- **Pricing Features**:
- ✅ Dual token pricing (separate audio/text tokens)
- ✅ Hybrid pricing (tokens + per-image costs)
- ✅ No estimates policy (accuracy guaranteed)
- **Models**: 47+ models including o3, o4-mini, gpt-4o-transcribe, gpt-4o-mini-tts, gpt-image-1
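To make the cached-token path concrete, here is a minimal sketch of reading that field by hand from a Chat Completions response; the attribute layout assumes a recent OpenAI SDK, and tokenx performs the equivalent extraction for you:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello again!"}],
)

usage = response.usage
# `prompt_tokens_details` may be missing on older SDKs/models, so read it defensively.
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", 0) if details else 0
print(usage.prompt_tokens, usage.completion_tokens, cached)
```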
### Anthropic Support Details
- **SDK Versions**: Compatible with Anthropic Python SDK (e.g., v0.20.0+ for full prompt caching beta fields).
- **Response Formats**: Supports Pydantic-like response objects.
- **Token Extraction**: Extracts input_tokens and output_tokens from the usage object.
- **Caching (Prompt Caching Beta)**: To use Anthropic's prompt caching and see the related metrics, enable the beta feature on your client (`client = Anthropic().beta`) and define `cache_control` checkpoints in your messages, as sketched after this list. tokenx maps `cache_read_input_tokens` to `cached_tokens` (applying a cost discount when `cached_in` is defined) and includes `cache_creation_input_tokens` directly in the metrics.
- **API Types**: Supports Messages API.
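A hedged sketch of that caching pattern, following the description above; the exact beta surface, required headers, and minimum cacheable prefix size depend on your Anthropic SDK version and model, so treat this as illustrative rather than canonical:

```python
from anthropic import Anthropic
from tokenx.metrics import measure_cost, measure_latency

LONG_REUSABLE_CONTEXT = "...a large, stable system prompt or reference document..."

client = Anthropic().beta  # beta namespace, as described above

@measure_latency
@measure_cost(provider="anthropic", model="claude-3-haiku-20240307")
def ask_with_cache(question: str):
    return client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=150,
        system=[
            {
                "type": "text",
                "text": LONG_REUSABLE_CONTEXT,
                "cache_control": {"type": "ephemeral"},  # checkpoint: cache everything up to here
            }
        ],
        messages=[{"role": "user", "content": question}],
    )

_, m = ask_with_cache("Why is the sky blue?")
print(m.get("cached_tokens", 0), m.get("cache_creation_input_tokens", 0))
```

On the first call the prefix is written to the cache and reported as `cache_creation_input_tokens`; later calls with the same prefix report `cache_read_input_tokens`, which tokenx surfaces as `cached_tokens`.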
## 🛠️ Advanced Configuration
### Custom Pricing
Prices are loaded from the `model_prices.yaml` file. You can update this file when new models are released or prices change:
```yaml
openai:
  gpt-4o:
    sync:
      in: 2.50         # USD per million input tokens
      cached_in: 1.25  # USD per million cached tokens (OpenAI specific)
      out: 10.00       # USD per million output tokens

  # Audio models with dual token pricing
  gpt-4o-mini-transcribe:
    sync:
      in: 1.25        # USD per million input tokens (text tokens)
      out: 5.00       # USD per million output tokens (text tokens)
      audio_in: 3.00  # USD per million input tokens (audio tokens)

  gpt-4o-mini-tts:
    sync:
      in: 0.60          # USD per million input tokens (text tokens)
      audio_out: 12.00  # USD per million output tokens (audio tokens)

anthropic:
  claude-3-haiku-20240307:
    sync:
      in: 0.25   # USD per million input tokens
      # cached_in: 0.10  # Example: if Anthropic offered a specific rate for cache_read_input_tokens
      out: 1.25  # USD per million output tokens
      # If 'cached_in' is not specified for an Anthropic model,
      # all input_tokens (including cache_read_input_tokens) are billed at the 'in' rate.
```
### How Is Pricing Configuration Loaded?
By default, tokenx will:
- Check for updated pricing daily (24-hour cache)
- Automatically fetch the latest prices from the official repository
- Use stale cache if network is unavailable
- Show clear error if no pricing data available
The pricing YAML in the remote repository is updated automatically as the supported providers publish pricing changes.
```python
# First run - fetches latest prices
from tokenx import CostCalculator
calc = CostCalculator.for_provider("openai", "gpt-4o")
# Output: "tokenx: Fetching latest model prices..."
# Output: "tokenx: Cached latest prices locally."
# Subsequent runs - uses cache
calc2 = CostCalculator.for_provider("openai", "gpt-4o-mini")
# Output: "tokenx: Using cached model prices."
```
#### Custom Pricing Overrides
For enterprise users with custom pricing contracts:
```bash
# Set environment variable to your custom pricing file
export TOKENX_PRICES_PATH="/path/to/my-custom-pricing.yaml"
```
Create a YAML file with only the prices you want to override:
```yaml
# my-custom-pricing.yaml
openai:
  gpt-4o:
    sync:
      in: 1.50   # Custom rate: $1.50 per million tokens
      out: 8.00  # Custom rate: $8.00 per million tokens

  # You only need to specify what you're overriding
  # All other models use the standard pricing
```
## 📊 Example Metrics Output
When you use the decorators, you'll get a structured metrics dictionary:
```python
{
    "provider": "openai",
    "model": "gpt-4o-mini",
    "tier": "sync",
    "input_tokens": 12,
    "output_tokens": 48,
    "cached_tokens": 20,   # New in v0.2.0
    "cost_usd": 0.000348,  # $0.000348 USD
    "latency_ms": 543.21   # 543.21 milliseconds
}
```
For Anthropic (when prompt caching beta is active and cache is utilized):
```python
{
    "provider": "anthropic",
    "model": "claude-3-haiku-20240307",
    "tier": "sync",
    "input_tokens": 25,                 # Total input tokens for the request
    "output_tokens": 60,
    "cached_tokens": 10,                # Populated from Anthropic's 'cache_read_input_tokens'
    "cache_creation_input_tokens": 15,  # Anthropic's 'cache_creation_input_tokens'
    "cost_usd": 0.0000XX,
    "usd": 0.0000XX,
    "latency_ms": 750.21
}
```
If Anthropic's prompt caching beta is not used, or no cache interaction occurs, `cached_tokens` will be 0 and `cache_creation_input_tokens` may be 0 or absent.
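Since those fields are not guaranteed to be present, reading them defensively is the safest pattern. A small sketch, reusing `metrics_claude` from the Anthropic Quick Start example above:

```python
# Reuses `metrics_claude` from the Anthropic Quick Start example above.
cached_reads = metrics_claude.get("cached_tokens", 0)
cache_writes = metrics_claude.get("cache_creation_input_tokens", 0)

if cached_reads or cache_writes:
    print(f"cache interaction: {cached_reads} tokens read, {cache_writes} tokens written")
else:
    print("no cache interaction for this call")
```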
## 🔌 Adding a Custom Provider
TokenX makes it easy to add support for new LLM providers using the `@register_provider` decorator:
```python
from tokenx.providers import register_provider
from tokenx.providers.base import ProviderAdapter, Usage
from typing import Any, Optional

@register_provider("custom")
class CustomAdapter(ProviderAdapter):
    @property
    def provider_name(self) -> str:
        return "custom"

    def usage_from_response(self, response: Any) -> Usage:
        # Extract standardized usage information from the provider response
        return Usage(
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            cached_tokens=getattr(response.usage, 'cached_tokens', 0),
            extra_fields={'provider': 'custom'}
        )

    def matches_function(self, func: Any, args: tuple, kwargs: dict) -> bool:
        # Detect if this function uses your provider
        return "custom" in kwargs.get("model", "").lower()

    def detect_model(self, func: Any, args: tuple, kwargs: dict) -> Optional[str]:
        return kwargs.get("model")

    def calculate_cost(self, model: str, input_tokens: int,
                       output_tokens: int, cached_tokens: int = 0,
                       tier: str = "sync") -> float:
        # Implement your pricing logic
        return input_tokens * 0.001 + output_tokens * 0.002

# Now you can use it immediately
from tokenx.metrics import measure_cost

@measure_cost(provider="custom", model="custom-model")
def call_custom_llm():
    # Your API call here
    pass
```
## 🤝 Contributing
```bash
git clone https://github.com/dvlshah/tokenx.git
cd tokenx
pip install -e .[dev]   # or `poetry install`
pre-commit install
pytest -q && mypy src/
```
See [CONTRIBUTING.md](https://github.com/dvlshah/tokenx/blob/main/docs/CONTRIBUTING.md) for details.
---
## 📝 Changelog
See [CHANGELOG.md](https://github.com/dvlshah/tokenx/blob/main/docs/CHANGELOG.md) for full history.
---
## 📜 License
MIT © 2025 Deval Shah
<p align="center"><em>If tokenX saves you time or money, please consider <a href="https://github.com/sponsors/dvlshah">sponsoring</a> or giving a ⭐ – it really helps!</em></p>
Raw data
{
"_id": null,
"home_page": null,
"name": "tokenx-core",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "openai, pricing, observability, llm, anthropic, gemini",
"author": null,
"author_email": "Deval Shah <deval@neurink.ai>",
"download_url": "https://files.pythonhosted.org/packages/45/ea/fb11ca4fed5470c7768b0e6de5ff928371ebcb5139633a78aa611eda1179/tokenx_core-0.2.7.tar.gz",
"platform": null,
"description": "<p align=\"center\">\n <img src=\"https://raw.githubusercontent.com/dvlshah/tokenx/main/docs/assets/logo.png\"\n alt=\"tokenX logo\" width=\"280\">\n</p>\n\n<p align=\"center\"><em>Track cost and latency of your LLM calls </em></p>\n\n<p align=\"center\">\n <a href=\"https://pypi.org/project/tokenx-core/\"><img src=\"https://img.shields.io/pypi/v/tokenx-core?logo=pypi&label=pypi\" alt=\"PyPI version\"></a>\n <a href=\"https://github.com/dvlshah/tokenx/actions/workflows/test.yml\"><img src=\"https://github.com/dvlshah/tokenx/actions/workflows/test.yml/badge.svg?branch=main\" alt=\"CI status\"></a>\n <!-- <a href=\"https://codecov.io/gh/dvlshah/tokenx\"><img src=\"https://img.shields.io/codecov/c/github/dvlshah/tokenx?logo=codecov\" alt=\"Coverage\"></a> -->\n <a href=\"https://pypi.org/project/tokenx-core/\"><img src=\"https://img.shields.io/pypi/pyversions/tokenx-core?logo=python&label=python\" alt=\"Python versions\"></a>\n <a href=\"https://opensource.org/licenses/MIT\"><img src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" alt=\"MIT license\"></a>\n</p>\n\n<p align=\"center\"><strong>\ud83d\udc49 Like what you see? <a href=\"https://github.com/dvlshah/tokenx/stargazers\">Star the repo</a> and <a href=\"https://github.com/dvlshah\">follow @dvlshah</a> for updates!</strong></p>\n\n> **Decorator in \u2192 Metrics out.**\n> Monitor cost & latency of any LLM function without touching its body.\n\n```bash\npip install tokenx-core # 1\ufe0f\u20e3 install\n```\n\n```python\nfrom tokenx.metrics import measure_cost, measure_latency # 2\ufe0f\u20e3 decorate\nfrom openai import OpenAI\n\n@measure_latency\n@measure_cost(provider=\"openai\", model=\"gpt-4o-mini\")\ndef ask(prompt: str):\n return OpenAI().chat.completions.create(\n model=\"gpt-4o-mini\",\n messages=[{\"role\": \"user\", \"content\": prompt}]\n )\n\nresp, m = ask(\"Hello!\") # 3\ufe0f\u20e3 run\nprint(m[\"cost_usd\"], \"USD |\", m[\"latency_ms\"], \"ms\")\n```\n\n---\n\n## \ud83e\udd14 Why tokenx?\n\nIntegrating with LLM APIs often involves hidden costs and variable performance. Manually tracking token usage and calculating costs across different models and providers is tedious and error-prone. 
`tokenx` simplifies this by:\n\n* **Decorator-based monitoring:** Add cost and latency tracking with simple decorators.\n* **Accurate cost calculation:** Uses configurable pricing with caching discounts.\n* **Latency measurement:** Measures API call response times.\n* **Multi-provider support:** Supports OpenAI and Anthropic APIs.\n\n---\n\n## \ud83c\udfd7\ufe0f Architecture (1\u2011min overview)\n\n```mermaid\nflowchart LR\n %% --- subgraph: user's code ---\n subgraph User_Code\n A[\"API call\"]\n end\n\n %% --- main pipeline ---\n A -- decorators --> B[\"tokenx wrapper\"]\n B -- cost --> C[\"Cost Calculator\"]\n B -- latency --> D[\"Latency Timer\"]\n C -- lookup --> E[\"model_prices.yaml\"]\n B -- metrics --> F[\"Structured JSON (stdout / exporter)\"]\n```\n\n*No vendor lock\u2011in:* pure\u2011Python wrapper emits plain dicts\u2014pipe them to Prometheus, Datadog, or stdout.\n\n---\n\n## \ud83d\udca1 Features at a glance\n\n* **Cost tracking** \u2013 USD costing with cached\u2011token discounts (if applicable)\n* **No estimates policy** \u2013 only real usage data, numbers will match what you see in the portal\n* **Latency measurement** \u2013 measures API response times\n* **Decorator interface** \u2013 works with sync and async functions\n* **Provider support** \u2013 OpenAI, Anthropic, adding more providers soon!\n* **Minimal deps** \u2013 tiktoken and pyyaml only\n\n---\n\n## \ud83d\udce6 Installation\n\n```bash\npip install tokenx-core # stable\npip install tokenx-core[openai] # with OpenAI provider extras\npip install tokenx-core[anthropic] # with Anthropic provider extras\npip install tokenx-core[all] # with all provider extras\n```\n\n---\n\n## \ud83d\ude80 Quick Start\n\nHere's how to monitor your OpenAI API calls with just two lines of code:\n\n```python\nfrom tokenx.metrics import measure_cost, measure_latency\nfrom openai import OpenAI\n\n@measure_latency\n@measure_cost(provider=\"openai\", model=\"gpt-4o-mini\") # Always specify provider and model\ndef call_openai():\n client = OpenAI()\n return client.chat.completions.create(\n model=\"gpt-4o-mini\",\n messages=[{\"role\": \"user\", \"content\": \"Hello, world!\"}]\n )\n\nresponse, metrics = call_openai()\n\n# Access your metrics\nprint(f\"Cost: ${metrics['cost_usd']:.6f}\")\nprint(f\"Latency: {metrics['latency_ms']:.2f}ms\")\nprint(f\"Tokens: {metrics['input_tokens']} in, {metrics['output_tokens']} out\")\nprint(f\"Cached tokens: {metrics['cached_tokens']}\") # New in v0.2.0\n```\n\nHere's how to monitor your Anthropic API calls with just two lines of code:\n\n```python\nfrom tokenx.metrics import measure_cost, measure_latency\nfrom anthropic import Anthropic # Or AsyncAnthropic\n\n@measure_latency\n@measure_cost(provider=\"anthropic\", model=\"claude-3-haiku-20240307\")\ndef call_anthropic(prompt: str):\n # For Anthropic prompt caching metrics, initialize client with .beta\n # and use 'cache_control' in your messages.\n # Example: client = Anthropic(api_key=\"YOUR_KEY\").beta\n client = Anthropic() # Standard client\n return client.messages.create(\n model=\"claude-3-haiku-20240307\", # Ensure model matches decorator\n max_tokens=150,\n messages=[{\"role\": \"user\", \"content\": prompt}]\n )\n\nresponse_claude, metrics_claude = call_anthropic(\"Why is the sky blue?\")\nprint(f\"Cost: ${metrics_claude['cost_usd']:.6f} USD\")\nprint(f\"Latency: {metrics_claude['latency_ms']:.2f} ms\")\nprint(f\"Input Tokens: {metrics_claude['input_tokens']}\")\nprint(f\"Output Tokens: {metrics_claude['output_tokens']}\")\n# For 
Anthropic, 'cached_tokens' reflects 'cache_read_input_tokens'.\n# 'cache_creation_input_tokens' will also be in metrics if caching beta is active.\nprint(f\"Cached (read) tokens: {metrics_claude.get('cached_tokens', 0)}\")\nprint(f\"Cache creation tokens: {metrics_claude.get('cache_creation_input_tokens', 'N/A (requires caching beta)')}\")\n```\n\nHere's how to monitor OpenAI's new audio models with dual token pricing:\n\n```python\nfrom tokenx.metrics import measure_cost, measure_latency\nfrom openai import OpenAI\n\n@measure_latency\n@measure_cost(provider=\"openai\", model=\"gpt-4o-mini-transcribe\")\ndef transcribe_audio(audio_file_path: str):\n client = OpenAI()\n with open(audio_file_path, \"rb\") as audio_file:\n return client.audio.transcriptions.create(\n model=\"gpt-4o-mini-transcribe\",\n file=audio_file,\n response_format=\"verbose_json\"\n )\n\n@measure_latency\n@measure_cost(provider=\"openai\", model=\"gpt-4o-mini-tts\")\ndef generate_speech(text: str):\n client = OpenAI()\n return client.audio.speech.create(\n model=\"gpt-4o-mini-tts\",\n voice=\"alloy\",\n input=text\n )\n\n# Real usage data with separate audio/text token costs\nresponse, metrics = transcribe_audio(\"path/to/audio.mp3\") # Replace with actual audio file path\nprint(f\"Transcription cost: ${metrics['cost_usd']:.6f}\") # Combines audio + text tokens\nprint(f\"Audio tokens: {metrics.get('audio_input_tokens', 0)}\")\nprint(f\"Text tokens: {metrics['input_tokens']}\")\n```\n\n## \ud83d\udd0d Detailed Usage\n\n### Cost Tracking\n\nThe `measure_cost` decorator requires explicit provider and model specification:\n\n```python\n@measure_cost(provider=\"openai\", model=\"gpt-4o\") # Explicit specification required\ndef my_function(): ...\n\n@measure_cost(provider=\"openai\", model=\"gpt-4o\", tier=\"flex\") # Optional tier\ndef my_function(): ...\n```\n\n### Latency Measurement\n\nThe `measure_latency` decorator works with both sync and async functions:\n\n```python\n@measure_latency\ndef sync_function(): ...\n\n@measure_latency\nasync def async_function(): ...\n```\n\n### Combining Decorators\n\nDecorators can be combined in any order:\n\n```python\n@measure_latency\n@measure_cost(provider=\"openai\", model=\"gpt-4o\")\ndef my_function(): ...\n\n# Equivalent to:\n@measure_cost(provider=\"openai\", model=\"gpt-4o\")\n@measure_latency\ndef my_function(): ...\n```\n\n### Async Usage\n\nBoth decorators work seamlessly with `async` functions:\n\n```python\nimport asyncio\nfrom tokenx.metrics import measure_cost, measure_latency\nfrom openai import AsyncOpenAI # Use Async client\n\n@measure_latency\n@measure_cost(provider=\"openai\", model=\"gpt-4o-mini\")\nasync def call_openai_async():\n client = AsyncOpenAI()\n response = await client.chat.completions.create(\n model=\"gpt-4o-mini\",\n messages=[{\"role\": \"user\", \"content\": \"Tell me an async joke!\"}]\n )\n return response\n\nasync def main():\n response, metrics = await call_openai_async()\n print(metrics)\n\n# asyncio.run(main()) # Example of how to run it\n```\n\n---\n\n## \ud83d\udd04 Provider Compatibility\n\ntokenx is designed to work with multiple LLM providers. 
Here's the current compatibility overview:\n\n| Provider | Status | SDK Version | Supported APIs | Cache Support |\n|----------|--------|-------------|----------------|---------------|\n| OpenAI | \u2705 | >= 1.0.0 | Chat, Responses, Embeddings, Audio (Transcription/TTS), Moderation, Images | \u2705 |\n| Anthropic | \u2705 | >= 0.20.0 | Messages | \u2705 |\n\n\ud83d\udcca **[View Complete Coverage Matrix](docs/COVERAGE_MATRIX.md)** - Detailed breakdown of all supported APIs, models, and features\n\n### OpenAI Support Details\n\n- **SDK Versions**: Compatible with OpenAI Python SDK v1.0.0 and newer.\n- **Response Formats**: Supports dictionary responses from older SDK versions and Pydantic model responses from newer SDK versions, with cached token extraction from `prompt_tokens_details.cached_tokens`.\n- **API Types**:\n - \u2705 Chat Completions API\n - \u2705 Responses API (advanced interface)\n - \u2705 Embeddings API\n - \u2705 Audio Transcription API (token-based models)\n - \u2705 Text-to-Speech API (token-based models)\n - \u2705 Audio Preview API\n - \u2705 Realtime Audio API\n - \u2705 Moderation API\n - \u2705 Image Generation API (gpt-image-1 with hybrid pricing)\n- **Pricing Features**:\n - \u2705 Dual token pricing (separate audio/text tokens)\n - \u2705 Hybrid pricing (tokens + per-image costs)\n - \u2705 No estimates policy (accuracy guaranteed)\n- **Models**: 47+ models including o3, o4-mini, gpt-4o-transcribe, gpt-4o-mini-tts, gpt-image-1\n\n### Anthropic Support Details\n\n- **SDK Versions**: Compatible with Anthropic Python SDK (e.g., v0.20.0+ for full prompt caching beta fields).\n- **Response Formats**: Supports Pydantic-like response objects.\n- **Token Extraction**: Extracts input_tokens and output_tokens from the usage object.\n- **Caching (Prompt Caching Beta)**: To utilize Anthropic's prompt caching and see related metrics, enable the beta feature in your client (`client = Anthropic().beta`) and define `cache_control` checkpoints in messages. tokenx maps `cache_read_input_tokens` to `cached_tokens` (for cost discounts if `cached_in` is defined) and includes `cache_creation_input_tokens` directly in metrics.\n- **API Types**: Supports Messages API.\n\n## \ud83d\udee0\ufe0f Advanced Configuration\n\n### Custom Pricing\n\nPrices are loaded from the `model_prices.yaml` file. 
You can update this file when new models are released or prices change:\n\n```yaml\nopenai:\n gpt-4o:\n sync:\n in: 2.50 # USD per million input tokens\n cached_in: 1.25 # USD per million cached tokens (OpenAI specific)\n out: 10.00 # USD per million output tokens\n\n # Audio models with dual token pricing\n gpt-4o-mini-transcribe:\n sync:\n in: 1.25 # USD per million input tokens (text tokens)\n out: 5.00 # USD per million output tokens (text tokens)\n audio_in: 3.00 # USD per million input tokens (audio tokens)\n\n gpt-4o-mini-tts:\n sync:\n in: 0.60 # USD per million input tokens (text tokens)\n audio_out: 12.00 # USD per million output tokens (audio tokens)\n\nanthropic:\n claude-3-haiku-20240307:\n sync:\n in: 0.25 # USD per million input tokens\n # cached_in: 0.10 # Example: if Anthropic offered a specific rate for cache_read_input_tokens\n out: 1.25 # USD per million output tokens\n # If 'cached_in' is not specified for an Anthropic model,\n # all input_tokens (including cache_read_input_tokens) are billed at the 'in' rate.\n```\n\n### How Pricing Configuration is loaded?\n\nBy default, tokenx will:\n- Check for updated pricing daily (24-hour cache)\n- Automatically fetch the latest prices from the official repository\n- Use stale cache if network is unavailable\n- Show clear error if no pricing data available\n\nThe price yaml will be automatically uploaded in the remote as the pricing information is updated by supported providers.\n\n```python\n# First run - fetches latest prices\nfrom tokenx import CostCalculator\ncalc = CostCalculator.for_provider(\"openai\", \"gpt-4o\")\n# Output: \"tokenx: Fetching latest model prices...\"\n# Output: \"tokenx: Cached latest prices locally.\"\n\n# Subsequent runs - uses cache\ncalc2 = CostCalculator.for_provider(\"openai\", \"gpt-4o-mini\")\n# Output: \"tokenx: Using cached model prices.\"\n```\n\n#### Custom Pricing Overrides\n\nFor enterprise users with custom pricing contracts:\n\n```bash\n# Set environment variable to your custom pricing file\nexport TOKENX_PRICES_PATH=\"/path/to/my-custom-pricing.yaml\"\n```\n\nCreate a YAML file with only the prices you want to override:\n\n```yaml\n# my-custom-pricing.yaml\nopenai:\n gpt-4o:\n sync:\n in: 1.50 # Custom rate: $1.50 per million tokens\n out: 8.00 # Custom rate: $8.00 per million tokens\n\n # You only need to specify what you're overriding\n # All other models use the standard pricing\n```\n\n## \ud83d\udcca Example Metrics Output\n\nWhen you use the decorators, you'll get a structured metrics dictionary:\n\n```python\n{\n \"provider\": \"openai\",\n \"model\": \"gpt-4o-mini\",\n \"tier\": \"sync\",\n \"input_tokens\": 12,\n \"output_tokens\": 48,\n \"cached_tokens\": 20, # New in v0.2.0\n \"cost_usd\": 0.000348, # $0.000348 USD\n \"latency_ms\": 543.21 # 543.21 milliseconds\n}\n```\n\nFor Anthropic (when prompt caching beta is active and cache is utilized):\n\n```python\n{\n \"provider\": \"anthropic\",\n \"model\": \"claude-3-haiku-20240307\",\n \"tier\": \"sync\",\n \"input_tokens\": 25, # Total input tokens for the request\n \"output_tokens\": 60,\n \"cached_tokens\": 10, # Populated from Anthropic's 'cache_read_input_tokens'\n \"cache_creation_input_tokens\": 15, # Anthropic's 'cache_creation_input_tokens'\n \"cost_usd\": 0.0000XX,\n \"usd\": 0.0000XX,\n \"latency_ms\": 750.21\n}\n```\n\nIf Anthropic's prompt caching beta is not used or no cache interaction occurs, cached_tokens will be 0 and cache_creation_input_tokens might be 0 or absent.\n\n## \ud83d\udd0c Adding a Custom 
Provider\n\nTokenX makes it easy to add support for new LLM providers using the `@register_provider` decorator:\n\n```python\nfrom tokenx.providers import register_provider\nfrom tokenx.providers.base import ProviderAdapter, Usage\nfrom typing import Any, Optional\n\n@register_provider(\"custom\")\nclass CustomAdapter(ProviderAdapter):\n @property\n def provider_name(self) -> str:\n return \"custom\"\n\n def usage_from_response(self, response: Any) -> Usage:\n # Extract standardized usage information from provider response\n return Usage(\n input_tokens=response.usage.input_tokens,\n output_tokens=response.usage.output_tokens,\n cached_tokens=getattr(response.usage, 'cached_tokens', 0),\n extra_fields={'provider': 'custom'}\n )\n\n def matches_function(self, func: Any, args: tuple, kwargs: dict) -> bool:\n # Detect if this function uses your provider\n return \"custom\" in kwargs.get(\"model\", \"\").lower()\n\n def detect_model(self, func: Any, args: tuple, kwargs: dict) -> Optional[str]:\n return kwargs.get(\"model\")\n\n def calculate_cost(self, model: str, input_tokens: int,\n output_tokens: int, cached_tokens: int = 0,\n tier: str = \"sync\") -> float:\n # Implement your pricing logic\n return input_tokens * 0.001 + output_tokens * 0.002\n\n# Now you can use it immediately\nfrom tokenx.metrics import measure_cost\n\n@measure_cost(provider=\"custom\", model=\"custom-model\")\ndef call_custom_llm():\n # Your API call here\n pass\n```\n\n## \ud83e\udd1d Contributing\n\n```bash\ngit clone https://github.com/dvlshah/tokenx.git\npre-commit install\npip install -e .[dev] # or `poetry install`\npytest -q && mypy src/\n```\n\nSee [CONTRIBUTING.md](https://github.com/dvlshah/tokenx/blob/main/docs/CONTRIBUTING.md) for details.\n\n---\n\n## \ud83d\udcdd Changelog\n\nSee [CHANGELOG.md](https://github.com/dvlshah/tokenx/blob/main/docs/CHANGELOG.md) for full history.\n\n---\n\n## \ud83d\udcdc License\n\nMIT \u00a9 2025 Deval Shah\n\n<p align=\"center\"><em>If tokenX saves you time or money, please consider <a href=\"https://github.com/sponsors/dvlshah\">sponsoring</a> or giving a \u2b50 \u2013 it really helps!</em></p>\n",
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2025 YOUR_NAME\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \u201cSoftware\u201d), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n THE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE.\n ",
"summary": "Cross-provider LLM token tracking and cost calculation",
"version": "0.2.7",
"project_urls": {
"Bug Tracker": "https://github.com/dvlshah/tokenx/issues",
"Changelog": "https://github.com/dvlshah/tokenx/blob/main/docs/CHANGELOG.md",
"Documentation": "https://github.com/dvlshah/tokenx#readme",
"Homepage": "https://github.com/dvlshah/tokenx",
"Repository": "https://github.com/dvlshah/tokenx"
},
"split_keywords": [
"openai",
" pricing",
" observability",
" llm",
" anthropic",
" gemini"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c6679f5abbb75d8c98bf014af3cfd6c1086d49f27f6e3436c524490a2d1a2426",
"md5": "194d2e152143a606a0551d9daa9bd0c6",
"sha256": "6c32c98663bce13bd9b8dd2aff96b91da1d296cf6e3e002e0e8aa837673b06e4"
},
"downloads": -1,
"filename": "tokenx_core-0.2.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "194d2e152143a606a0551d9daa9bd0c6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 35564,
"upload_time": "2025-08-02T09:09:55",
"upload_time_iso_8601": "2025-08-02T09:09:55.752364Z",
"url": "https://files.pythonhosted.org/packages/c6/67/9f5abbb75d8c98bf014af3cfd6c1086d49f27f6e3436c524490a2d1a2426/tokenx_core-0.2.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "45eafb11ca4fed5470c7768b0e6de5ff928371ebcb5139633a78aa611eda1179",
"md5": "cd0f3c0467d0cf5f70946e6a02825593",
"sha256": "bc77e36903362bbf647a134762cdd04789a0d6369960bb43920bf23e9da195ec"
},
"downloads": -1,
"filename": "tokenx_core-0.2.7.tar.gz",
"has_sig": false,
"md5_digest": "cd0f3c0467d0cf5f70946e6a02825593",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 60559,
"upload_time": "2025-08-02T09:09:56",
"upload_time_iso_8601": "2025-08-02T09:09:56.992041Z",
"url": "https://files.pythonhosted.org/packages/45/ea/fb11ca4fed5470c7768b0e6de5ff928371ebcb5139633a78aa611eda1179/tokenx_core-0.2.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-02 09:09:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dvlshah",
"github_project": "tokenx",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "pyyaml",
"specs": [
[
">=",
"6.0"
]
]
},
{
"name": "tiktoken",
"specs": [
[
">=",
"0.4.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pytest-mock",
"specs": [
[
">=",
"3.10.0"
]
]
},
{
"name": "pytest-asyncio",
"specs": [
[
">=",
"0.21.0"
]
]
}
],
"lcname": "tokenx-core"
}