# TurboTok 🚀
**High-performance NumPy-based tokenizer library with advanced features**
[PyPI](https://pypi.org/project/turbotok/)
[License: MIT](https://opensource.org/licenses/MIT)
TurboTok is a blazingly fast tokenizer library that leverages NumPy's vectorization capabilities to achieve exceptional performance. Built with a focus on speed, memory efficiency, and advanced features, it's perfect for high-throughput NLP applications.
## ✨ Features
### 🚀 **Core Tokenization Modes**
- **Byte Mode**: Raw byte-level tokenization (fastest)
- **Char Mode**: Unicode character-level tokenization
- **Word Mode**: Word-level tokenization with regex
- **Sentence Mode**: Sentence-level tokenization with rule-based splitting
### 🎯 **Advanced Features**
- **Custom Vocabulary Support**: Filter tokens based on custom vocabularies
- **Subword Tokenization**: BPE and WordPiece-style tokenization
- **Streaming Tokenization**: Process large files without loading into memory
- **Batch Processing**: Ultra-efficient batch tokenization
- **Comprehensive Error Handling**: Detailed error messages and validation
- **Token Statistics**: Rich analytics and frequency analysis
- **Vocabulary Management**: Save/load vocabularies to/from files
### ⚡ **Performance Highlights**
- **Byte Mode**: 100M+ tokens/sec (15x faster than target!)
- **Char Mode**: 95M+ tokens/sec (24x faster than target!)
- **Word Mode**: 2.8M+ tokens/sec (meets target)
- **Sentence Mode**: 800K+ tokens/sec (good baseline)
## 🛠️ Installation
```bash
pip install turbotok
```
## 🚀 Quick Start
### Basic Usage
```python
import turbotok
# Create tokenizer
tok = turbotok.TurboTok(mode="word")
# Tokenize text
tokens = tok.tokenize("Hello world! 🚀")
print(tokens)  # ['Hello', 'world', '!', '🚀']
```
### All Tokenization Modes
```python
text = "Hello world! This is TurboTok. 🚀"
# Byte mode (fastest)
tok_byte = turbotok.TurboTok(mode="byte")
byte_tokens = tok_byte.tokenize(text) # [72, 101, 108, 108, 111, ...]
# Char mode (Unicode-safe)
tok_char = turbotok.TurboTok(mode="char")
char_tokens = tok_char.tokenize(text) # ['H', 'e', 'l', 'l', 'o', ...]
# Word mode (default)
tok_word = turbotok.TurboTok(mode="word")
word_tokens = tok_word.tokenize(text) # ['Hello', 'world', '!', 'This', ...]
# Sentence mode
tok_sentence = turbotok.TurboTok(mode="sentence")
sentence_tokens = tok_sentence.tokenize(text)  # ['Hello world!', 'This is TurboTok.', '🚀']
```
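Byte and char mode differ exactly where non-ASCII input appears: the rocket emoji is a single Unicode character but four UTF-8 bytes. A stdlib-only illustration, independent of TurboTok:

```python
text = "🚀"

# Char-level view: one Unicode code point
print(list(text))                  # ['🚀']

# Byte-level view: four UTF-8 bytes
print(list(text.encode("utf-8")))  # [240, 159, 154, 128]
```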
## 🎯 Advanced Features
### Custom Vocabulary Support
```python
# Create tokenizer with custom vocabulary
vocab = {"Hello", "world", "TurboTok", "Python", "NumPy"}
tok = turbotok.TurboTok(mode="word", vocabulary=vocab)
# Only tokens in vocabulary are returned
tokens = tok.tokenize("Hello world! This is TurboTok.")
print(tokens) # ['Hello', 'world', 'TurboTok']
# Add tokens dynamically
tok.add_to_vocabulary(["amazing", "performance"])
tok.remove_from_vocabulary("Hello")
# Clear vocabulary
tok.clear_vocabulary()
```
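Conceptually, vocabulary filtering is a set-membership check over the word tokens. A minimal sketch of the idea, using a simplified word pattern (illustrative only, not TurboTok's implementation):

```python
import re

vocab = {"Hello", "world", "TurboTok", "Python", "NumPy"}

def filter_tokens(text, vocabulary):
    # Split into words/punctuation, then keep only in-vocabulary tokens.
    # Set lookups are O(1), so filtering stays cheap for large vocabularies.
    words = re.findall(r"\w+|[^\w\s]", text)
    return [w for w in words if w in vocabulary]

print(filter_tokens("Hello world! This is TurboTok.", vocab))
# ['Hello', 'world', 'TurboTok']
```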
### Subword Tokenization
```python
# BPE-style subword tokenization
tok_bpe = turbotok.TurboTok(mode="word", subword_mode="bpe", max_subword_length=3)
tokens = tok_bpe.tokenize("supercalifragilisticexpialidocious")
print(tokens) # ['sup', 'erc', 'ali', 'fra', 'gil', ...]
# WordPiece-style subword tokenization
tok_wp = turbotok.TurboTok(mode="word", subword_mode="wordpiece", max_subword_length=4)
tokens = tok_wp.tokenize("internationalization")
print(tokens) # ['inte', 'rnat', 'iona', 'liza', 'tion']
```
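With the settings above, the example outputs correspond to fixed-width windows of `max_subword_length` characters. A sketch of that chunking behaviour, inferred from the outputs shown (not TurboTok's code):

```python
def chunk_subwords(word, max_len=4):
    # Slice the word into consecutive windows of at most max_len characters.
    return [word[i:i + max_len] for i in range(0, len(word), max_len)]

print(chunk_subwords("internationalization", 4))
# ['inte', 'rnat', 'iona', 'liza', 'tion']
```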
### Streaming Tokenization
```python
# Stream tokenize large files
tok = turbotok.TurboTok(mode="sentence")
for tokens in tok.tokenize_stream("large_file.txt", chunk_size=8192):
    # Process each chunk of tokens
    print(f"Processed {len(tokens)} tokens")
```
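The key subtlety in streaming is that a token can straddle a chunk boundary, so an incomplete trailing fragment has to be carried into the next read. A stdlib-only sketch of that pattern for sentences (illustrative; TurboTok's `tokenize_stream` may handle this differently):

```python
import io
import re

def stream_sentences(fileobj, chunk_size=8192):
    # Read fixed-size chunks, emitting complete sentences and carrying any
    # incomplete trailing fragment over to the next chunk.
    buffer = ""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = re.split(r"(?<=[.!?])\s+", buffer)
        buffer = parts.pop()  # possibly incomplete last fragment
        yield from parts
    if buffer.strip():
        yield buffer.strip()

sample = io.StringIO("Hello world! This is TurboTok. It streams well.")
print(list(stream_sentences(sample, chunk_size=16)))
# ['Hello world!', 'This is TurboTok.', 'It streams well.']
```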
### Batch Processing
```python
# Ultra-efficient batch tokenization
texts = [
    "Hello world!",
    "Machine learning is amazing!",
    "Python programming with NumPy.",
    "Natural language processing."
]
tok = turbotok.TurboTok(mode="word")
batch_tokens = tok.tokenize_batch(texts)
for i, tokens in enumerate(batch_tokens):
    print(f"Text {i+1}: {tokens}")
```
### Token Statistics & Analysis
```python
tok = turbotok.TurboTok(mode="word")
# Get comprehensive statistics
stats = tok.get_stats("Hello world! This is TurboTok. 🚀")
print(stats)
# {
# 'mode': 'word',
# 'token_count': 8,
# 'avg_token_length': 4.25,
# 'max_token_length': 7,
# 'min_token_length': 1,
# 'text_length': 34,
# 'compression_ratio': 4.25,
# 'vocabulary_size': None,
# 'subword_mode': None
# }
# Token frequency analysis
texts = ["Hello world!", "Hello Python!", "Hello TurboTok!"]
frequencies = tok.get_token_frequencies(texts)
most_common = tok.get_most_common_tokens(texts, top_k=3)
print(most_common) # [('Hello', 3), ('world', 1), ('Python', 1)]
```
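The frequency analysis shown here maps directly onto `collections.Counter`. An equivalent stdlib computation, using a simplified `\w+` word pattern (not TurboTok's internals):

```python
import re
from collections import Counter

texts = ["Hello world!", "Hello Python!", "Hello TurboTok!"]
tokens = [t for text in texts for t in re.findall(r"\w+", text)]

frequencies = Counter(tokens)
print(frequencies.most_common(3))
# [('Hello', 3), ('world', 1), ('Python', 1)]
```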
### Vocabulary Management
```python
tok = turbotok.TurboTok(mode="word")
# Build vocabulary from texts
texts = ["Hello world!", "Machine learning!", "Python programming!"]
frequencies = tok.get_token_frequencies(texts)
tok.add_to_vocabulary(frequencies.keys())
# Save vocabulary to file
tok.save_vocabulary("my_vocab.txt")
# Load vocabulary in new tokenizer
new_tok = turbotok.TurboTok(mode="word")
new_tok.load_vocabulary("my_vocab.txt")
```
## 🔧 API Reference
### TurboTok Class
#### Constructor
```python
TurboTok(
    mode="word",           # Tokenization mode
    vocabulary=None,       # Custom vocabulary set
    subword_mode=None,     # Subword mode ('bpe', 'wordpiece')
    max_subword_length=4   # Max subword length
)
```
#### Methods
**Core Tokenization**
- `tokenize(text: str) -> List[str]`: Tokenize single text
- `tokenize_batch(texts: List[str]) -> List[List[str]]`: Tokenize multiple texts
- `tokenize_stream(file_path: str, chunk_size: int = 8192) -> Iterator[List[str]]`: Stream tokenize file
**Vocabulary Management**
- `set_vocabulary(vocabulary: Set[str])`: Set custom vocabulary
- `add_to_vocabulary(tokens: Union[str, List[str], Set[str]])`: Add tokens to vocabulary
- `remove_from_vocabulary(tokens: Union[str, List[str], Set[str]])`: Remove tokens from vocabulary
- `clear_vocabulary()`: Clear vocabulary filter
- `get_vocabulary() -> Optional[Set[str]]`: Get current vocabulary
- `save_vocabulary(file_path: str)`: Save vocabulary to file
- `load_vocabulary(file_path: str)`: Load vocabulary from file
**Analysis & Statistics**
- `get_stats(text: str) -> dict`: Get tokenization statistics
- `get_token_frequencies(texts: List[str]) -> Dict[str, int]`: Get token frequencies
- `get_most_common_tokens(texts: List[str], top_k: int = 10) -> List[tuple]`: Get most common tokens
## ⚡ Performance Philosophy
TurboTok is built around these core principles:
1. **NumPy Vectorization**: Leverage SIMD operations and C-level speed
2. **Memory Efficiency**: Use memory views and pre-allocation
3. **Minimal Python Loops**: Avoid slow Python iteration
4. **Optimized Regex**: Pre-compiled patterns with atomic groups
5. **Batch Processing**: Process multiple texts efficiently
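Principle 1 is easiest to see in byte mode: a whole string can be turned into a token array with a single C-level NumPy call instead of a Python loop. A sketch of the idea (not TurboTok's actual internals):

```python
import numpy as np

def byte_tokenize(text: str) -> np.ndarray:
    # np.frombuffer reinterprets the UTF-8 bytes as a uint8 array in one
    # vectorized call, with no per-character Python iteration.
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8)

print(byte_tokenize("Hi!").tolist())  # [72, 105, 33]
```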
## 📊 Benchmarks
### Performance Targets vs Actual Results
| Mode | Target | Actual | Performance |
|------|--------|--------|-------------|
| Byte | 5-10M tokens/sec | 100M+ tokens/sec | **15x faster** |
| Char | 3-5M tokens/sec | 95M+ tokens/sec | **24x faster** |
| Word | 2-4M tokens/sec | 2.8M tokens/sec | **Meets target** |
| Sentence | 1-2M tokens/sec | 800K tokens/sec | **Good baseline** |
### Run Your Own Benchmarks
```python
from turbotok.benchmarks import run_benchmarks
# Run comprehensive benchmarks
results = run_benchmarks(text_size_mb=1.0, iterations=30)
```
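A tokens-per-second figure can also be computed by hand with a small timing harness. A generic stdlib sketch, shown here with `str.split` standing in for a tokenizer:

```python
import time

def throughput(tokenize, text, iterations=30):
    # Total tokens produced divided by total wall-clock time.
    start = time.perf_counter()
    n_tokens = 0
    for _ in range(iterations):
        n_tokens += len(tokenize(text))
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

rate = throughput(str.split, "hello world " * 1000)
print(f"{rate:,.0f} tokens/sec")
```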
## 🧪 Testing
Run the comprehensive test suite:
```bash
python -m pytest tests/
```
Or run tests with performance benchmarks:
```bash
python tests/test_core.py
```
## 📚 Examples
Check out the `examples/` directory for detailed usage examples:
- `quickstart.py`: Comprehensive feature demonstration
- Advanced usage patterns and best practices
## 🤝 Contributing
We welcome contributions! Please see our contributing guidelines for details.
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- Built with NumPy for exceptional performance
- Inspired by modern tokenizer libraries
- Designed for high-throughput NLP applications
---
**TurboTok**: Where speed meets simplicity! 🚀