Name | Version | Summary | Date |
nupunkt-rs | 0.1.1 | High-performance Rust implementation of nupunkt sentence/paragraph tokenization | 2025-08-16 02:05:56 |
llmbuilder | 0.4.6 | A comprehensive toolkit for building, training, and deploying language models | 2025-08-14 20:16:12 |
tokker | 0.3.9 | Tokker: a fast local-first CLI tokenizer with all the best models in one place | 2025-08-09 17:59:33 |
chunkipy | 1.0.0.post1 | Chunkipy is an easy-to-use library for chunking text based on the size estimator function you provide. | 2025-08-08 12:37:03 |
maze-dataset | 1.4.0 | generating and working with datasets of mazes | 2025-08-06 23:08:57 |
ultranlp | 1.0.6 | Ultra-fast, comprehensive NLP preprocessing library with advanced tokenization | 2025-08-02 10:21:43 |
sakurs | 0.1.1 | Fast, parallel sentence boundary detection using Delta-Stack Monoid algorithm | 2025-07-27 15:33:39 |
llmvision | 0.1.1 | Visualize how LLMs tokenize text | 2025-07-26 07:34:54 |
miditok | 3.0.6.post1 | MIDI / symbolic music tokenizers for Deep Learning models. | 2025-07-22 12:41:12 |
rs-bpe | 0.1.0 | A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust | 2025-03-19 05:58:24 |
nlpashto | 0.0.25 | Pashto Natural Language Processing Toolkit | 2025-02-01 11:17:44 |
code-tokenize | 0.2.1 | Fast program tokenization and structural analysis in Python | 2025-01-14 09:17:25 |
rftokenizer | 2.3.0 | A character-wise tokenizer for morphologically rich languages | 2024-12-17 19:05:30 |
alphacodings | 0.2.0 | base26 ([A-Z]) and base52 ([A-Za-z]) encodings | 2024-12-09 03:04:43 |
QuickBPE | 2.1 | A fast BPE implementation in C | 2024-12-05 11:37:29 |
zhon | 2.1.1 | Zhon provides constants used in Chinese text processing. | 2024-11-20 00:29:10 |
llama-tokens | 0.0.3 | A Quick Library with Llama 3.1/3.2 Tokenization - source https://github.com/jeffxtang/llama-tokens | 2024-11-10 17:03:39 |
eKoNLPy | 2.0.6 | A Korean natural language processing toolkit for economic analysis | 2024-11-02 00:02:54 |
huspacy | 0.12.0 | HuSpaCy: industrial strength Hungarian natural language processing | 2024-10-28 10:30:55 |
taibun | 1.1.7 | Taiwanese Hokkien Transliterator and Tokeniser | 2024-08-31 20:25:01 |