PyDigger - unearthing stuff about Python


Name Version Summary Date
nupunkt-rs 0.1.1 High-performance Rust implementation of nupunkt sentence/paragraph tokenization 2025-08-16 02:05:56
llmbuilder 0.4.6 A comprehensive toolkit for building, training, and deploying language models 2025-08-14 20:16:12
tokker 0.3.9 Tokker: a fast local-first CLI tokenizer with all the best models in one place 2025-08-09 17:59:33
chunkipy 1.0.0.post1 Chunkipy is an easy-to-use library for chunking text based on the size estimator function you provide. 2025-08-08 12:37:03
maze-dataset 1.4.0 generating and working with datasets of mazes 2025-08-06 23:08:57
ultranlp 1.0.6 Ultra-fast, comprehensive NLP preprocessing library with advanced tokenization 2025-08-02 10:21:43
sakurs 0.1.1 Fast, parallel sentence boundary detection using Delta-Stack Monoid algorithm 2025-07-27 15:33:39
llmvision 0.1.1 Visualize how LLMs tokenize text 2025-07-26 07:34:54
miditok 3.0.6.post1 MIDI / symbolic music tokenizers for Deep Learning models. 2025-07-22 12:41:12
rs-bpe 0.1.0 A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust 2025-03-19 05:58:24
nlpashto 0.0.25 Pashto Natural Language Processing Toolkit 2025-02-01 11:17:44
code-tokenize 0.2.1 Fast program tokenization and structural analysis in Python 2025-01-14 09:17:25
rftokenizer 2.3.0 A character-wise tokenizer for morphologically rich languages 2024-12-17 19:05:30
alphacodings 0.2.0 base26 ([A-Z]) and base52 ([A-Za-z]) encodings 2024-12-09 03:04:43
QuickBPE 2.1 A fast BPE implementation in C 2024-12-05 11:37:29
zhon 2.1.1 Zhon provides constants used in Chinese text processing. 2024-11-20 00:29:10
llama-tokens 0.0.3 A Quick Library with Llama 3.1/3.2 Tokenization - source https://github.com/jeffxtang/llama-tokens 2024-11-10 17:03:39
eKoNLPy 2.0.6 A Korean natural language processing toolkit for economic analysis 2024-11-02 00:02:54
huspacy 0.12.0 HuSpaCy: industrial strength Hungarian natural language processing 2024-10-28 10:30:55
taibun 1.1.7 Taiwanese Hokkien Transliterator and Tokeniser 2024-08-31 20:25:01