PyDigger - unearthing stuff about Python


Name Version Summary Date
llmvision 0.1.1 Visualize how LLMs tokenize text 2025-07-26 07:34:54
miditok 3.0.6.post1 MIDI / symbolic music tokenizers for Deep Learning models. 2025-07-22 12:41:12
rs-bpe 0.1.0 A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust 2025-03-19 05:58:24
nlpashto 0.0.25 Pashto Natural Language Processing Toolkit 2025-02-01 11:17:44
code-tokenize 0.2.1 Fast program tokenization and structural analysis in Python 2025-01-14 09:17:25
rftokenizer 2.3.0 A character-wise tokenizer for morphologically rich languages 2024-12-17 19:05:30
alphacodings 0.2.0 base26 ([A-Z]) and base52 ([A-Za-z]) encodings 2024-12-09 03:04:43
QuickBPE 2.1 A fast BPE implementation in C 2024-12-05 11:37:29
zhon 2.1.1 Zhon provides constants used in Chinese text processing. 2024-11-20 00:29:10
llama-tokens 0.0.3 A Quick Library with Llama 3.1/3.2 Tokenization - source https://github.com/jeffxtang/llama-tokens 2024-11-10 17:03:39
eKoNLPy 2.0.6 A Korean natural language processing toolkit for economic analysis 2024-11-02 00:02:54
huspacy 0.12.0 HuSpaCy: industrial strength Hungarian natural language processing 2024-10-28 10:30:55
maze-dataset 1.1.0 generating and working with datasets of mazes 2024-09-10 19:33:49
taibun 1.1.7 Taiwanese Hokkien Transliterator and Tokeniser 2024-08-31 20:25:01
bpeasy 0.1.3 Fast bare-bones BPE for modern tokenizer training 2024-08-23 10:47:52
simplemma 1.1.1 A lightweight toolkit for multilingual lemmatization and language detection. 2024-08-08 12:20:45
textmate-grammar-python 0.6.1 A lexer and tokenizer for grammar files as defined by TextMate and used in VSCode, implemented in Python. 2024-07-31 18:43:24
process-twarc 0.20.2 Tools for transforming raw data from Twarc2 to structured data for Masked Language Modeling. 2024-06-12 11:40:55
example990420 1.1.1 Taiwanese Hokkien Transliterator and Tokeniser 2024-05-01 20:28:38