| Name | Version | Summary | Date |
|------|---------|---------|------|
| kitoken | 0.10.1 | Fast and versatile tokenizer for language models, supporting BPE, Unigram, and WordPiece tokenization | 2024-12-20 03:07:14 |
| python-ucto | 0.6.9 | Python binding to the Ucto tokenizer. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). | 2024-12-17 11:56:39 |
| text2text | 1.8.5 | Text2Text Language Modeling Toolkit | 2024-12-08 21:33:57 |
| ts-tokenizer | 0.1.17 | TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed specifically for tokenizing Turkish texts. | 2024-12-03 09:31:39 |
| whitespacetokenizer | 1.0.3 | Fast Python whitespace tokenizer written in Cython. | 2024-11-28 13:08:56 |
| autotiktokenizer | 0.2.1 | 🧰 The AutoTokenizer that TikToken always needed -- load any tokenizer with TikToken now! ✨ | 2024-11-11 21:16:02 |
| StrTokenizer | 1.1.0 | A Python equivalent of Java's StringTokenizer with some added functionality | 2024-10-14 18:37:19 |
| tiniestsegmenter | 0.3.0 | Compact Japanese segmenter | 2024-09-24 13:39:24 |
| TokenizerChanger | 0.3.4 | Library for manipulating existing tokenizers. | 2024-08-27 23:47:57 |
| bpeasy | 0.1.3 | Fast bare-bones BPE for modern tokenizer training | 2024-08-23 10:47:52 |
| kin-tokenizer | 3.3.1 | Kinyarwanda tokenizer for encoding and decoding Kinyarwanda text | 2024-08-16 10:14:03 |
| dtokenizer | 0.0.6 | None | 2024-08-13 10:26:00 |
| SoMaJo | 2.4.3 | A tokenizer and sentence splitter for German and English web and social media texts. | 2024-08-05 06:41:55 |
| cpp-protein-encoders | 0.0.1 | Fast Python-wrapped C++ basic encoders for protein sequences | 2024-07-22 04:15:17 |
| tokengeex | 1.1.0 | TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster. | 2024-06-03 03:21:47 |
| kotokenizer | 0.1.1 | Korean tokenizer, sentence classification, and spacing model. | 2024-03-16 07:07:35 |
| tokenizers-gt | 0.15.2.post0 | None | 2024-02-19 01:41:23 |
| sumire | 1.0.2 | Scikit-learn-compatible Japanese text vectorizer for CPU-based Japanese natural language processing. | 2024-01-31 14:38:04 |
| mwtokenizer | 0.2.0 | Wikipedia Tokenizer Utility | 2023-12-22 16:24:45 |
| sentencex | 0.6.1 | Sentence segmenter that supports ~300 languages | 2023-11-14 06:58:58 |
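
To give a feel for how packages of this kind are used, here is a minimal sketch with SoMaJo, the German/English tokenizer and sentence splitter listed above. It follows the API shown in SoMaJo's own documentation (`SoMaJo(...)` and `tokenize_text(...)`); the model name `"en_PTB"` and the sample text are illustrative, and details may differ between versions.

```python
# Minimal sketch: sentence splitting and tokenization with SoMaJo.
# Assumes SoMaJo's documented API; version-specific details may vary.
from somajo import SoMaJo

# "en_PTB" selects the English model; "de_CMC" would select the
# German web/social-media model.
tokenizer = SoMaJo("en_PTB")

paragraphs = ["Tokenization is harder than it looks :) Don't you think?"]

# tokenize_text yields one list of Token objects per detected sentence.
for sentence in tokenizer.tokenize_text(paragraphs):
    print(" ".join(token.text for token in sentence))
```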