PyDigger - unearthing stuff about Python


Name  Version  Summary  Date
tokengeex 1.1.0 TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster. 2024-06-03 03:21:47
kotokenizer 0.1.1 Korean tokenizer, sentence classification, and spacing model. 2024-03-16 07:07:35
tokenizers-gt 0.15.2.post0 None 2024-02-19 01:41:23
sumire 1.0.2 Scikit-learn compatible Japanese text vectorizer for CPU-based Japanese natural language processing. 2024-01-31 14:38:04
mwtokenizer 0.2.0 Wikipedia Tokenizer Utility 2023-12-22 16:24:45
rs-bytepiece 0.2.2 bytepiece-rs Python binding 2023-11-12 08:52:37
UnicodeTokenizer 0.2.1 UnicodeTokenizer: tokenize all Unicode text 2023-09-20 21:46:37
semiformal 0.7.0 Tokenizer for semiformal unicode text using TR-29 segmentation 2023-08-20 08:00:32
tokenstream 1.6.0 A versatile token stream for handwritten parsers 2023-08-02 18:52:57
Texo 0.0.4 Sentiment analysis for multiple languages and all products 2023-07-09 15:34:19
nepalitokenizers 0.0.1 Pre-trained Tokenizers for the Nepali language with an interface to HuggingFace's tokenizers library for customizability. 2023-06-23 22:51:43
Texo-v1 0.0.2 Sentiment analysis for multiple languages and all products 2023-06-18 16:12:15
wyzard 1.0 Run various transformers models from one package. 2023-05-18 06:10:17
botok 0.8.12 Tibetan Word Tokenizer 2023-05-17 11:36:37
gpt3-tokenizer 0.1.4 Encoder/decoder and token counter for GPT3 2023-05-16 00:50:21
ZiTokenizer 0.0.8 ZiTokenizer: tokenize world text as Zi 2023-04-20 17:50:08
zltk 0.0.0 A collection of commonly used functions. 2023-04-06 14:53:47
basictokenizer 0.0.4 A basic and useful tokenizer. 2023-02-06 23:07:12
code-tokenizers 0.0.5 Aligning BPE and AST 2023-02-06 03:46:53
space-wrap 0.0.3 Automated Spacy wrapper to turn plain text into Spacy doc objects 2023-01-27 21:46:55