Name | Version | Summary | Date |
rs-bytepiece | 0.2.2 | bytepiece-rs Python binding | 2023-11-12 08:52:37 |
count-tokens | 0.7.0 | Count the number of tokens in a text file using the tiktoken tokenizer from OpenAI. | 2023-09-26 11:16:08 |
UnicodeTokenizer | 0.2.1 | UnicodeTokenizer: tokenize all Unicode text | 2023-09-20 21:46:37 |
semiformal | 0.7.0 | Tokenizer for semiformal Unicode text using TR-29 segmentation | 2023-08-20 08:00:32 |
tokenizer | 3.4.3 | A tokenizer for Icelandic text | 2023-08-11 15:09:13 |
tokenstream | 1.6.0 | A versatile token stream for handwritten parsers | 2023-08-02 18:52:57 |
Texo | 0.0.4 | Sentiment analysis for multiple languages and all products | 2023-07-09 15:34:19 |
nepalitokenizers | 0.0.1 | Pre-trained tokenizers for the Nepali language with an interface to HuggingFace's tokenizers library for customizability. | 2023-06-23 22:51:43 |
Texo-v1 | 0.0.2 | Sentiment analysis for multiple languages and all products | 2023-06-18 16:12:15 |
pinyintokenizer | 0.0.2 | Pinyin Tokenizer, Chinese pinyin tokenizer | 2023-06-08 08:04:40 |
wyzard | 1.0 | Run various transformers models from one package. | 2023-05-18 06:10:17 |
botok | 0.8.12 | Tibetan Word Tokenizer | 2023-05-17 11:36:37 |
gpt3-tokenizer | 0.1.4 | Encoder/decoder and token counter for GPT-3 | 2023-05-16 00:50:21 |
ZiTokenizer | 0.0.8 | ZiTokenizer: tokenize world text as Zi | 2023-04-20 17:50:08 |
zltk | 0.0.0 | A collection of commonly used functions. | 2023-04-06 14:53:47 |
basictokenizer | 0.0.4 | A basic and useful tokenizer. | 2023-02-06 23:07:12 |
code-tokenizers | 0.0.5 | Aligning BPE and AST | 2023-02-06 03:46:53 |
space-wrap | 0.0.3 | Automated spaCy wrapper to turn plain text into spaCy Doc objects | 2023-01-27 21:46:55 |
ZiCutter | 0.0.8 | ZiCutter: cut character smaller | 2023-01-10 18:10:33 |
segments | 2.2.1 | | 2022-07-08 13:42:58 |