| Name | Version | Summary | Date |
| --- | --- | --- | --- |
| tokengeex | 1.1.0 | TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster. | 2024-06-03 03:21:47 |
| kotokenizer | 0.1.1 | Korean tokenizer, sentence classification, and spacing model. | 2024-03-16 07:07:35 |
| tokenizers-gt | 0.15.2.post0 | None | 2024-02-19 01:41:23 |
| sumire | 1.0.2 | Scikit-learn compatible Japanese text vectorizer for CPU-based Japanese natural language processing. | 2024-01-31 14:38:04 |
| mwtokenizer | 0.2.0 | Wikipedia Tokenizer Utility | 2023-12-22 16:24:45 |
| rs-bytepiece | 0.2.2 | bytepiece-rs Python binding | 2023-11-12 08:52:37 |
| UnicodeTokenizer | 0.2.1 | UnicodeTokenizer: tokenize all Unicode text | 2023-09-20 21:46:37 |
| semiformal | 0.7.0 | Tokenizer for semiformal unicode text using TR-29 segmentation | 2023-08-20 08:00:32 |
| tokenstream | 1.6.0 | A versatile token stream for handwritten parsers | 2023-08-02 18:52:57 |
| Texo | 0.0.4 | Sentiment Analysis Multiple language and for all products | 2023-07-09 15:34:19 |
| nepalitokenizers | 0.0.1 | Pre-trained Tokenizers for the Nepali language with an interface to HuggingFace's tokenizers library for customizability. | 2023-06-23 22:51:43 |
| Texo-v1 | 0.0.2 | Sentiment Analysis Multiple language and for all products | 2023-06-18 16:12:15 |
| wyzard | 1.0 | Run various transformers models from one packages. | 2023-05-18 06:10:17 |
| botok | 0.8.12 | Tibetan Word Tokenizer | 2023-05-17 11:36:37 |
| gpt3-tokenizer | 0.1.4 | Encoder/Decoder and tokens counter for GPT3 | 2023-05-16 00:50:21 |
| ZiTokenizer | 0.0.8 | ZiTokenizer: tokenize world text as Zi | 2023-04-20 17:50:08 |
| zltk | 0.0.0 | A collection of commonly used functions. | 2023-04-06 14:53:47 |
| basictokenizer | 0.0.4 | A basic and useful tokenizer. | 2023-02-06 23:07:12 |
| code-tokenizers | 0.0.5 | Aligning BPE and AST | 2023-02-06 03:46:53 |
| space-wrap | 0.0.3 | Automated Spacy wrapper to turn plain text into Spacy doc objects | 2023-01-27 21:46:55 |