Name | Version | Summary | Date |
pinyintokenizer | 0.0.2 | Pinyin Tokenizer, chinese pinyin tokenizer | 2023-06-08 08:04:40 |
wyzard | 1.0 | Run various transformers models from one packages. | 2023-05-18 06:10:17 |
botok | 0.8.12 | Tibetan Word Tokenizer | 2023-05-17 11:36:37 |
gpt3-tokenizer | 0.1.4 | Encoder/Decoder and tokens counter for GPT3 | 2023-05-16 00:50:21 |
ZiTokenizer | 0.0.8 | ZiTokenizer: tokenize world text as Zi | 2023-04-20 17:50:08 |
nlpo3 | 1.3.0 | Python binding for nlpO3 Thai language processing library in Rust | 2023-04-14 12:01:17 |
zltk | 0.0.0 | A collection of commonly used functions. | 2023-04-06 14:53:47 |
basictokenizer | 0.0.4 | A basic and useful tokenizer. | 2023-02-06 23:07:12 |
code-tokenizers | 0.0.5 | Aligning BPE and AST | 2023-02-06 03:46:53 |
space-wrap | 0.0.3 | Automated Spacy wrapper to turn plain text into Spacy doc objects | 2023-01-27 21:46:55 |
simplemma | 0.9.1 | A simple multilingual lemmatizer for Python. | 2023-01-20 17:07:40 |
ZiCutter | 0.0.8 | ZiCutter: cut character smaller | 2023-01-10 18:10:33 |
segments | 2.2.1 | | 2022-07-08 13:42:58 |
segtok | 1.5.11 | sentence segmentation and word tokenization tools | 2021-12-15 21:56:14 |
hangul-utils | 0.4.5 | An integrated library for Korean preprocessing. | 2020-06-19 10:00:08 |
sentence-splitter | 1.4 | Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder | 2019-01-14 17:11:25 |