| Name | Version | Summary | Date |
| --- | --- | --- | --- |
| plankode | 0.0.0a0 | Unified Token Rendering Library for DeepSeek Models. | 2025-10-07 10:53:05 |
| rwa-sdk-qdf | 0.1.0 | QuantDeFi.ai RWA SDK - Professional toolkit for tokenized asset data analysis \| $297B+ tracked assets | 2025-09-16 03:38:58 |
| precious-nlp | 0.1.2 | A tokenizer-free NLP library with T-FREE, CANINE, and byte-level approaches | 2025-09-02 11:28:22 |
| PyTokenCounter | 1.8.2 | A Python library for tokenizing text and counting tokens using various encoding schemes. | 2025-08-22 03:20:04 |
| romanRekhta | 0.1.0 | An NLP library for Roman Urdu text preprocessing, tokenization, and stopword handling. | 2025-07-25 11:42:17 |
| tokenization-scorer | 1.1.8 | Package for evaluating text tokenizations. | 2025-01-13 10:36:40 |
| text2text | 1.9.4 | Text2Text Language Modeling Toolkit | 2025-01-12 22:34:27 |
| LughaatNLP | 1.3.1 | A Python package for natural language processing tasks for the Urdu language, including normalization, part-of-speech (POS) tagging, named entity recognition (NER), stemming, lemmatization, tokenization, stopword removal, text-to-speech, speech-to-text, and summarization. | 2024-12-30 22:07:32 |
| python-ucto | 0.6.9 | This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). | 2024-12-17 11:56:39 |
| tokenize-text | 0.2.32 | Tokenizing and processing text inputs with transformer models | 2024-08-11 21:53:24 |
| tokenize-transformer | 0.2.14 | Tokenizing and processing text inputs with transformer models | 2024-08-01 18:39:29 |
| BasicTextMetrics | 0.2.1 | Analyze textual data and extract useful metrics such as word count, character count, average word length, and most common words | 2024-02-28 13:32:17 |
| huspacy-nightly | 0.11.0.dev261 | HuSpaCy: industrial strength Hungarian natural language processing | 2024-01-03 19:48:33 |
| miditok-for-musiclang | 0.0.1 | A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies | 2023-11-23 17:31:40 |
| hebpipe | 3.0.0.6 | A pipeline for Hebrew NLP | 2023-08-22 00:50:27 |
| ipa-core | 0.1.3 | NLP Preprocessing Pipeline Wrappers | 2023-05-12 15:14:56 |
| pyonmttok | 1.36.0 | Fast and customizable text tokenization library with BPE and SentencePiece support | 2023-01-11 13:46:07 |
| mosestokenizer | 1.2.1 | Wrappers for several pre-processing scripts from the Moses toolkit. | 2021-10-22 14:15:07 |
| sentence-splitter | 1.4 | Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder | 2019-01-14 17:11:25 |