Name | Version | Summary | Date |
LughaatNLP | 1.0.5 | A Python package for natural language processing tasks for the Urdu language, including normalization, part-of-speech (POS) tagging, named entity recognition (NER), stemming, lemmatization, tokenization, stopword removal, text-to-speech, speech-to-text, and summarization. | 2024-04-15 16:31:28 |
BasicTextMetrics | 0.2.1 | Analyze textual data and extract useful metrics such as word count, character count, average word length, and most common words | 2024-02-28 13:32:17 |
text2text | 1.4.4 | Text2Text: Crosslingual NLP/G toolkit | 2024-02-19 19:21:15 |
rftokenizer | 2.2.0 | A character-wise tokenizer for morphologically rich languages | 2024-02-01 21:30:54 |
huspacy-nightly | 0.11.0.dev261 | HuSpaCy: industrial strength Hungarian natural language processing | 2024-01-03 19:48:33 |
bpeasy | 0.1.2 | Fast bare-bones BPE for modern tokenizer training | 2023-12-19 10:55:51 |
miditok-for-musiclang | 0.0.1 | A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies | 2023-11-23 17:31:40 |
huspacy | 0.11.0 | HuSpaCy: industrial strength Hungarian natural language processing | 2023-10-27 08:36:29 |
nlpashto | 0.0.23 | Pashto Natural Language Processing Toolkit | 2023-10-09 14:40:40 |
python-ucto | 0.6.6 | This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). | 2023-09-13 09:57:41 |
taibun | 1.0.0 | Taiwanese Hokkien Transliterator and Tokeniser | 2023-08-31 11:29:11 |
hebpipe | 3.0.0.6 | A pipeline for Hebrew NLP | 2023-08-22 00:50:27 |
zhon | 2.0.2 | Zhon provides constants used in Chinese text processing. | 2023-06-27 10:45:04 |
ipa-core | 0.1.3 | NLP Preprocessing Pipeline Wrappers | 2023-05-12 15:14:56 |
simplemma | 0.9.1 | A simple multilingual lemmatizer for Python. | 2023-01-20 17:07:40 |
pyonmttok | 1.36.0 | Fast and customizable text tokenization library with BPE and SentencePiece support | 2023-01-11 13:46:07 |
mosestokenizer | 1.2.1 | Wrappers for several pre-processing scripts from the Moses toolkit. | 2021-10-22 14:15:07 |
sentence-splitter | 1.4 | Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder | 2019-01-14 17:11:25 |
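
For orientation, the kind of metrics named in the BasicTextMetrics summary (word count, character count, average word length, most common words) can be sketched with the Python standard library alone. This is a minimal illustration and not the BasicTextMetrics API: the function name `text_metrics`, its `top_n` parameter, and the choice of `\w+` as the word pattern are all assumptions made for this example.

```python
import re
from collections import Counter


def text_metrics(text, top_n=3):
    """Return simple text metrics (hypothetical helper, not a package API)."""
    # Assumption: a "word" is any run of Unicode word characters, lowercased.
    words = re.findall(r"\w+", text.lower())
    word_count = len(words)
    # Guard against division by zero on empty input.
    avg_len = sum(len(w) for w in words) / word_count if word_count else 0.0
    return {
        "word_count": word_count,
        "char_count": len(text),  # counts every character, including spaces
        "avg_word_length": avg_len,
        "most_common": Counter(words).most_common(top_n),
    }


metrics = text_metrics("The cat sat on the mat. The end.")
print(metrics)
```

A regular-expression word pattern like this is only a rough stand-in for real tokenization; the language-specific tokenizers listed above exist because such a pattern handles many languages poorly.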