PyDigger - unearthing stuff about Python

Found 18 out of 201,227. Showing 18 on page 1. Total pages: 1.

Name	Version	Summary	Date
LughaatNLP	1.0.5	A Python package for natural language processing tasks for the Urdu language, including normalization, part-of-speech (POS) tagging, named entity recognition (NER), stemming, lemmatization, tokenization, stopword removal, text-to-speech, speech-to-text, and summarization.	2024-04-15 16:31:28
BasicTextMetrics	0.2.1	Analyze textual data and extract useful metrics such as word count, character count, average word length, and most common words	2024-02-28 13:32:17
text2text	1.4.4	Text2Text: Crosslingual NLP/G toolkit	2024-02-19 19:21:15
rftokenizer	2.2.0	A character-wise tokenizer for morphologically rich languages	2024-02-01 21:30:54
huspacy-nightly	0.11.0.dev261	HuSpaCy: industrial strength Hungarian natural language processing	2024-01-03 19:48:33
bpeasy	0.1.2	Fast bare-bones BPE for modern tokenizer training	2023-12-19 10:55:51
miditok-for-musiclang	0.0.1	A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies	2023-11-23 17:31:40
huspacy	0.11.0	HuSpaCy: industrial strength Hungarian natural language processing	2023-10-27 08:36:29
nlpashto	0.0.23	Pashto Natural Language Processing Toolkit	2023-10-09 14:40:40
python-ucto	0.6.6	This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto).	2023-09-13 09:57:41
taibun	1.0.0	Taiwanese Hokkien Transliterator and Tokeniser	2023-08-31 11:29:11
hebpipe	3.0.0.6	A pipeline for Hebrew NLP	2023-08-22 00:50:27
zhon	2.0.2	Zhon provides constants used in Chinese text processing.	2023-06-27 10:45:04
ipa-core	0.1.3	NLP Preprocessing Pipeline Wrappers	2023-05-12 15:14:56
simplemma	0.9.1	A simple multilingual lemmatizer for Python.	2023-01-20 17:07:40
pyonmttok	1.36.0	Fast and customizable text tokenization library with BPE and SentencePiece support	2023-01-11 13:46:07
mosestokenizer	1.2.1	Wrappers for several pre-processing scripts from the Moses toolkit.	2021-10-22 14:15:07
sentence-splitter	1.4	Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder	2019-01-14 17:11:25
