PyDigger - unearthing stuff about Python


Name Version Summary Date
python-ucto 0.6.9 This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). 2024-12-17 11:56:39
text2text 1.8.5 Text2Text Language Modeling Toolkit 2024-12-08 21:33:57
LughaatNLP 1.3.0 A Python package for natural language processing tasks for the Urdu language, including normalization, part-of-speech (POS) tagging, named entity recognition (NER), stemming, lemmatization, tokenization, stopword removal, text-to-speech, speech-to-text, and summarization. 2024-12-08 08:48:34
tokenize-text 0.2.32 Tokenizing and processing text inputs with transformer models 2024-08-11 21:53:24
tokenize-transformer 0.2.14 Tokenizing and processing text inputs with transformer models 2024-08-01 18:39:29
BasicTextMetrics 0.2.1 Analyze textual data and extract useful metrics such as word count, character count, average word length, and most common words 2024-02-28 13:32:17
huspacy-nightly 0.11.0.dev261 HuSpaCy: industrial strength Hungarian natural language processing 2024-01-03 19:48:33
miditok-for-musiclang 0.0.1 A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies 2023-11-23 17:31:40
nlpashto 0.0.23 Pashto Natural Language Processing Toolkit 2023-10-09 14:40:40
hebpipe 3.0.0.6 A pipeline for Hebrew NLP 2023-08-22 00:50:27
ipa-core 0.1.3 NLP Preprocessing Pipeline Wrappers 2023-05-12 15:14:56
pyonmttok 1.36.0 Fast and customizable text tokenization library with BPE and SentencePiece support 2023-01-11 13:46:07
mosestokenizer 1.2.1 Wrappers for several pre-processing scripts from the Moses toolkit. 2021-10-22 14:15:07
sentence-splitter 1.4 Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder 2019-01-14 17:11:25