PyDigger - unearthing stuff about Python


Name Version Summary Date
process-twarc 0.19.6 Tools for transforming raw data from Twarc2 to structured data for Masked Language Modeling. 2024-03-14 16:12:40
BasicTextMetrics 0.2.1 Analyze textual data and extract useful metrics such as word count, character count, average word length, and most common words 2024-02-28 13:32:17
text2text 1.4.4 Text2Text: Crosslingual NLP/G toolkit 2024-02-19 19:21:15
rftokenizer 2.2.0 A character-wise tokenizer for morphologically rich languages 2024-02-01 21:30:54
huspacy-nightly 0.11.0.dev261 HuSpaCy: industrial strength Hungarian natural language processing 2024-01-03 19:48:33
bpeasy 0.1.2 Fast bare-bones BPE for modern tokenizer training 2023-12-19 10:55:51
miditok-for-musiclang 0.0.1 A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies 2023-11-23 17:31:40
huspacy 0.11.0 HuSpaCy: industrial strength Hungarian natural language processing 2023-10-27 08:36:29
nlpashto 0.0.23 Pashto Natural Language Processing Toolkit 2023-10-09 14:40:40
python-ucto 0.6.6 This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). 2023-09-13 09:57:41
taibun 1.0.0 Taiwanese Hokkien Transliterator and Tokeniser 2023-08-31 11:29:11
hebpipe 3.0.0.6 A pipeline for Hebrew NLP 2023-08-22 00:50:27
zhon 2.0.2 Zhon provides constants used in Chinese text processing. 2023-06-27 10:45:04
ipa-core 0.1.3 NLP Preprocessing Pipeline Wrappers 2023-05-12 15:14:56
simplemma 0.9.1 A simple multilingual lemmatizer for Python. 2023-01-20 17:07:40
pyonmttok 1.36.0 Fast and customizable text tokenization library with BPE and SentencePiece support 2023-01-11 13:46:07
mosestokenizer 1.2.1 Wrappers for several pre-processing scripts from the Moses toolkit. 2021-10-22 14:15:07
sentence-splitter 1.4 Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder 2019-01-14 17:11:25