Name | Version | Summary | Date |
kallia | 0.1.2 | Semantic Document Processing Library | 2025-07-13 08:55:52 |
splurge-tools | 0.2.6 | Python tools for data type handling and validation | 2025-07-12 20:27:52 |
profanex | 0.0.2 | None | 2025-07-12 19:54:01 |
html-to-markdown | 1.8.0 | A modern, type-safe Python library for converting HTML to Markdown with comprehensive tag support and customizable options | 2025-07-12 10:22:42 |
contextgem | 0.11.1 | Effortless LLM extraction from documents | 2025-07-11 16:24:06 |
rs-bpe | 0.1.0 | A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust | 2025-03-19 05:58:24 |
reliq | 0.0.33 | Python ctypes bindings for reliq | 2025-02-26 19:51:44 |
chonkie | 0.5.0 | 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library | 2025-02-17 20:39:12 |
smoothtext | 0.3.0 | A Python library for text readability analysis, supporting multiple languages. | 2025-02-16 21:50:05 |
dom-tree-sitter-language-pack | 0.4.0 | Extensive Language Pack for Tree-Sitter | 2025-02-15 08:15:55 |
PyTokenCounter | 1.6.4 | A Python library for tokenizing text and counting tokens using various encoding schemes. | 2025-02-03 23:06:07 |
ts-tokenizer | 0.1.19 | TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed specifically for tokenizing Turkish texts. | 2025-01-30 19:59:44 |
tikara | 0.1.5 | The metadata and text content extractor for almost every file type. | 2025-01-26 23:33:40 |
safwaText | 0.1.0 | A Python package for Arabic text preprocessing, including cleaning, normalization, stemming, and stopword removal. | 2025-01-24 16:07:06 |
long2short | 0.1.3 | A flexible text summarization library to summarize long documents supporting multiple LLM providers | 2025-01-23 11:13:46 |
indoxMiner | 0.1.4 | Indox Data Extraction | 2024-12-29 09:52:42 |
yurenizer | 0.2.2 | A library for standardizing terms with spelling variations using a synonym dictionary. | 2024-12-08 08:03:52 |
huggingface-text-data-analyzer | 1.1.0 | A comprehensive tool for analyzing text datasets from HuggingFace's datasets library | 2024-12-06 03:06:41 |
pawpaw | 1.0.0rc8 | High Performance Text Processing & Segmentation Framework | 2024-11-18 04:28:31 |
analiticcl | 0.4.8 | Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation | 2024-10-17 19:47:08 |