PyDigger - unearthing stuff about Python

Found 30 out of 312,444. Showing 20 on page 1. Total pages: 2.

Name	Version	Summary	Date
pdf2markdown	0.2.0	Python library and CLI tool that leverages LLMs to convert technical PDF documents to well-structured Markdown	2025-08-17 20:03:08
kreuzberg	3.11.2	Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats	2025-08-15 13:51:46
contextgem	0.15.0	Effortless LLM extraction from documents	2025-08-13 22:25:52
inkognito	0.1.0	Privacy-first document processing FastMCP server with PII anonymization	2025-08-13 17:45:52
ocr-detection	0.1.2	A Python library to detect whether PDF pages contain extractable text or are scanned images requiring OCR	2025-08-13 04:29:13
qdrant-loader	0.6.0	A tool for collecting and vectorizing technical content from multiple sources and storing it in a QDrant vector database.	2025-08-12 09:20:21
xml-analysis-framework	1.4.4	XML document analysis and preprocessing framework designed for AI/ML data pipelines	2025-08-12 04:21:41
raggy	0.3.5	scraping stuff	2025-08-11 14:49:05
docstrange	1.1.3	Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR.	2025-08-11 07:10:23
docling-analysis-framework	1.1.0	AI-ready analysis framework for PDF and Office documents using Docling for content extraction	2025-07-29 14:34:10
document-data-extractor	1.0.4	Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract	2025-07-29 08:25:56
aikitx	1.0.0	A comprehensive GUI toolkit for Large Language Models (LLMs) with GGUF support, document processing, email automation, and multi-backend inference	2025-07-25 19:44:31
llm-data-converter	2.2.0	Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract	2025-07-25 13:32:07
llm-text-splitter	0.2.0	A lightweight, rule-based text splitter for LLM context window management, handles multiple file formats and enriches chunks with metadata.	2025-07-24 12:21:01
mseep-kreuzberg	3.8.2	Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats	2025-07-17 03:32:28
pdf-splitter-cli	0.1.1	A modern command-line tool to split PDF files into smaller chunks with progress bars and automatic filename generation	2025-07-17 01:37:12
pdf-ocr-processor	2.0.3	Advanced PDF OCR processing with AI-powered text extraction and selectable text overlays	2025-07-11 21:11:24
ai-chunking	0.1.4	A powerful Python library for semantic document chunking and enrichment using AI	2025-03-16 20:44:19
atai-pdf-tool	0.1.0	A tool for parsing and extracting text from PDF files with OCR capabilities	2025-02-27 11:15:46
smart-llm-loader	0.1.0	A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document chunking and RAG applications. Features smart context-aware segmentation, multi-LLM support, and optimized content extraction for enhanced RAG performance.	2025-02-14 12:42:55
