PyDigger - unearthing stuff about Python

Found 21 out of 313,489. Showing 20 on page 1. Total pages: 2.

Name	Version	Summary	Date
upspawn-ocr-cli	0.1.0b3	Modern, polished CLI to extract text from PDFs using the Mistral OCR API.	2025-08-15 23:24:29
kreuzberg	3.11.2	Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats	2025-08-15 13:51:46
hashub-docapp	1.0.0	Professional Python SDK for the HashubDocApp API - Advanced OCR, document conversion, and text extraction service	2025-08-15 12:09:58
extract-hwp	0.1.0	Python library for extracting text from Korean HWP files (HWP 5.0 and HWPX formats)	2025-08-13 10:13:03
ocr-detection	0.1.2	A Python library to detect whether PDF pages contain extractable text or are scanned images requiring OCR	2025-08-13 04:29:13
docstrange	1.1.3	Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR.	2025-08-11 07:10:23
pdf-tools-mcp	0.1.4	A FastMCP-based PDF reading and manipulation tool server	2025-08-10 17:21:33
html-to-markdown	1.9.0	A modern, type-safe Python library for converting HTML to Markdown with comprehensive tag support and customizable options	2025-07-29 15:40:00
document-data-extractor	1.0.4	Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract	2025-07-29 08:25:56
llm-data-converter	2.2.0	Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract	2025-07-25 13:32:07
pyrtex	0.1.6	A Python library for batch text extraction and processing using Google Cloud Vertex AI	2025-07-20 15:59:06
mseep-kreuzberg	3.8.2	Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats	2025-07-17 03:32:28
pdfhandleretc	0.1.1	Lightweight command-line and Python API toolkit for PDF text extraction, encryption, permissions, and more.	2025-07-16 04:04:16
pdf-ocr-processor	2.0.3	Advanced PDF OCR processing with AI-powered text extraction and selectable text overlays	2025-07-11 21:11:24
atai-pdf-tool	0.1.0	A tool for parsing and extracting text from PDF files with OCR capabilities	2025-02-27 11:15:46
fileseek	0.1.3	FileSeek – AI-Powered Local Document Archive&Search	2025-02-08 07:13:54
tikara	0.1.5	The metadata and text content extractor for almost every file type.	2025-01-26 23:33:40
pdf-parser-header-footer	0.1.0	A Python package for processing PDFs with header and footer detection	2025-01-14 16:10:34
spanish-pdf-parser	0.1.0	A Python package for processing PDFs with header and footer detection	2025-01-13 14:56:27
vlense	0.1.4	A Python package to extract text from images and PDFs using Vision Language Model (VLM).	2024-11-06 10:51:15
