Name | Version | Summary | Date |
llm-markdownify | 0.3.0 | Convert PDFs, images to high-quality Markdown using Vision LLMs. | 2025-08-13 22:07:30 |
bot-vision-suite | 1.0.4 | Advanced GUI automation with OCR and image recognition | 2025-08-13 20:23:59 |
lizeur | 0.1.3 | Lizeur is a MCP server to be able to get content from PDFs. | 2025-08-13 17:21:00 |
kreuzberg | 3.11.1 | Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats | 2025-08-13 17:03:32 |
binimage | 1.5.0 | Extract binary from images (OCR or grid) and decode to text, saving to results.txt | 2025-08-13 04:43:29 |
ocr-detection | 0.1.2 | A Python library to detect whether PDF pages contain extractable text or are scanned images requiring OCR | 2025-08-13 04:29:13 |
simple-botmaker | 0.0.2 | A package that simplifies the creation of bots that react to the screen in real time | 2025-08-12 22:07:12 |
docstrange | 1.1.3 | Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR. | 2025-08-11 07:10:23 |
vietcombank-captcha | 0.1.0 | Lightweight CAPTCHA predictor for Vietcombank using ONNX | 2025-08-06 04:43:20 |
aspose-total-net | 25.7.0 | Aspose.Total for Python via .NET is a Document Processing python class library that allows developers to work with Microsoft Word®, Microsoft PowerPoint®, Microsoft Outlook®, OpenOffice®, & 3D file formats without needing Office Automation. | 2025-08-05 23:32:27 |
marker-pdf | 1.8.3 | Convert documents to markdown with high speed and accuracy. | 2025-08-04 18:18:40 |
huaweicloudsdkocr | 3.1.160 | OCR | 2025-07-31 09:51:16 |
ai-resume-parser | 1.0.6 | AI-powered resume parser with parallel processing for multiple file formats (PDF, DOCX, images, etc.) | 2025-07-29 23:13:04 |
document-data-extractor | 1.0.4 | Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-29 08:25:56 |
dedoc | 2.4 | Extract content and logical tree structure from textual documents | 2025-07-28 09:47:38 |
cloudflare-peek | 0.1.0 | A Python utility for scraping Cloudflare-protected websites using screenshot + OCR fallback | 2025-07-27 16:41:12 |
cleanit | 0.4.9 | Subtitles extremely clean | 2025-07-26 19:02:05 |
llm-data-converter | 2.2.0 | Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-25 13:32:07 |
nanonets-extractor | 0.1.4 | A unified document extraction library supporting local CPU, GPU, and cloud processing | 2025-07-23 11:17:54 |
invoice-ocr-mcp | 1.0.4 | Enterprise invoice OCR recognition MCP server - a professional invoice recognition solution based on ModelScope | 2025-07-17 07:23:13 |