Name | Version | Summary | Date |
pdf2markdown | 0.2.0 | Python library and CLI tool that leverages LLMs to convert technical PDF documents to well-structured Markdown | 2025-08-17 20:03:08 |
docstrange | 1.1.3 | Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR. | 2025-08-11 07:10:23 |
md-server | 0.1.2 | HTTP API server for converting documents, web pages, and media to markdown | 2025-08-10 17:41:22 |
document-data-extractor | 1.0.4 | Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-29 08:25:56 |
markitdown-pdf-separators | 0.4.3 | MarkItDown with PDF page separators - convert PDFs to Markdown with page boundary markers | 2025-07-28 08:46:51 |
llm-data-converter | 2.2.0 | Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-25 13:32:07 |