PyDigger - unearthing stuff about Python


Name | Version | Summary | Date
pdf2markdown | 0.2.0 | Python library and CLI tool that leverages LLMs to convert technical PDF documents to well-structured Markdown | 2025-08-17 20:03:08
kreuzberg | 3.11.2 | Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats | 2025-08-15 13:51:46
contextgem | 0.15.0 | Effortless LLM extraction from documents | 2025-08-13 22:25:52
inkognito | 0.1.0 | Privacy-first document processing FastMCP server with PII anonymization | 2025-08-13 17:45:52
ocr-detection | 0.1.2 | A Python library to detect whether PDF pages contain extractable text or are scanned images requiring OCR | 2025-08-13 04:29:13
qdrant-loader | 0.6.0 | A tool for collecting and vectorizing technical content from multiple sources and storing it in a QDrant vector database. | 2025-08-12 09:20:21
xml-analysis-framework | 1.4.4 | XML document analysis and preprocessing framework designed for AI/ML data pipelines | 2025-08-12 04:21:41
raggy | 0.3.5 | scraping stuff | 2025-08-11 14:49:05
docstrange | 1.1.3 | Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR. | 2025-08-11 07:10:23
docling-analysis-framework | 1.1.0 | AI-ready analysis framework for PDF and Office documents using Docling for content extraction | 2025-07-29 14:34:10
document-data-extractor | 1.0.4 | Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-29 08:25:56
aikitx | 1.0.0 | A comprehensive GUI toolkit for Large Language Models (LLMs) with GGUF support, document processing, email automation, and multi-backend inference | 2025-07-25 19:44:31
llm-data-converter | 2.2.0 | Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-25 13:32:07
llm-text-splitter | 0.2.0 | A lightweight, rule-based text splitter for LLM context window management, handles multiple file formats and enriches chunks with metadata. | 2025-07-24 12:21:01
mseep-kreuzberg | 3.8.2 | Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats | 2025-07-17 03:32:28
pdf-splitter-cli | 0.1.1 | A modern command-line tool to split PDF files into smaller chunks with progress bars and automatic filename generation | 2025-07-17 01:37:12
pdf-ocr-processor | 2.0.3 | Advanced PDF OCR processing with AI-powered text extraction and selectable text overlays | 2025-07-11 21:11:24
ai-chunking | 0.1.4 | A powerful Python library for semantic document chunking and enrichment using AI | 2025-03-16 20:44:19
atai-pdf-tool | 0.1.0 | A tool for parsing and extracting text from PDF files with OCR capabilities | 2025-02-27 11:15:46
smart-llm-loader | 0.1.0 | A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document chunking and RAG applications. Features smart context-aware segmentation, multi-LLM support, and optimized content extraction for enhanced RAG performance. | 2025-02-14 12:42:55
Hour | Day | Week | Total
60 | 1468 | 10304 | 312444