docling-analysis-framework

Name: docling-analysis-framework
Version: 1.1.0
Home page: https://github.com/rdwj/docling-analysis-framework
Summary: AI-ready analysis framework for PDF and Office documents using Docling for content extraction
Upload time: 2025-07-29 14:34:10
Author: Wes Jackson
Requires Python: >=3.8
License: MIT
Keywords: docling, pdf, office, ai, ml, document-processing, semantic-search, chunking
Requirements: docling

# Docling Analysis Framework

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Docling Powered](https://img.shields.io/badge/Docling-Powered-blue.svg)](https://github.com/DS4SD/docling)
[![AI Ready](https://img.shields.io/badge/AI%20Ready-✓-green.svg)](https://github.com/rdwj/docling-analysis-framework)

AI-ready analysis framework for **PDF and Office documents** using Docling for content extraction. Transforms Docling's output into optimized chunks and structured analysis for AI/ML pipelines.

> **📝 For text, code, and configuration files**, use our companion [document-analysis-framework](https://github.com/redhat-ai-americas/document-analysis-framework), which uses only the Python standard library.

## 🚀 Quick Start

### Simple API - Get Started in Seconds

```python
import docling_analysis_framework as daf

# 🎯 One-line analysis with Docling extraction
result = daf.analyze("document.pdf")
print(f"Document type: {result['document_type'].type_name}")
print(f"Pages: {result['document_type'].pages}")
print(f"Handler used: {result['handler_used']}")

# ✂️ Smart chunking for AI/ML
chunks = daf.chunk("document.pdf", strategy="auto")
print(f"Created {len(chunks)} optimized chunks")

# 🚀 Enhanced analysis with both analysis and chunking
enhanced = daf.analyze_enhanced("document.pdf", chunking_strategy="structural")
print(f"Document: {enhanced['analysis']['document_type'].type_name}")
print(f"Chunks: {len(enhanced['chunks'])}")

# 💾 Save chunks to JSON
daf.save_chunks_to_json(chunks, "chunks_output.json")

# 💾 Save analysis to JSON
daf.save_analysis_to_json(result, "analysis_output.json")
```

### Advanced Usage

```python
import docling_analysis_framework as daf

# Enhanced analysis with full results
enhanced = daf.analyze_enhanced("research_paper.pdf")

print(f"Type: {enhanced['analysis']['document_type'].type_name}")
print(f"Confidence: {enhanced['analysis']['confidence']:.2f}")
print(f"AI use cases: {len(enhanced['analysis']['analysis'].ai_use_cases)}")
if enhanced['analysis']['analysis'].quality_metrics:
    for metric, score in enhanced['analysis']['analysis'].quality_metrics.items():
        print(f"{metric}: {score:.2f}")

# Different chunking strategies
hierarchical_chunks = daf.chunk("document.pdf", strategy="structural")
table_aware_chunks = daf.chunk("document.pdf", strategy="table_aware") 
page_aware_chunks = daf.chunk("document.pdf", strategy="page_aware")

# Process chunks
for chunk in hierarchical_chunks:
    print(f"Chunk {chunk.chunk_id}: {chunk.token_count} tokens")
    print(f"Type: {chunk.chunk_type}")
    # Send to your AI model...

# 💾 Save different strategies to separate files
strategies = {
    "structural": hierarchical_chunks,
    "table_aware": table_aware_chunks,
    "page_aware": page_aware_chunks
}

for strategy_name, chunks in strategies.items():
    daf.save_chunks_to_json(chunks, f"chunks_{strategy_name}.json")
    print(f"Saved {len(chunks)} chunks to chunks_{strategy_name}.json")
```

### Expert Usage - Direct Class Access

```python
# For advanced customization, use the classes directly
from docling_analysis_framework import DoclingAnalyzer, DoclingChunkingOrchestrator, ChunkingConfig

analyzer = DoclingAnalyzer(max_file_size_mb=100)
result = analyzer.analyze_document("document.pdf")

# Custom chunking with config
config = ChunkingConfig(
    max_chunk_size=1500,
    min_chunk_size=200,
    overlap_size=100,
    preserve_structure=True
)

orchestrator = DoclingChunkingOrchestrator(config=config)
# Note: In advanced usage, you'd pass the actual docling_result object
```

## 📋 Supported Document Types

| Format | Extensions | Confidence | Powered By |
|--------|------------|------------|------------|
| **📄 PDF Documents** | .pdf | 95% | Docling extraction |
| **📝 Word Documents** | .docx | 95% | Docling extraction |
| **📊 Spreadsheets** | .xlsx | 70% | Docling extraction |
| **📅 Presentations** | .pptx | 70% | Docling extraction |
| **🖼️ Images with Text** | .png, .jpg, .tiff | 70% | Docling OCR |

## 🎯 Key Features

### **🔍 Docling-Powered Extraction**
- **PDF text extraction** - High-quality content extraction
- **Table detection** - Preserves table structure in markdown
- **Figure references** - Maintains image/figure relationships  
- **Header hierarchy** - Document structure preservation

### **🤖 AI Preparation Layer**
- **Quality assessment** - Extraction quality scoring
- **Structure analysis** - Document type detection and analysis
- **Chunk optimization** - AI-ready segmentation strategies
- **Rich metadata** - Page counts, figures, tables, quality metrics

### **⚡ Smart Chunking Strategies**
- **Structural chunking** - Respects document hierarchy (headers, sections)
- **Table-aware chunking** - Separates tables from text content
- **Page-aware chunking** - Considers original page boundaries
- **Auto-selection** - Document-type-aware strategy selection
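
As a concrete illustration, auto-selection can be thought of as a decision over document metadata. The mapping below is a hypothetical sketch, not the framework's actual heuristics:

```python
# Hypothetical strategy picker illustrating what "auto" selection
# might look like; the framework's real rules are not documented here.
def pick_strategy(doc_type: str, table_count: int, page_count: int) -> str:
    if table_count > 5:
        return "table_aware"    # table-heavy docs keep tables intact
    if doc_type in ("research_paper", "contract"):
        return "structural"     # follow header/section hierarchy
    if page_count > 50:
        return "page_aware"     # long docs chunk along page boundaries
    return "structural"

print(pick_strategy("research_paper", table_count=2, page_count=12))  # -> structural
```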

### **📊 Extraction Quality Analysis**
- **Text coverage** - How much text was successfully extracted
- **Structure preservation** - Whether headers, lists, tables are maintained
- **Overall confidence** - Combined quality score for AI processing
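
For intuition, a combined confidence could be a weighted blend of the component metrics. The weights here are illustrative, not the framework's actual formula:

```python
def overall_quality(text_coverage: float, structure_score: float,
                    w_text: float = 0.7, w_struct: float = 0.3) -> float:
    """Weighted blend of component scores (illustrative weights)."""
    return w_text * text_coverage + w_struct * structure_score

print(f"{overall_quality(0.95, 0.80):.3f}")  # -> 0.905
```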

## 🎉 Framework Status - COMPLETED

The Docling Analysis Framework is now **fully functional** and follows the same successful patterns as the XML Analysis Framework:

### ✅ Completed Features

- **🎯 Simple API**: One-line functions for `analyze()`, `chunk()`, `analyze_enhanced()`
- **🔧 Advanced API**: Direct class access for customization with `DoclingAnalyzer`, `DoclingChunkingOrchestrator`
- **⚙️ Configurable Chunking**: `ChunkingConfig` class for fine-tuning chunk parameters
- **📦 Multiple Strategies**: Structural, table-aware, page-aware, and auto-selection chunking
- **💾 JSON Export**: Easy export of analysis results and chunks to JSON files
- **🛡️ Enhanced Error Handling**: Comprehensive logging and error reporting
- **📊 Quality Metrics**: Extraction quality assessment and content analysis
- **🧪 Testing Framework**: Complete Jupyter notebook for validation
- **📚 Documentation**: Comprehensive README with examples and usage patterns

### 🚀 Ready for AI/ML Integration

The framework provides everything needed for AI/ML pipelines:

- **Token-optimized chunks** sized for LLM context windows
- **Rich metadata** with document structure and quality metrics  
- **JSON export** for vector database ingestion
- **Multiple chunking strategies** for different document types
- **Quality assessment** to determine AI suitability
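
For example, exported chunk JSON can be loaded and filtered before embedding. The field names below are an assumed serialization shape, not a documented schema:

```python
import json

# Simulated output of daf.save_chunks_to_json -- the field names
# (chunk_id, chunk_type, token_count, content) are an assumption.
exported = json.dumps([
    {"chunk_id": "c1", "chunk_type": "section",
     "token_count": 512, "content": "1. Introduction ..."},
    {"chunk_id": "c2", "chunk_type": "table",
     "token_count": 40, "content": "| a | b |"},
])

chunks = json.loads(exported)

# Keep only chunks large enough to embed usefully, then build
# (id, text) records for a vector database ingester.
records = [(c["chunk_id"], c["content"])
           for c in chunks if c["token_count"] >= 100]
print(records)  # -> [('c1', '1. Introduction ...')]
```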

### 🔄 Comparison with XML Framework

| Feature | XML Framework | Docling Framework | Status |
|---------|---------------|-------------------|---------|
| Simple API | ✅ | ✅ | Complete |
| Advanced Classes | ✅ | ✅ | Complete |
| Multiple Strategies | ✅ | ✅ | Complete |
| Configuration | ✅ | ✅ | Complete |
| JSON Export | ✅ | ✅ | Complete |
| Error Handling | ✅ | ✅ | Complete |
| Testing Notebooks | ✅ | ✅ | Complete |
| Quality Metrics | ✅ | ✅ | Complete |

Both frameworks now provide identical API patterns and functionality!

## 🔧 Installation

```bash
# Install Docling first
pip install docling

# Install the framework from PyPI
pip install docling-analysis-framework

# Or install from source
git clone https://github.com/rdwj/docling-analysis-framework.git
cd docling-analysis-framework
pip install -e .
```

## 📖 Usage Examples

### Document Analysis with Quality Assessment
```python
from docling_analysis_framework import DoclingAnalyzer

analyzer = DoclingAnalyzer()
result = analyzer.analyze_document("contract.pdf")

# Access extraction quality
quality = result['analysis'].extraction_quality
print(f"Text coverage: {quality['text_coverage']:.2f}")
print(f"Structure score: {quality['structure_score']:.2f}")
print(f"Overall quality: {quality['overall_score']:.2f}")

# Access document insights
findings = result['analysis'].key_findings
print(f"Pages: {findings['page_count']}")
print(f"Table rows: {findings['table_rows']}")
print(f"Figures: {findings['figure_count']}")
```

### Advanced Chunking for Academic Papers
```python
# Perfect for research papers with complex structure
# (assumes orchestrator, markdown_content, and docling_result were
# created as in the Expert Usage and Integration sections)
chunks = orchestrator.chunk_document(
    "paper.pdf",
    markdown_content,
    docling_result,
    strategy='structural'
)

# Chunks respect paper structure
for chunk in chunks:
    if chunk.chunk_type == 'section':
        print(f"Section: {chunk.metadata['section_title']}")
    elif chunk.chunk_type == 'table':
        print(f"Table data: {len(chunk.content)} chars")
    elif chunk.chunk_type == 'figure':
        print("Figure reference preserved")
```

### File Size Management
```python
from docling_analysis_framework import DoclingAnalyzer

# Large PDF processing with limits
analyzer = DoclingAnalyzer(max_file_size_mb=100.0)

result = analyzer.analyze_document("large_manual.pdf")
if 'error' in result:
    print(f"File too large: {result['error']}")
else:
    print(f"Successfully processed {result['file_size']} bytes")
```

## 🧪 Framework Ecosystem

This framework is part of a larger document analysis ecosystem:

- **`xml-analysis-framework`** - Specialized XML document analysis
- **`docling-analysis-framework`** - PDF and Office documents (this package)
- **`document-analysis-framework`** - Text, code, and configuration files  
- **`unified-analysis-orchestrator`** - Routes documents to appropriate frameworks
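
Routing in the spirit of the unified orchestrator can be sketched as an extension lookup. The mapping and function below are illustrative, not the orchestrator's actual implementation:

```python
from pathlib import Path

# Hypothetical extension-based routing table (illustrative).
FRAMEWORK_BY_EXT = {
    ".xml": "xml-analysis-framework",
    ".pdf": "docling-analysis-framework",
    ".docx": "docling-analysis-framework",
    ".xlsx": "docling-analysis-framework",
    ".pptx": "docling-analysis-framework",
}

def route(filename: str) -> str:
    # Anything not claimed by a specialist goes to the text framework.
    return FRAMEWORK_BY_EXT.get(Path(filename).suffix.lower(),
                                "document-analysis-framework")

print(route("report.PDF"))  # -> docling-analysis-framework
```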

## 🔗 Integration with Docling

This framework acts as an **AI preparation layer** on top of Docling:

1. **Docling** handles the heavy lifting of document extraction
2. **Our framework** adds AI-specific analysis and chunking
3. **Result** is AI-ready structured data and optimized chunks

```python
# What Docling provides (via its DocumentConverter API):
from docling.document_converter import DocumentConverter
docling_result = DocumentConverter().convert("document.pdf")
markdown_content = docling_result.document.export_to_markdown()

# What we add:
ai_analysis = our_analyzer.analyze(docling_result)
ai_chunks = our_chunker.chunk(markdown_content, ai_analysis)
quality_scores = our_quality_assessor.assess(docling_result)
```

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Powered by [Docling](https://github.com/DS4SD/docling) for document extraction
- Built for modern AI/ML development workflows
- Part of the **AI Building Blocks** initiative

            
