markdrop


| Field | Value |
|---|---|
| Name | markdrop |
| Version | 0.3.1.3 |
| home_page | https://github.com/shoryasethia/markdrop |
| Summary | A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI and Google's Gemini. |
| upload_time | 2025-01-29 22:37:58 |
| maintainer | None |
| docs_url | None |
| author | Shorya Sethia |
| requires_python | >=3.10 |
| license | None |
| keywords | pdf, markdown, converter, ai, llm, table-extraction, image-analysis, document-processing, gemini, openai |
| requirements | beautifulsoup4, docling, docling_core, openai, openpyxl, pandas, Pillow, protobuf, python-dotenv, pymupdf, torch, tqdm, transformers, timm, requests, qwen_vl_utils, google.generativeai, vllm, setuptools, typing |
| Travis-CI | No Travis |
| Coveralls test coverage | No coveralls |

# MARKDROP

<a href="https://pepy.tech/projects/markdrop"><img src="https://static.pepy.tech/badge/markdrop" alt="markdrop total downloads"></a>

A Python package for converting PDFs (or PDF URLs) to markdown while extracting images and tables, with advanced features for AI-powered content analysis and descriptions.

## Features  

- [x] PDF to Markdown conversion with formatting preservation using Docling
- [x] Automatic image extraction at original quality using PDF XRef IDs
- [x] Table detection using Microsoft's Table Transformer
- [x] PDF URL support for core functionalities
- [x] AI-powered image and table descriptions using multiple LLM providers
- [x] Interactive HTML output with downloadable Excel tables
- [x] Customizable image resolution and UI elements
- [x] Comprehensive logging system
- [ ] Optical Character Recognition (OCR) for images with embedded text
- [ ] Support for multi-language PDFs

## Installation  

```bash  
pip install markdrop  
```  

> https://pypi.org/project/markdrop  

## Quick Start  

### Basic PDF Processing

```python
from markdrop import extract_images, make_markdown, extract_tables_from_pdf

source_pdf = 'url/or/path/to/pdf/file'    # Replace with your local PDF file path or a URL
output_dir = 'data/output'                 # Replace with desired output directory's path

make_markdown(source_pdf, output_dir)
extract_images(source_pdf, output_dir)
extract_tables_from_pdf(source_pdf, output_dir=output_dir)
```
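
Because the functions above accept either a local path or a URL, it can help to check the input before starting a long conversion. The following is a minimal sketch of such a pre-flight check, not part of markdrop: `is_url` and `prepare_source` are hypothetical helper names, and the assumption is that only `http`/`https` URLs are used.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Return True if source looks like an http(s) URL."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def prepare_source(source: str, output_dir: str) -> str:
    """Fail early on a missing local file and make sure the output directory exists.

    URLs are passed through unchanged; they are not fetched or validated here.
    """
    if not is_url(source) and not Path(source).is_file():
        raise FileNotFoundError(f"PDF not found: {source}")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return source
```

The returned value can then be handed to `make_markdown`, `extract_images`, or `extract_tables_from_pdf` as in the example above.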

### Advanced PDF Processing with MarkDrop

```python
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging

# Configure processing options
config = MarkDropConfig(
    image_resolution_scale=2.0,        # Scale factor for image resolution
    download_button_color='#444444',   # Color for download buttons in HTML
    log_level=logging.INFO,           # Logging detail level
    log_dir='logs',                   # Directory for log files
    excel_dir='markdropped-excel-tables'  # Directory for Excel table exports
)

# Process PDF document
input_doc_path = "path/to/input.pdf"
output_dir = Path('output_directory')

# Convert PDF and generate HTML with images and tables
html_path = markdrop(input_doc_path, output_dir, config)

# Add interactive table download functionality
downloadable_html = add_downloadable_tables(html_path, config)
```
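
To apply the same configuration to a whole folder of PDFs, one option is a small loop around the conversion call. The sketch below is an illustration under assumptions, not a markdrop feature: `convert_folder` is a hypothetical helper, and the converter is passed in as a callable so the loop itself does not depend on markdrop being installed.

```python
from pathlib import Path
from typing import Callable, Dict, Union

Result = Union[Path, Exception]


def convert_folder(
    pdf_dir: Path,
    output_root: Path,
    convert: Callable[[Path, Path], Path],
) -> Dict[str, Result]:
    """Run `convert` on every PDF in pdf_dir, using one output subfolder per file.

    A failure on one file is recorded in the result instead of aborting the batch.
    """
    results: Dict[str, Result] = {}
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        out_dir = output_root / pdf.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        try:
            results[pdf.name] = convert(pdf, out_dir)
        except Exception as exc:  # keep going and report the error per file
            results[pdf.name] = exc
    return results
```

With markdrop installed, the converter could be `lambda pdf, out: markdrop(str(pdf), out, config)`, reusing the `config` object from the example above.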

### AI-Powered Content Analysis

```python
from markdrop import setup_keys, process_markdown, ProcessorConfig, AIProvider
from pathlib import Path

# Set up API keys for AI providers
setup_keys(key='gemini')  # or setup_keys(key='openai')

# Configure AI processing options
config = ProcessorConfig(
    input_path="path/to/markdown/file.md",    # Input markdown file path
    output_dir=Path("output_directory"),      # Output directory
    ai_provider=AIProvider.GEMINI,            # AI provider (GEMINI or OPENAI)
    remove_images=False,                      # Keep or remove original images
    remove_tables=False,                      # Keep or remove original tables
    table_descriptions=True,                  # Generate table descriptions
    image_descriptions=True,                  # Generate image descriptions
    max_retries=3,                           # Number of API call retries
    retry_delay=2,                           # Delay between retries in seconds
    gemini_model_name="gemini-1.5-flash",    # Gemini model for images
    gemini_text_model_name="gemini-pro",     # Gemini model for text
    image_prompt="Describe this image in detail.",            # Custom prompt for image analysis (example text)
    table_prompt="Summarize the contents of this table."      # Custom prompt for table analysis (example text)
)

# Process markdown with AI descriptions
output_path = process_markdown(config)
```
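
Since each described image or table costs at least one API call, it may be useful to estimate the workload first. The sketch below is a rough pre-check, not the detection logic markdrop uses: `count_images_and_tables` is a hypothetical helper, and it assumes the markdown uses standard `![alt](path)` image links and pipe tables with a `|---|---|` separator row.

```python
import re
from typing import Tuple

# Matches inline markdown images such as ![caption](images/fig1.png)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def _is_table_separator(line: str) -> bool:
    """True for a pipe-table separator row like |---|:---:| (one per table)."""
    stripped = line.strip()
    return "|" in stripped and "-" in stripped and set(stripped) <= set("|:- ")


def count_images_and_tables(markdown_text: str) -> Tuple[int, int]:
    """Return (number of image links, number of pipe tables) in the text."""
    images = len(IMAGE_RE.findall(markdown_text))
    tables = sum(1 for line in markdown_text.splitlines() if _is_table_separator(line))
    return images, tables
```

A plain `---` horizontal rule is not counted as a table because it contains no pipe character.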

### Image Description Generation

```python
from markdrop import generate_descriptions

prompt = "Give textual highly detailed descriptions from this image ONLY, nothing else."
input_path = 'path/to/img_file/or/dir'
output_dir = 'data/output'
llm_clients = ['gemini', 'llama-vision']  # Available: ['qwen', 'gemini', 'openai', 'llama-vision', 'molmo', 'pixtral']

generate_descriptions(
    input_path=input_path,
    output_dir=output_dir,
    prompt=prompt,
    llm_client=llm_clients
)
```
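
The `input_path` above may point to a single image or to a directory. To preview which files would be picked up from a directory, a sketch like the following can be used. It is an illustration only: `list_images` is a hypothetical helper, and the set of suffixes is an assumption rather than the list markdrop actually accepts.

```python
from pathlib import Path
from typing import List

# Assumed set of image suffixes; adjust to the formats you actually use
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"}


def list_images(input_path: str) -> List[Path]:
    """Return image files for a path that is either one file or a directory."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in IMAGE_SUFFIXES else []
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```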

## API Reference  

### Core Functions

#### markdrop(input_doc_path: str, output_dir: str, config: Optional[MarkDropConfig] = None) -> Path
Converts PDF to markdown and HTML with enhanced features.

Parameters:
- `input_doc_path` (str): Path to input PDF file
- `output_dir` (str): Output directory path
- `config` (MarkDropConfig, optional): Configuration options for processing

#### add_downloadable_tables(html_path: Path, config: Optional[MarkDropConfig] = None) -> Path
Adds interactive table download functionality to HTML output.

Parameters:
- `html_path` (Path): Path to HTML file
- `config` (MarkDropConfig, optional): Configuration options

### Configuration Classes

#### MarkDropConfig
Configuration for PDF processing:
- `image_resolution_scale` (float): Scale factor for image resolution (default: 2.0)
- `download_button_color` (str): HTML color code for download buttons (default: '#444444')
- `log_level` (int): Logging level (default: logging.INFO)
- `log_dir` (str): Directory for log files (default: 'logs')
- `excel_dir` (str): Directory for Excel table exports (default: 'markdropped-excel-tables')

#### ProcessorConfig
Configuration for AI processing:
- `input_path` (str): Path to markdown file
- `output_dir` (str): Output directory path
- `ai_provider` (AIProvider): AI provider selection (GEMINI or OPENAI)
- `remove_images` (bool): Whether to remove original images
- `remove_tables` (bool): Whether to remove original tables
- `table_descriptions` (bool): Generate table descriptions
- `image_descriptions` (bool): Generate image descriptions
- `max_retries` (int): Maximum API call retries
- `retry_delay` (int): Delay between retries in seconds
- `gemini_model_name` (str): Gemini model for image processing
- `gemini_text_model_name` (str): Gemini model for text processing
- `image_prompt` (str): Custom prompt for image analysis
- `table_prompt` (str): Custom prompt for table analysis
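
To illustrate how `max_retries` and `retry_delay` typically interact, here is a generic retry loop. It is a sketch of the general pattern, not markdrop's implementation: `call_with_retries` is a hypothetical helper, and it treats `max_retries` as the total number of attempts, which the library may count differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to max_retries times, waiting retry_delay seconds between attempts.

    The last exception is re-raised if every attempt fails.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(retry_delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.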

### Legacy Functions

#### make_markdown(source: str, output_dir: str, verbose: bool = False)
Legacy function for basic PDF to markdown conversion.

Parameters:
- `source` (str): Path to input PDF or URL
- `output_dir` (str): Output directory path
- `verbose` (bool): Enable detailed logging

#### extract_images(source: str, output_dir: str, verbose: bool = False)
Legacy function for basic image extraction.

Parameters:
- `source` (str): Path to input PDF or URL
- `output_dir` (str): Output directory path
- `verbose` (bool): Enable detailed logging

#### extract_tables_from_pdf(pdf_path: str, **kwargs)
Legacy function for basic table extraction.

Parameters:
- `pdf_path` (str): Path to input PDF or URL
- `start_page` (int, optional): Starting page number
- `end_page` (int, optional): Ending page number
- `threshold` (float, optional): Detection confidence threshold
- `output_dir` (str): Output directory path
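
Because `extract_tables_from_pdf` takes its options as keyword arguments, they can be assembled and checked in one place before the call. The sketch below is an assumption-laden illustration: `table_kwargs` is a hypothetical helper, and the 0 to 1 range for `threshold` is assumed from its description as a confidence value, not confirmed by the documentation above.

```python
from typing import Any, Dict, Optional


def table_kwargs(
    output_dir: str,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    threshold: Optional[float] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for table extraction, omitting unset options."""
    if start_page is not None and end_page is not None and start_page > end_page:
        raise ValueError("start_page must not be greater than end_page")
    # Assumption: threshold is a confidence score between 0 and 1
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    candidates = {
        "output_dir": output_dir,
        "start_page": start_page,
        "end_page": end_page,
        "threshold": threshold,
    }
    # Drop options left as None so the function's own defaults apply
    return {key: value for key, value in candidates.items() if value is not None}
```

A call would then look like `extract_tables_from_pdf(source_pdf, **table_kwargs("data/output", threshold=0.8))`.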

## Contributing  

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.  

### Development Setup  

1. Clone the repository:  
```bash  
git clone https://github.com/shoryasethia/markdrop.git  
cd markdrop  
```  

2. Create a virtual environment:  
```bash  
python -m venv venv  
source venv/bin/activate  # On Windows: venv\Scripts\activate  
```  

3. Install development dependencies:  
```bash  
pip install -r requirements.txt  
```  

## Project Structure  

```bash  
markdrop/  
├── LICENSE  
├── README.md  
├── CONTRIBUTING.md  
├── CHANGELOG.md  
├── requirements.txt  
├── setup.py  
└── markdrop/ 
    ├── __init__.py 
    ├── src/
    │   └── markdrop-logo.png
    ├── main.py
    ├── process.py
    ├── api_setup.py
    ├── parse.py
    ├── utils.py  
    ├── helper.py
    ├── ignore_warnings.py
    ├── run.py
    └── models/
        ├── __init__.py
        ├── .env
        ├── img_descriptions.py
        ├── logger.py
        ├── model_loader.py
        ├── responder.py
        └── setup_keys.py  
```  

## License  

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.  

## Changelog  

See [CHANGELOG.md](CHANGELOG.md) for version history.  

## Code of Conduct  

Please note that this project follows our [Code of Conduct](CODE_OF_CONDUCT.md).  

## Support  

- [Open an issue](https://github.com/shoryasethia/markdrop/issues)

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/shoryasethia/markdrop",
    "name": "markdrop",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "pdf markdown converter ai llm table-extraction image-analysis document-processing gemini openai",
    "author": "Shorya Sethia",
    "author_email": "shoryasethia4may@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/5e/c8/f62d5088b9a4e3ee1520f7cba3daefec8d0c74d4abf461cb94fc399e2932/markdrop-0.3.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI and Google's Gemini.",
    "version": "0.3.1.3",
    "project_urls": {
        "Homepage": "https://github.com/shoryasethia/markdrop"
    },
    "split_keywords": [
        "pdf",
        "markdown",
        "converter",
        "ai",
        "llm",
        "table-extraction",
        "image-analysis",
        "document-processing",
        "gemini",
        "openai"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "043e4e1a5d73dc50e31fbbf473e7af477c2cb3efc75d45f383988fbf15c8e763",
                "md5": "91dc647fe67c07e062c9f835b8b3f626",
                "sha256": "4222783602ceddc85410c9a614bbde1deadba7d456da90b6218ef3055b4e7abf"
            },
            "downloads": -1,
            "filename": "markdrop-0.3.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "91dc647fe67c07e062c9f835b8b3f626",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 30475,
            "upload_time": "2025-01-29T22:37:56",
            "upload_time_iso_8601": "2025-01-29T22:37:56.777239Z",
            "url": "https://files.pythonhosted.org/packages/04/3e/4e1a5d73dc50e31fbbf473e7af477c2cb3efc75d45f383988fbf15c8e763/markdrop-0.3.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5ec8f62d5088b9a4e3ee1520f7cba3daefec8d0c74d4abf461cb94fc399e2932",
                "md5": "6c83611c284f2fe738b5dedf19cdc5e5",
                "sha256": "9b41d0f33d08052c26db53eeb9121466e4e8cc80390c1002b138347937dc63b0"
            },
            "downloads": -1,
            "filename": "markdrop-0.3.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "6c83611c284f2fe738b5dedf19cdc5e5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 28753,
            "upload_time": "2025-01-29T22:37:58",
            "upload_time_iso_8601": "2025-01-29T22:37:58.330069Z",
            "url": "https://files.pythonhosted.org/packages/5e/c8/f62d5088b9a4e3ee1520f7cba3daefec8d0c74d4abf461cb94fc399e2932/markdrop-0.3.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-29 22:37:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "shoryasethia",
    "github_project": "markdrop",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "beautifulsoup4",
            "specs": []
        },
        {
            "name": "beautifulsoup4",
            "specs": []
        },
        {
            "name": "docling",
            "specs": []
        },
        {
            "name": "docling_core",
            "specs": [
                [
                    "==",
                    "2.16.0"
                ]
            ]
        },
        {
            "name": "openai",
            "specs": []
        },
        {
            "name": "openpyxl",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "Pillow",
            "specs": []
        },
        {
            "name": "protobuf",
            "specs": []
        },
        {
            "name": "python-dotenv",
            "specs": []
        },
        {
            "name": "pymupdf",
            "specs": []
        },
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "timm",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "qwen_vl_utils",
            "specs": []
        },
        {
            "name": "google.generativeai",
            "specs": []
        },
        {
            "name": "vllm",
            "specs": []
        },
        {
            "name": "openai",
            "specs": []
        },
        {
            "name": "setuptools",
            "specs": []
        },
        {
            "name": "typing",
            "specs": []
        }
    ],
    "lcname": "markdrop"
}
        