mcp-mineru


Name: mcp-mineru
Version: 0.1.3
Summary: MCP server for MinerU - Parse PDFs and images (OCR) with MLX acceleration on Apple Silicon
Upload time: 2025-11-06 11:30:01
Requires Python: <3.14,>=3.10
License: Apache-2.0
Keywords: mcp, mineru, pdf, ocr, image, parsing, mlx, apple-silicon, claude, screenshot, document

# MCP-MinerU

[![PyPI version](https://badge.fury.io/py/mcp-mineru.svg)](https://pypi.org/project/mcp-mineru/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

MCP server for document and image parsing via [MinerU](https://github.com/opendatalab/MinerU). Extract text, tables, and formulas from PDFs, screenshots, and scanned documents with MLX acceleration on Apple Silicon.

## Installation

```bash
claude mcp add --transport stdio --scope user mineru -- \
  uvx --from mcp-mineru python -m mcp_mineru.server
```

This command registers the server at user scope, so it is available in all your Claude Code projects. `uvx` fetches and runs the package on demand, so no manual installation is required.

**Alternative methods**: See [Installation Guide](docs/INSTALLATION.md) for PyPI, source installation, and Claude Desktop configuration.

## Features

- **Multiple format support**: PDF, JPEG, PNG, and other image formats
- **OCR capabilities**: Built-in text extraction from screenshots and photos
- **Table recognition**: Preserves structure when extracting tables
- **Formula extraction**: Converts mathematical equations to LaTeX
- **MLX acceleration**: Optimized for Apple Silicon (M1/M2/M3/M4)
- **Multiple backends**: Choose a backend to trade speed against quality

## Quick Start

### Parse a PDF document
```
User: "Analyze the tables in research_paper.pdf"
Claude: [Calls parse_pdf tool] "The paper contains 3 tables..."
```

### Extract text from a screenshot
```
User: "What does this screenshot say? image.png"
Claude: [Calls parse_pdf tool] "The screenshot contains..."
```

### Check system capabilities
```
User: "Which backend should I use?"
Claude: [Calls list_backends tool] "Your system has Apple Silicon M4..."
```

For more examples, see [Usage Examples](docs/EXAMPLES.md).

## Tools

### parse_pdf

Parse PDF and image files to extract structured content as Markdown.

**Parameters:**
- `file_path` (required): Absolute path to file (PDF, JPEG, PNG, etc.)
- `backend` (optional): `pipeline` | `vlm-mlx-engine` | `vlm-transformers`
- `formula_enable` (optional): Enable formula recognition (default: true)
- `table_enable` (optional): Enable table recognition (default: true)
- `start_page` (optional): Starting page for PDFs (default: 0)
- `end_page` (optional): Ending page for PDFs (default: -1)
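
As an illustration of how the parameters above fit together, the sketch below builds an MCP `tools/call` request for `parse_pdf`. The helper name `build_parse_pdf_request` and the example path are hypothetical and not part of this package; in normal use Claude constructs this request for you. The absolute-path check uses POSIX rules on the assumption that the server runs on macOS.

```python
import json
from pathlib import PurePosixPath

# Option names taken from the parameter list above.
ALLOWED_OPTIONS = {"backend", "formula_enable", "table_enable", "start_page", "end_page"}


def build_parse_pdf_request(file_path, request_id=1, **options):
    """Build a JSON-RPC `tools/call` request for the parse_pdf tool (hypothetical helper)."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"Unknown parse_pdf options: {sorted(unknown)}")
    # The README requires an absolute path for file_path.
    if not PurePosixPath(file_path).is_absolute():
        raise ValueError("file_path must be an absolute path")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "parse_pdf",
            "arguments": {"file_path": str(file_path), **options},
        },
    }


# Example path is made up for illustration.
request = build_parse_pdf_request(
    "/tmp/research_paper.pdf", backend="pipeline", start_page=0, end_page=4
)
print(json.dumps(request, indent=2))
```

Options that are omitted are simply left out of `arguments`, so the server's documented defaults apply.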

### list_backends

Check system capabilities and get backend recommendations.

**Returns:** System information, available backends, and performance recommendations.
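
To give a rough idea of the kind of check behind a backend recommendation, here is a minimal sketch that detects Apple Silicon with the standard `platform` module. The function `suggest_backend` and its selection rule are assumptions made for illustration; the actual logic of `list_backends` may differ.

```python
import platform


def suggest_backend(system=None, machine=None):
    """Suggest a backend name from the host platform (illustrative rule only)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Assumption: prefer the MLX backend on Apple Silicon, otherwise fall back
    # to the CPU-only pipeline backend.
    if system == "Darwin" and machine == "arm64":
        return "vlm-mlx-engine"
    return "pipeline"


print(suggest_backend())
```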

## Supported Formats

- PDF documents (.pdf)
- JPEG images (.jpg, .jpeg)
- PNG images (.png)
- Other image formats (WebP, GIF, etc.)
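
Before sending a file to the server, a client could pre-check its extension against the formats listed above. The sketch below is one way to do that; `SUPPORTED_SUFFIXES` and `is_supported` are hypothetical names, and because the list ends with "etc.", the set here is likely incomplete.

```python
from pathlib import PurePath

# Extensions named in the list above; the server may accept more.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif"}


def is_supported(file_path):
    """Return True if the file extension is one of the documented formats."""
    return PurePath(file_path).suffix.lower() in SUPPORTED_SUFFIXES


for name in ["scan.PDF", "photo.jpeg", "notes.txt"]:
    print(name, is_supported(name))
```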

## Performance

Benchmarked on Apple Silicon M4 (16GB RAM):

- **pipeline**: ~32s/page, CPU-only, good quality
- **vlm-mlx-engine**: ~38s/page, Apple Silicon optimized, excellent quality
- **vlm-transformers**: ~148s/page, highest quality, slowest
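
The per-page figures above can be turned into a rough estimate for a whole document. This sketch assumes time scales linearly with page count, which the benchmark does not confirm, and the helper name `estimate_seconds` is made up for illustration.

```python
# Approximate seconds per page from the M4 benchmark above.
SECONDS_PER_PAGE = {
    "pipeline": 32,
    "vlm-mlx-engine": 38,
    "vlm-transformers": 148,
}


def estimate_seconds(pages, backend):
    """Estimate total parse time, assuming linear scaling with page count."""
    return pages * SECONDS_PER_PAGE[backend]


for backend in SECONDS_PER_PAGE:
    total = estimate_seconds(10, backend)
    print(f"{backend}: ~{total / 60:.1f} min for 10 pages")
```

Under that assumption, a 10-page PDF would take roughly 5.3 minutes with `pipeline`, 6.3 minutes with `vlm-mlx-engine`, and 24.7 minutes with `vlm-transformers`.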

## Documentation

- [Installation Guide](docs/INSTALLATION.md) - Detailed installation options
- [Updating Guide](docs/UPDATING.md) - How to update to the latest version
- [Usage Examples](docs/EXAMPLES.md) - More use cases and API reference
- [MinerU Documentation](https://github.com/opendatalab/MinerU) - Underlying parsing engine

## Development

```bash
git clone https://github.com/TINKPA/mcp-mineru.git
cd mcp-mineru
uv pip install -e ".[dev]"

# Run tests
pytest

# Format code
black src/

# Lint code
ruff check src/
```

## License

Apache License 2.0. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

Built on top of [MinerU](https://github.com/opendatalab/MinerU) by OpenDataLab.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "mcp-mineru",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.14,>=3.10",
    "maintainer_email": null,
    "keywords": "mcp, mineru, pdf, ocr, image, parsing, mlx, apple-silicon, claude, screenshot, document",
    "author": null,
    "author_email": "TINKPA <tinkpa@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/b9/28/80cce0de08c2120903426fa899b40f5ba72b3509de1b203792ae6c7d2428/mcp_mineru-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "MCP server for MinerU - Parse PDFs and images (OCR) with MLX acceleration on Apple Silicon",
    "version": "0.1.3",
    "project_urls": {
        "Documentation": "https://github.com/TINKPA/mcp-mineru#readme",
        "Homepage": "https://github.com/TINKPA/mcp-mineru",
        "Issues": "https://github.com/TINKPA/mcp-mineru/issues",
        "Repository": "https://github.com/TINKPA/mcp-mineru"
    },
    "split_keywords": [
        "mcp",
        " mineru",
        " pdf",
        " ocr",
        " image",
        " parsing",
        " mlx",
        " apple-silicon",
        " claude",
        " screenshot",
        " document"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c49901580b0b89c92fa9a77121bfa9e144b4c5d8c444e4b5c1c10e3878794798",
                "md5": "11f7bc1a2d19ba012a9ba97787225e57",
                "sha256": "96ef14b2b487049ed35107907489d888bf46f4093e8942e305a3e195737ca340"
            },
            "downloads": -1,
            "filename": "mcp_mineru-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "11f7bc1a2d19ba012a9ba97787225e57",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.10",
            "size": 8447,
            "upload_time": "2025-11-06T11:30:00",
            "upload_time_iso_8601": "2025-11-06T11:30:00.280850Z",
            "url": "https://files.pythonhosted.org/packages/c4/99/01580b0b89c92fa9a77121bfa9e144b4c5d8c444e4b5c1c10e3878794798/mcp_mineru-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b92880cce0de08c2120903426fa899b40f5ba72b3509de1b203792ae6c7d2428",
                "md5": "2c8d3568480ac7a8a8d191b9c08ac265",
                "sha256": "bfe4eae7dc6b61ddfb64c71e1678efe3af1b85b6d8b447541cb5710ef06f30e1"
            },
            "downloads": -1,
            "filename": "mcp_mineru-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "2c8d3568480ac7a8a8d191b9c08ac265",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.14,>=3.10",
            "size": 10066,
            "upload_time": "2025-11-06T11:30:01",
            "upload_time_iso_8601": "2025-11-06T11:30:01.383398Z",
            "url": "https://files.pythonhosted.org/packages/b9/28/80cce0de08c2120903426fa899b40f5ba72b3509de1b203792ae6c7d2428/mcp_mineru-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-06 11:30:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TINKPA",
    "github_project": "mcp-mineru#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "mcp-mineru"
}
        