# MCP-MinerU
MCP server for document and image parsing via [MinerU](https://github.com/opendatalab/MinerU). Extract text, tables, and formulas from PDFs, screenshots, and scanned documents with MLX acceleration on Apple Silicon.
## Installation
```bash
claude mcp add --transport stdio --scope user mineru -- \
uvx --from mcp-mineru python -m mcp_mineru.server
```
This command registers the server for all your Claude Code projects (`--scope user`). `uvx` fetches and runs the package on demand, so no manual installation is required.
**Alternative methods**: See [Installation Guide](docs/INSTALLATION.md) for PyPI, source installation, and Claude Desktop configuration.
## Features
- **Multiple format support**: PDF, JPEG, PNG, and other image formats
- **OCR capabilities**: Built-in text extraction from screenshots and photos
- **Table recognition**: Preserves structure when extracting tables
- **Formula extraction**: Converts mathematical equations to LaTeX
- **MLX acceleration**: Optimized for Apple Silicon (M1/M2/M3/M4)
- **Multiple backends**: Trade speed against quality by choosing a backend
## Quick Start
### Parse a PDF document
```
User: "Analyze the tables in research_paper.pdf"
Claude: [Calls parse_pdf tool] "The paper contains 3 tables..."
```
### Extract text from a screenshot
```
User: "What does this screenshot say? image.png"
Claude: [Calls parse_pdf tool] "The screenshot contains..."
```
### Check system capabilities
```
User: "Which backend should I use?"
Claude: [Calls list_backends tool] "Your system has Apple Silicon M4..."
```
For more examples, see [Usage Examples](docs/EXAMPLES.md).
## Tools
### parse_pdf
Parse PDF and image files to extract structured content as Markdown.
**Parameters:**
- `file_path` (required): Absolute path to file (PDF, JPEG, PNG, etc.)
- `backend` (optional): `pipeline` | `vlm-mlx-engine` | `vlm-transformers`
- `formula_enable` (optional): Enable formula recognition (default: true)
- `table_enable` (optional): Enable table recognition (default: true)
- `start_page` (optional): Zero-based starting page for PDFs (default: 0, the first page)
- `end_page` (optional): Zero-based ending page for PDFs (default: -1, meaning through the last page)
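To make the parameter list concrete, here is a minimal sketch of the JSON-RPC `tools/call` message an MCP client would send for `parse_pdf`. The helper name `build_parse_pdf_request` and its client-side validation are assumptions for illustration and are not part of this package. The argument names and backend values are taken from the list above.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath

# Backend names as listed in the parameter table above.
BACKENDS = {"pipeline", "vlm-mlx-engine", "vlm-transformers"}


def build_parse_pdf_request(file_path, backend=None, formula_enable=True,
                            table_enable=True, start_page=0, end_page=-1,
                            request_id=1):
    """Hypothetical helper: build an MCP `tools/call` request for parse_pdf."""
    # The tool expects an absolute path; accept POSIX or Windows style.
    is_absolute = (PurePosixPath(file_path).is_absolute()
                   or PureWindowsPath(file_path).is_absolute())
    if not is_absolute:
        raise ValueError(f"file_path must be absolute: {file_path!r}")
    if backend is not None and backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")

    arguments = {
        "file_path": file_path,
        "formula_enable": formula_enable,
        "table_enable": table_enable,
        "start_page": start_page,
        "end_page": end_page,
    }
    # Omit the optional backend so the server can apply its own default.
    if backend is not None:
        arguments["backend"] = backend

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "parse_pdf", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_parse_pdf_request("/tmp/research_paper.pdf",
                                      backend="pipeline", end_page=2)
    print(json.dumps(request, indent=2))
```

In normal use Claude builds this message for you; the sketch only shows how the documented parameters map onto a tool call.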
### list_backends
Check system capabilities and get backend recommendations.
**Returns:** System information, available backends, and performance recommendations.
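As a rough illustration of the kind of recommendation this tool gives, the sketch below picks a backend from the host platform. It is not the server's actual logic: the function `suggest_backend` and its rule (MLX backend on Apple Silicon, `pipeline` elsewhere) are assumptions based only on the backend descriptions in this README.

```python
import platform


def suggest_backend(system=None, machine=None):
    """Hypothetical heuristic: choose a backend name from the host platform."""
    # Fall back to the running machine when no values are passed in.
    system = system or platform.system()
    machine = machine or platform.machine()

    # Apple Silicon Macs report Darwin/arm64 for native Python builds.
    if system == "Darwin" and machine == "arm64":
        return "vlm-mlx-engine"
    # Assumption: the CPU-only pipeline backend is the safe choice elsewhere.
    return "pipeline"


if __name__ == "__main__":
    print(f"Suggested backend: {suggest_backend()}")
```

Call `list_backends` itself for an authoritative answer; this heuristic ignores memory and other factors the server may consider.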
## Supported Formats
- PDF documents (.pdf)
- JPEG images (.jpg, .jpeg)
- PNG images (.png)
- Other image formats (WebP, GIF, etc.)
## Performance
Benchmarked on Apple Silicon M4 (16GB RAM):
- **pipeline**: ~32s/page, CPU-only, good quality
- **vlm-mlx-engine**: ~38s/page, Apple Silicon optimized, excellent quality
- **vlm-transformers**: ~148s/page, highest quality, slowest
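For a sense of scale, the per-page figures above can be turned into a rough whole-document estimate. The sketch assumes time grows linearly with page count, which the benchmark does not confirm, and `estimate_seconds` is a hypothetical helper.

```python
# Approximate seconds per page, copied from the benchmark list above.
SECONDS_PER_PAGE = {
    "pipeline": 32,
    "vlm-mlx-engine": 38,
    "vlm-transformers": 148,
}


def estimate_seconds(pages, backend):
    """Rough estimate: pages times the benchmarked seconds per page."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * SECONDS_PER_PAGE[backend]


if __name__ == "__main__":
    for name in SECONDS_PER_PAGE:
        total = estimate_seconds(20, name)
        print(f"{name}: ~{total / 60:.1f} min for 20 pages")
```

Under that assumption, a 20-page PDF takes about 640 s with `pipeline`, 760 s with `vlm-mlx-engine`, and 2960 s with `vlm-transformers` on the benchmarked machine.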
## Documentation
- [Installation Guide](docs/INSTALLATION.md) - Detailed installation options
- [Updating Guide](docs/UPDATING.md) - How to update to the latest version
- [Usage Examples](docs/EXAMPLES.md) - More use cases and API reference
- [MinerU Documentation](https://github.com/opendatalab/MinerU) - Underlying parsing engine
## Development
```bash
git clone https://github.com/TINKPA/mcp-mineru.git
cd mcp-mineru
uv pip install -e ".[dev]"
# Run tests
pytest
# Format and lint code
black src/
ruff check src/
```
## License
Apache License 2.0 - see [LICENSE](LICENSE) file for details.
## Acknowledgments
Built on top of [MinerU](https://github.com/opendatalab/MinerU) by OpenDataLab.
Raw data
{
"_id": null,
"home_page": null,
"name": "mcp-mineru",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.10",
"maintainer_email": null,
"keywords": "mcp, mineru, pdf, ocr, image, parsing, mlx, apple-silicon, claude, screenshot, document",
"author": null,
"author_email": "TINKPA <tinkpa@example.com>",
"download_url": "https://files.pythonhosted.org/packages/b9/28/80cce0de08c2120903426fa899b40f5ba72b3509de1b203792ae6c7d2428/mcp_mineru-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "MCP server for MinerU - Parse PDFs and images (OCR) with MLX acceleration on Apple Silicon",
"version": "0.1.3",
"project_urls": {
"Documentation": "https://github.com/TINKPA/mcp-mineru#readme",
"Homepage": "https://github.com/TINKPA/mcp-mineru",
"Issues": "https://github.com/TINKPA/mcp-mineru/issues",
"Repository": "https://github.com/TINKPA/mcp-mineru"
},
"split_keywords": [
"mcp",
" mineru",
" pdf",
" ocr",
" image",
" parsing",
" mlx",
" apple-silicon",
" claude",
" screenshot",
" document"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c49901580b0b89c92fa9a77121bfa9e144b4c5d8c444e4b5c1c10e3878794798",
"md5": "11f7bc1a2d19ba012a9ba97787225e57",
"sha256": "96ef14b2b487049ed35107907489d888bf46f4093e8942e305a3e195737ca340"
},
"downloads": -1,
"filename": "mcp_mineru-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "11f7bc1a2d19ba012a9ba97787225e57",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.10",
"size": 8447,
"upload_time": "2025-11-06T11:30:00",
"upload_time_iso_8601": "2025-11-06T11:30:00.280850Z",
"url": "https://files.pythonhosted.org/packages/c4/99/01580b0b89c92fa9a77121bfa9e144b4c5d8c444e4b5c1c10e3878794798/mcp_mineru-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b92880cce0de08c2120903426fa899b40f5ba72b3509de1b203792ae6c7d2428",
"md5": "2c8d3568480ac7a8a8d191b9c08ac265",
"sha256": "bfe4eae7dc6b61ddfb64c71e1678efe3af1b85b6d8b447541cb5710ef06f30e1"
},
"downloads": -1,
"filename": "mcp_mineru-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "2c8d3568480ac7a8a8d191b9c08ac265",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.10",
"size": 10066,
"upload_time": "2025-11-06T11:30:01",
"upload_time_iso_8601": "2025-11-06T11:30:01.383398Z",
"url": "https://files.pythonhosted.org/packages/b9/28/80cce0de08c2120903426fa899b40f5ba72b3509de1b203792ae6c7d2428/mcp_mineru-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-06 11:30:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "TINKPA",
"github_project": "mcp-mineru#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "mcp-mineru"
}