# 🔍 Docsray MCP Server
**Docsray** is a powerful Model Context Protocol (MCP) server that gives AI assistants like Claude advanced document perception capabilities. Extract text, navigate pages, analyze structure, and understand any document with ease.
**✅ Status: Published to PyPI and TestPyPI - Working in Cursor, Claude Desktop, and other MCP clients**
## ✨ Features
### 🎯 Five Powerful Tools
1. **`docsray_peek`** - Quick document overview with format detection and provider capabilities
2. **`docsray_map`** - Generate comprehensive document structure maps with caching
3. **`docsray_xray`** - AI-powered deep analysis extracting entities, relationships, and insights
4. **`docsray_extract`** - Extract content in multiple formats (markdown, text, JSON, tables)
5. **`docsray_seek`** - Navigate to specific pages, sections, or search for content
### 🔌 Multi-Provider Architecture
- **PyMuPDF4LLM** - Lightning-fast PDF processing (✅ Implemented)
  - Fast markdown extraction
  - Basic table detection
  - Multi-page support
  - Always enabled as fallback
- **LlamaParse** - Deep document understanding with LLMs (✅ Implemented)
  - AI-powered entity extraction
  - Custom analysis instructions
  - Comprehensive caching in .docsray directories
  - Rich format preservation (markdown, images, tables)
- **PyTesseract** - OCR for scanned documents (🔄 Planned)
- **Mistral OCR** - AI-powered OCR and analysis (🔄 Planned)
### 🚀 Key Benefits
- **Universal Input Support** - Local files (./path, ../path, /absolute) and URLs (https://)
- **Intelligent Provider Selection** - Automatically chooses the best tool for each task
- **Smart Caching** - LlamaParse results cached in .docsray directories for instant access
- **Dynamic Discovery** - Tools report actual capabilities based on what's enabled
- **Production Ready** - Comprehensive error handling, logging, and 56 tests
- **Self-Documenting** - Built-in resources for discovery by MCP clients
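
The local-versus-URL distinction behind Universal Input Support can be sketched roughly as follows. This is an illustration only: `classify_document_source` is a hypothetical helper, not part of Docsray, and treating plain `http://` as a URL is an assumption (this README only mentions `https://`).

```python
from urllib.parse import urlparse


def classify_document_source(source: str) -> str:
    """Return "url" for http(s) inputs and "local" for everything else.

    Hypothetical helper for illustration; Docsray's own detection may differ.
    """
    scheme = urlparse(source).scheme.lower()
    # Relative and absolute filesystem paths have no http(s) scheme.
    return "url" if scheme in ("http", "https") else "local"


if __name__ == "__main__":
    for candidate in ("./document.pdf", "/absolute/report.pdf",
                      "https://arxiv.org/pdf/2301.00234.pdf"):
        print(candidate, "->", classify_document_source(candidate))
```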
## 📦 Installation
### Quick Start with uvx (Recommended)
```bash
# Run directly without installation
uvx docsray-mcp start
# Or install globally
uv tool install docsray-mcp
# Then run with:
docsray start
# or
docsray-mcp start
```
### Alternative: Install with pip
```bash
# Basic installation (PyMuPDF4LLM only)
pip install docsray-mcp
# With LlamaParse for AI analysis
pip install "docsray-mcp[ai]"
# Development installation
pip install -e ".[dev]"
```
## 🚀 Quick Start
### 1. Set up API Keys (Optional but Recommended)
Create a `.env` file in your project:
```bash
# For AI-powered analysis with LlamaParse
LLAMAPARSE_API_KEY=llx-your-key-here
# Or, instead of a .env file, export the variable in your shell
export LLAMAPARSE_API_KEY=llx-your-key-here
```
Get your free LlamaParse API key at [cloud.llamaindex.ai](https://cloud.llamaindex.ai)
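
To check what a `.env` file like the one above would provide, a minimal parser might look like this. It is a sketch, not Docsray's own loader: `parse_env_text` is a hypothetical name, and tolerating an `export ` prefix and surrounding quotes is an assumption about what you want to accept.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments.

    Hypothetical helper for illustration only.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Accept shell-style "export KEY=VALUE" lines as well.
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, separator, value = line.partition("=")
        if not separator:
            continue  # Not an assignment; ignore it.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    sample = "# For AI-powered analysis\nLLAMAPARSE_API_KEY=llx-your-key-here\n"
    print(parse_env_text(sample))
```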
### 2. Configure with Your MCP Client
#### For Cursor
Add to your Cursor settings:
```json
{
  "mcpServers": {
    "docsray": {
      "command": "uvx",
      "args": ["docsray-mcp"],
      "env": {
        "LLAMAPARSE_API_KEY": "llx-your-key-here"
      }
    }
  }
}
```
#### For Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "docsray": {
      "command": "uvx",
      "args": ["docsray-mcp"],
      "env": {
        "LLAMAPARSE_API_KEY": "llx-your-key-here"
      }
    }
  }
}
```
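
If your client config already lists other MCP servers, the `docsray` entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the config is already loaded as a dictionary; `add_docsray_server` is a hypothetical helper, and omitting `env` when no key is given is an assumption:

```python
import copy
import json
from typing import Optional


def add_docsray_server(config: dict, api_key: Optional[str] = None) -> dict:
    """Return a copy of an MCP client config with a "docsray" server entry.

    Hypothetical helper; the entry mirrors the JSON shown above.
    """
    updated = copy.deepcopy(config)  # Leave the caller's dict untouched.
    entry = {"command": "uvx", "args": ["docsray-mcp"]}
    if api_key:
        entry["env"] = {"LLAMAPARSE_API_KEY": api_key}
    updated.setdefault("mcpServers", {})["docsray"] = entry
    return updated


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    print(json.dumps(add_docsray_server(existing, "llx-your-key-here"), indent=2))
```

Back up the original file before writing the merged result back to disk.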
## 📚 Usage Examples
### Basic Document Overview
```
Peek at ./document.pdf to see its structure and available formats
```
### Extract Entities from Contracts
```
Xray ./contract.pdf and extract all parties, dates, payment terms, and obligations
```
### Navigate Documents
```
Map the complete structure of ./manual.pdf including all sections and subsections
```
### Extract Specific Content
```
Extract pages 10-20 from ./report.pdf as markdown
```
### Analyze Web Documents
```
Analyze https://arxiv.org/pdf/2301.00234.pdf for methodology and key findings
```
### Compare Providers
```
Extract text from document.pdf with provider pymupdf4llm (fast)
Xray document.pdf with provider llama-parse (AI analysis)
```
## 🛠️ Advanced Configuration
### Environment Variables
```bash
# Provider Configuration
DOCSRAY_PYMUPDF4LLM_ENABLED=true # Enabled by default; always available as the fallback
DOCSRAY_LLAMAPARSE_ENABLED=true
LLAMAPARSE_API_KEY=llx-your-key
# Performance Tuning
DOCSRAY_CACHE_ENABLED=true
DOCSRAY_CACHE_TTL=3600
DOCSRAY_MAX_CONCURRENT_REQUESTS=5
DOCSRAY_TIMEOUT_SECONDS=30
# Logging
DOCSRAY_LOG_LEVEL=INFO
```
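
For a sense of how variables like these could be read into typed settings, here is a minimal sketch. It is not Docsray's actual configuration code: `DocsraySettings` and `load_settings` are hypothetical names, and using the example values above as fallbacks is an assumption, not a documented default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class DocsraySettings:
    """Hypothetical container for the performance and logging variables."""
    cache_enabled: bool
    cache_ttl: int
    max_concurrent_requests: int
    timeout_seconds: int
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> DocsraySettings:
    """Build settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return DocsraySettings(
        cache_enabled=_as_bool(env.get("DOCSRAY_CACHE_ENABLED", "true")),
        cache_ttl=int(env.get("DOCSRAY_CACHE_TTL", "3600")),
        max_concurrent_requests=int(env.get("DOCSRAY_MAX_CONCURRENT_REQUESTS", "5")),
        timeout_seconds=int(env.get("DOCSRAY_TIMEOUT_SECONDS", "30")),
        log_level=env.get("DOCSRAY_LOG_LEVEL", "INFO").upper(),
    )


if __name__ == "__main__":
    print(load_settings())
```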
### Provider Capabilities
#### PyMuPDF4LLM (Always Available)
- ✅ Fast text extraction
- ✅ Markdown formatting
- ✅ Basic table detection
- ✅ Multi-page support
- ❌ No AI analysis
- ❌ No OCR
#### LlamaParse (When API Key Configured)
- ✅ AI-powered analysis
- ✅ Entity extraction
- ✅ Custom instructions
- ✅ Table extraction
- ✅ Image extraction
- ✅ Layout preservation
- ✅ Relationship mapping
- ✅ Result caching
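
One way to picture `"provider": "auto"` is a lookup over capability sets like the two lists above. The sketch below is an assumption about how such a choice could work, not Docsray's real selection logic: the capability labels, `CAPABILITIES`, and `choose_provider` are all hypothetical, and preferring the fast provider first is a design guess.

```python
# Hypothetical capability labels derived from the two lists above.
CAPABILITIES = {
    "pymupdf4llm": {"text", "markdown", "tables", "multi-page"},
    "llama-parse": {
        "ai-analysis", "entities", "custom-instructions", "tables",
        "images", "layout", "relationships", "caching",
    },
}


def choose_provider(required: set, llamaparse_key_set: bool) -> str:
    """Pick the first available provider that covers every required capability."""
    available = ["pymupdf4llm"]  # Always available as the fallback.
    if llamaparse_key_set:
        available.append("llama-parse")
    for name in available:
        if required <= CAPABILITIES[name]:
            return name
    raise ValueError(f"No enabled provider supports: {sorted(required)}")


if __name__ == "__main__":
    print(choose_provider({"text", "markdown"}, llamaparse_key_set=True))
    print(choose_provider({"entities"}, llamaparse_key_set=True))
```

Requests that need AI analysis fail fast here when no API key is configured, which matches the "When API Key Configured" condition in the heading.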
## 🧪 Testing
```bash
# Run all tests
pytest tests/
# Run only unit tests (no API calls)
pytest tests/unit/
# Run integration tests
pytest tests/integration/
# Run with coverage
pytest tests/ --cov=src/docsray --cov-report=html
```
Current test coverage: **52 tests passing** with comprehensive coverage across all components
## 📖 API Reference
### Tool: docsray_peek
Get quick document overview and metadata.
```python
{
    "document_url": "path/to/document.pdf",
    "depth": "structure",  # metadata | structure | preview
    "provider": "auto"     # auto | pymupdf4llm | llama-parse
}
```
### Tool: docsray_map
Generate comprehensive document structure map.
```python
{
    "document_url": "path/to/document.pdf",
    "include_content": False,
    "analysis_depth": "deep",  # basic | deep | comprehensive
    "provider": "auto"
}
```
### Tool: docsray_xray
Deep AI-powered document analysis.
```python
{
    "document_url": "path/to/document.pdf",
    "analysis_type": ["entities", "key-points"],
    "custom_instructions": "Extract all dates and amounts",
    "provider": "llama-parse"
}
```
### Tool: docsray_extract
Extract content in various formats.
```python
{
    "document_url": "path/to/document.pdf",
    "extraction_targets": ["text", "tables"],
    "output_format": "markdown",  # markdown | text | json
    "pages": [1, 2, 3],           # Optional: specific pages
    "provider": "auto"
}
```
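
Since `pages` takes an explicit list, a request such as "pages 10-20" has to be expanded first. A small sketch of that expansion, assuming a comma-and-dash page syntax; `expand_pages` is a hypothetical helper and not part of the Docsray API:

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec like "1-3, 7" into a sorted list of unique page numbers.

    Hypothetical helper; the spec syntax is an assumption for illustration.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part}")
            pages.update(range(start, end + 1))  # Ranges are inclusive.
        else:
            pages.add(int(part))
    return sorted(pages)


if __name__ == "__main__":
    arguments = {
        "document_url": "./report.pdf",
        "output_format": "markdown",
        "pages": expand_pages("10-20"),
    }
    print(arguments)
```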
### Tool: docsray_seek
Navigate to specific document locations.
```python
{
    "document_url": "path/to/document.pdf",
    "target": {"page": 5},  # or {"section": "Introduction"} or {"query": "search text"}
    "extract_content": True,
    "provider": "auto"
}
```
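
The three `target` shapes can be built with a small helper that refuses ambiguous input. This is a sketch under the assumption that exactly one of `page`, `section`, or `query` should be supplied per call (the example above does not say whether they can be combined); `build_seek_arguments` is a hypothetical name.

```python
from typing import Optional


def build_seek_arguments(
    document_url: str,
    *,
    page: Optional[int] = None,
    section: Optional[str] = None,
    query: Optional[str] = None,
    extract_content: bool = True,
    provider: str = "auto",
) -> dict:
    """Build a docsray_seek argument payload with exactly one target key."""
    candidates = {"page": page, "section": section, "query": query}
    target = {key: value for key, value in candidates.items() if value is not None}
    if len(target) != 1:
        raise ValueError("Provide exactly one of page, section, or query")
    return {
        "document_url": document_url,
        "target": target,
        "extract_content": extract_content,
        "provider": provider,
    }


if __name__ == "__main__":
    print(build_seek_arguments("path/to/document.pdf", section="Introduction"))
```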
## 🏗️ Architecture
```
docsray-mcp/
├── src/docsray/
│   ├── server.py               # FastMCP server with discovery resources
│   ├── providers/              # Provider implementations
│   │   ├── base.py             # Provider interface
│   │   ├── pymupdf4llm.py      # Fast PDF extraction
│   │   └── llamaparse.py       # AI-powered analysis
│   ├── tools/                  # MCP tool implementations
│   │   ├── peek.py             # Document overview
│   │   ├── map.py              # Structure mapping
│   │   ├── xray.py             # Deep analysis
│   │   ├── extract.py          # Content extraction
│   │   └── seek.py             # Navigation
│   └── utils/                  # Utilities
│       ├── cache.py            # Document caching
│       └── llamaparse_cache.py # LlamaParse .docsray cache
├── tests/
│   ├── unit/                   # Fast isolated tests
│   ├── integration/            # Component interaction tests
│   └── manual/                 # Debugging scripts
└── PROMPTS.md                  # Example prompts for all use cases
```
## 🤝 Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/docsray/docsray-mcp.git
cd docsray-mcp
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest tests/
# Run linting
ruff check src/
```
## 📄 License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- Built on [FastMCP](https://github.com/jlowin/fastmcp) framework
- Document processing powered by [PyMuPDF4LLM](https://github.com/pymupdf/PyMuPDF4LLM)
- AI analysis powered by [LlamaParse](https://github.com/run-llama/llama_parse)
- Inspired by the [Model Context Protocol](https://github.com/anthropics/mcp) specification
## 📬 Support
- 📖 [Documentation](https://docs.docsray.dev)
- 🐛 [Issue Tracker](https://github.com/docsray/docsray-mcp/issues)
- 💬 [Discussions](https://github.com/docsray/docsray-mcp/discussions)
---
**Made with ❤️ for the MCP ecosystem**