# PDF Knowledgebase MCP Server
A Model Context Protocol (MCP) server that enables intelligent document search and retrieval from PDF collections. Built for seamless integration with Claude Desktop, Continue, Cline, and other MCP clients, this server provides semantic search capabilities powered by OpenAI embeddings and ChromaDB vector storage.
## Table of Contents
- [🚀 Quick Start](#-quick-start)
- [🏗️ Architecture Overview](#️-architecture-overview)
- [🎯 Parser Selection Guide](#-parser-selection-guide)
- [⚙️ Configuration](#️-configuration)
- [🖥️ MCP Client Setup](#️-mcp-client-setup)
- [📊 Performance & Troubleshooting](#-performance--troubleshooting)
- [🔧 Advanced Configuration](#-advanced-configuration)
- [📚 Appendix](#-appendix)
## 🚀 Quick Start
### Step 1: Install the Server
```bash
uvx pdfkb-mcp
```
### Step 2: Configure Your MCP Client
**Claude Desktop** (Most Common):
*Configuration file locations:*
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"KNOWLEDGEBASE_PATH": "/Users/yourname/Documents/PDFs"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**VS Code (Native MCP)** - Create `.vscode/mcp.json` in workspace:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"KNOWLEDGEBASE_PATH": "${workspaceFolder}/pdfs"
},
"transport": "stdio"
}
}
}
```
### Step 3: Verify Installation
1. **Restart your MCP client** completely
2. **Check for PDF KB tools**: Look for `add_document`, `search_documents`, `list_documents`, `remove_document`
3. **Test functionality**: Try adding a PDF and searching for content
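If you are scripting against the server or your client exposes raw MCP traffic, the tool check in step 2 can be automated. The sketch below assumes you have captured the JSON-RPC response to an MCP `tools/list` request, whose `result.tools` array holds objects with a `name` field per the MCP specification, and reports which of the four expected tools are missing. `missing_tools` and `EXPECTED_TOOLS` are illustrative names, not part of pdfkb-mcp.

```python
import json

# Tool names this README says the server exposes.
EXPECTED_TOOLS = {"add_document", "search_documents", "list_documents", "remove_document"}


def missing_tools(tools_list_response: str) -> set[str]:
    """Return expected tool names absent from a JSON-RPC `tools/list` response."""
    message = json.loads(tools_list_response)
    # The MCP `tools/list` result carries a "tools" array of objects with a "name".
    advertised = {tool["name"] for tool in message.get("result", {}).get("tools", [])}
    return EXPECTED_TOOLS - advertised


# Example response, trimmed to the fields this check reads.
sample = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "add_document"}, {"name": "search_documents"}]},
})
print(sorted(missing_tools(sample)))  # ['list_documents', 'remove_document']
```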
## 🏗️ Architecture Overview
### MCP Integration
```raw
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │    │    MCP Client    │    │   MCP Client    │
│ (Claude Desktop)│    │(VS Code/Continue)│    │     (Other)     │
└────────┬────────┘    └────────┬─────────┘    └────────┬────────┘
         │                      │                       │
         └──────────────────────┼───────────────────────┘
                                │
                   ┌────────────┴────────────┐
                   │      Model Context      │
                   │     Protocol (MCP)      │
                   │     Standard Layer      │
                   └────────────┬────────────┘
                                │
         ┌──────────────────────┼───────────────────────┐
         │                      │                       │
┌────────┴────────┐    ┌────────┴─────────┐    ┌────────┴────────┐
│  PDF KB Server  │    │    Other MCP     │    │    Other MCP    │
│  (This Server)  │    │      Server      │    │     Server      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
### Available Tools & Resources
**Tools** (Actions your client can perform):
- [`add_document(path, metadata?)`](src/pdfkb/main.py:278) - Add PDF to knowledgebase
- [`search_documents(query, limit=5, metadata_filter?)`](src/pdfkb/main.py:345) - Semantic search across PDFs
- [`list_documents(metadata_filter?)`](src/pdfkb/main.py:422) - List all documents with metadata
- [`remove_document(document_id)`](src/pdfkb/main.py:488) - Remove document from knowledgebase
**Resources** (Data your client can access):
- `pdf://{document_id}` - Full document content as JSON
- `pdf://{document_id}/page/{page_number}` - Specific page content
- `pdf://list` - List of all documents with metadata
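Under the hood, MCP clients invoke these tools and resources with JSON-RPC 2.0 messages (`tools/call` and `resources/read` in the MCP specification). The sketch below only builds the request payloads; the helper names are hypothetical, the argument names come from the tool signatures listed above, and actually sending the messages over stdio is left to an MCP client library.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request as an MCP client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def resource_read(uri: str) -> str:
    """Build a JSON-RPC 2.0 `resources/read` request for a resource URI."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "resources/read",
        "params": {"uri": uri},
    })


# Argument values are illustrative; names follow the tool signatures above.
print(tool_call("search_documents", {"query": "transformer attention", "limit": 3}))
print(resource_read("pdf://list"))
```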
## 🎯 Parser Selection Guide
### Decision Tree
```
Document Type & Priority?
├── 🏃 Speed Priority → PyMuPDF4LLM (fastest processing, low memory)
├── 📚 Academic Papers → MinerU (fast with GPU, excellent formulas)
├── 📊 Business Reports → Docling (medium speed, best tables)
├── ⚖️ Balanced Quality → Marker (medium speed, good structure)
└── 🎯 Maximum Accuracy → LLM (slow, vision-based API calls)
```
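The decision tree maps directly onto a `PDF_PARSER` value. As a minimal sketch, a hypothetical helper (not part of pdfkb-mcp) could encode it as a lookup table, falling back to `pymupdf4llm`, the documented default, for anything it does not recognize:

```python
# Hypothetical helper mirroring the decision tree above; not part of pdfkb-mcp.
PARSER_BY_PRIORITY = {
    "speed": "pymupdf4llm",
    "academic": "mineru",
    "business": "docling",
    "balanced": "marker",
    "accuracy": "llm",
}


def choose_parser(priority: str) -> str:
    """Return a PDF_PARSER value for a priority, defaulting to the fastest parser."""
    return PARSER_BY_PRIORITY.get(priority.strip().lower(), "pymupdf4llm")


print(choose_parser("business"))  # docling
```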
### Performance Comparison
| Parser | Processing Speed | Memory | Text Quality | Table Quality | Best For |
|--------|------------------|--------|--------------|---------------|----------|
| **PyMuPDF4LLM** | **Fastest** | Low | Good | Basic | Speed priority |
| **MinerU** | Fast (with GPU) | High | Excellent | Excellent | Scientific papers |
| **Docling** | Medium | Medium | Excellent | **Excellent** | Business documents |
| **Marker** | Medium | Medium | Excellent | Good | **Balanced** |
| **LLM** | Slow | Low | Excellent | Excellent | Maximum accuracy |
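*Rankings are approximate and based on research studies and technical reports; actual speed and quality depend on your hardware (especially GPU availability) and document layout.*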
## ⚙️ Configuration
### Tier 1: Basic Configurations (80% of users)
**Default (Recommended)**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDF_PARSER": "pymupdf4llm",
"PDF_CHUNKER": "langchain",
"EMBEDDING_MODEL": "text-embedding-3-small"
},
"transport": "stdio"
}
}
}
```
**Speed Optimized**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDF_PARSER": "pymupdf4llm",
"CHUNK_SIZE": "800"
},
"transport": "stdio"
}
}
}
```
**Memory Efficient**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDF_PARSER": "pymupdf4llm",
"EMBEDDING_BATCH_SIZE": "50"
},
"transport": "stdio"
}
}
}
```
### Tier 2: Use Case Specific (15% of users)
**Academic Papers**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDF_PARSER": "mineru",
"CHUNK_SIZE": "1200"
},
"transport": "stdio"
}
}
}
```
**Business Documents**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDF_PARSER": "docling",
"DOCLING_TABLE_MODE": "ACCURATE",
"DOCLING_DO_TABLE_STRUCTURE": "true"
},
"transport": "stdio"
}
}
}
```
**Multi-language Documents**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDF_PARSER": "docling",
"DOCLING_OCR_LANGUAGES": "en,fr,de,es",
"DOCLING_DO_OCR": "true"
},
"transport": "stdio"
}
}
}
```
**Maximum Quality**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"OPENROUTER_API_KEY": "sk-or-v1-abc123def456ghi789...",
"PDF_PARSER": "llm",
"LLM_MODEL": "anthropic/claude-3.5-sonnet",
"EMBEDDING_MODEL": "text-embedding-3-large"
},
"transport": "stdio"
}
}
}
```
### Essential Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | *required* | OpenAI API key for embeddings |
| `KNOWLEDGEBASE_PATH` | `./pdfs` | Directory containing PDF files |
| `CACHE_DIR` | `./.cache` | Cache directory for processing |
| `PDF_PARSER` | `pymupdf4llm` | Parser: `pymupdf4llm` (default), `marker`, `mineru`, `docling`, `llm` |
| `PDF_CHUNKER` | `langchain` | Chunking strategy: `langchain` (default), `unstructured` |
| `CHUNK_SIZE` | `1000` | Target chunk size for LangChain chunker |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model (use `text-embedding-3-large` for higher recall) |
## 🖥️ MCP Client Setup
### Claude Desktop
**Configuration File Location**:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`
**Configuration**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"KNOWLEDGEBASE_PATH": "/Users/yourname/Documents/PDFs",
"CACHE_DIR": "/Users/yourname/Documents/PDFs/.cache"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
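If you prefer to script the setup instead of editing JSON by hand, the sketch below merges a `pdfkb` entry into an existing `claude_desktop_config.json` while leaving any other servers untouched. The function name and its arguments are hypothetical; the entry it writes mirrors the configuration above. Back up the file first, since the function rewrites it.

```python
import json
from pathlib import Path


def add_pdfkb_server(config_path: Path, api_key: str, kb_path: str) -> dict:
    """Insert or replace the `pdfkb` MCP server entry, keeping other servers intact."""
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["pdfkb"] = {
        "command": "uvx",
        "args": ["pdfkb-mcp"],
        "env": {
            "OPENAI_API_KEY": api_key,
            "KNOWLEDGEBASE_PATH": kb_path,
            # Keep the cache next to the PDFs, as in the example above.
            "CACHE_DIR": str(Path(kb_path) / ".cache"),
        },
        "transport": "stdio",
        "autoRestart": True,
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

On macOS you would pass `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"` as `config_path`, then restart Claude Desktop.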
**Verification**:
1. Restart Claude Desktop completely
2. Look for PDF KB tools in the interface
3. Test with "Add a document" or "Search documents"
### VS Code with Native MCP Support
**Configuration** (`.vscode/mcp.json` in workspace):
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"KNOWLEDGEBASE_PATH": "${workspaceFolder}/pdfs"
},
"transport": "stdio"
}
}
}
```
**Verification**:
1. Reload VS Code window
2. Check VS Code's MCP server status in Command Palette
3. Use MCP tools in Copilot Chat
### VS Code with Continue Extension
**Configuration** (`.continue/config.json`):
```json
{
"models": [...],
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"KNOWLEDGEBASE_PATH": "${workspaceFolder}/pdfs"
},
"transport": "stdio"
}
}
}
```
**Verification**:
1. Reload VS Code window
2. Check Continue panel for server connection
3. Use `@pdfkb` in Continue chat
### Generic MCP Client
**Standard Configuration Template**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "required",
"KNOWLEDGEBASE_PATH": "required-absolute-path",
"PDF_PARSER": "optional-default-pymupdf4llm"
},
"transport": "stdio",
"autoRestart": true,
"timeout": 30000
}
}
}
```
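The template marks `OPENAI_API_KEY` and `KNOWLEDGEBASE_PATH` as required and asks for an absolute path. A small checker like the one below, a hypothetical sketch not shipped with pdfkb-mcp, can flag those problems, plus a missing `transport`, before you restart the client. It skips paths containing client-side variables such as `${workspaceFolder}`, which the client expands itself.

```python
import os


def check_pdfkb_entry(config: dict) -> list[str]:
    """Return human-readable problems with the `pdfkb` server entry (empty list if none)."""
    entry = config.get("mcpServers", {}).get("pdfkb")
    if entry is None:
        return ["missing mcpServers.pdfkb entry"]
    problems = []
    env = entry.get("env", {})
    for key in ("OPENAI_API_KEY", "KNOWLEDGEBASE_PATH"):
        if not env.get(key):
            problems.append(f"env.{key} is not set")
    kb_path = env.get("KNOWLEDGEBASE_PATH", "")
    # Variables like ${workspaceFolder} are expanded by the client, so don't judge them here.
    if kb_path and "${" not in kb_path and not os.path.isabs(kb_path):
        problems.append("env.KNOWLEDGEBASE_PATH should be an absolute path")
    if entry.get("transport") != "stdio":
        problems.append('transport should be "stdio"')
    return problems
```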
## 📊 Performance & Troubleshooting
### Common Issues
**Server not appearing in MCP client**:
```json
// ❌ Wrong: Missing transport
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"]
}
}
}

// ✅ Correct: Include transport and restart client
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"transport": "stdio"
}
}
}
```
**Processing too slow**:
```json
// Switch to faster parser
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-key",
"PDF_PARSER": "pymupdf4llm"
},
"transport": "stdio"
}
}
}
```
**Memory issues**:
```json
// Reduce memory usage
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-key",
"EMBEDDING_BATCH_SIZE": "25",
"CHUNK_SIZE": "500"
},
"transport": "stdio"
}
}
}
```
**Poor table extraction**:
```json
// Use table-optimized parser
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-key",
"PDF_PARSER": "docling",
"DOCLING_TABLE_MODE": "ACCURATE"
},
"transport": "stdio"
}
}
}
```
### Resource Requirements
| Configuration | RAM Usage | Processing Speed | Best For |
|---------------|-----------|------------------|----------|
| **Speed** | 2-4 GB | Fastest | Large collections |
| **Balanced** | 4-6 GB | Medium | Most users |
| **Quality** | 6-12 GB | Medium-Fast | Accuracy priority |
| **GPU** | 8-16 GB | Very Fast | High-volume processing |
## 🔧 Advanced Configuration
### Parser-Specific Options
**MinerU Configuration**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-key",
"PDF_PARSER": "mineru",
"MINERU_LANG": "en",
"MINERU_METHOD": "auto",
"MINERU_VRAM": "16"
},
"transport": "stdio"
}
}
}
```
**LLM Parser Configuration**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-key",
"OPENROUTER_API_KEY": "sk-or-v1-abc123def456ghi789...",
"PDF_PARSER": "llm",
"LLM_MODEL": "google/gemini-2.5-flash-lite",
"LLM_CONCURRENCY": "5",
"LLM_DPI": "150"
},
"transport": "stdio"
}
}
}
```
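`LLM_CONCURRENCY` presumably caps how many page requests the LLM parser has in flight at once; that reading is an assumption, since the exact behavior is not documented here. The sketch below shows the usual way such a cap is enforced in Python, with an `asyncio.Semaphore`. The page-processing call is simulated with `asyncio.sleep`, and `parse_pages` is a hypothetical name.

```python
import asyncio


async def parse_pages(pages: list[int], concurrency: int = 5) -> tuple[list[str], int]:
    """Process pages with at most `concurrency` simulated requests in flight.

    Returns the per-page results (in input order) and the peak number of
    requests that were running at the same time.
    """
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def parse_one(page: int) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `concurrency` requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await asyncio.sleep(0.01)  # hypothetical stand-in for one vision-model API call
                return f"markdown for page {page}"
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(parse_one(p) for p in pages))
    return list(results), peak


results, peak = asyncio.run(parse_pages(list(range(12)), concurrency=5))
print(len(results), peak)
```

A higher limit can shorten wall-clock time on long documents, at the cost of more simultaneous API calls against your provider's rate limits.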
### Performance Tuning
**High-Performance Setup**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"OPENAI_API_KEY": "sk-key",
"PDF_PARSER": "mineru",
"KNOWLEDGEBASE_PATH": "/Volumes/FastSSD/Documents/PDFs",
"CACHE_DIR": "/Volumes/FastSSD/Documents/PDFs/.cache",
"EMBEDDING_BATCH_SIZE": "200",
"VECTOR_SEARCH_K": "15",
"FILE_SCAN_INTERVAL": "30"
},
"transport": "stdio"
}
}
}
```
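`EMBEDDING_BATCH_SIZE` trades memory for request count. Assuming each batch of chunks becomes one embeddings request (an assumption about the server's internals), the sketch below shows how the number of requests changes for a 450-chunk collection. It defines its own `batched` helper because `itertools.batched` requires Python 3.12 or newer.

```python
from collections.abc import Iterator, Sequence
from math import ceil


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunks = [f"chunk {i}" for i in range(450)]
for batch_size in (25, 100, 200):
    requests = sum(1 for _ in batched(chunks, batch_size))
    assert requests == ceil(len(chunks) / batch_size)
    print(f"EMBEDDING_BATCH_SIZE={batch_size}: {requests} requests")
# EMBEDDING_BATCH_SIZE=25: 18 requests
# EMBEDDING_BATCH_SIZE=100: 5 requests
# EMBEDDING_BATCH_SIZE=200: 3 requests
```

Smaller batches hold fewer chunks and embeddings in memory at once but need more round trips, which is why the memory-saving examples lower this value.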
### Intelligent Caching
The server uses multi-stage caching:
- **Parsing Cache**: Stores converted markdown ([`src/pdfkb/intelligent_cache.py:139`](src/pdfkb/intelligent_cache.py:139))
- **Chunking Cache**: Stores processed chunks
- **Vector Cache**: ChromaDB embeddings storage
**Cache Invalidation Rules**:
- Changing `PDF_PARSER` → Full reset (parsing + chunking + embeddings)
- Changing `PDF_CHUNKER` → Partial reset (chunking + embeddings)
- Changing `EMBEDDING_MODEL` → Minimal reset (embeddings only)
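The three rules follow one principle: a change invalidates the stage it affects and every stage downstream of it. A minimal sketch of that logic, with hypothetical names and no claim to match `intelligent_cache.py` line for line:

```python
# Pipeline stages in processing order.
STAGES = ("parsing", "chunking", "embeddings")

# The first stage each setting affects, per the invalidation rules above.
FIRST_AFFECTED_STAGE = {
    "PDF_PARSER": "parsing",
    "PDF_CHUNKER": "chunking",
    "EMBEDDING_MODEL": "embeddings",
}


def stages_to_rebuild(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return the cache stages to rebuild when settings change from `old` to `new`."""
    changed = [key for key in FIRST_AFFECTED_STAGE if old.get(key) != new.get(key)]
    if not changed:
        return []
    # Rebuild from the earliest affected stage through the end of the pipeline.
    earliest = min(STAGES.index(FIRST_AFFECTED_STAGE[key]) for key in changed)
    return list(STAGES[earliest:])


print(stages_to_rebuild({"PDF_CHUNKER": "langchain"}, {"PDF_CHUNKER": "unstructured"}))
# ['chunking', 'embeddings']
```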
## 📚 Appendix
### Installation Options
**Primary (Recommended)**:
```bash
uvx pdfkb-mcp
```
**With Specific Parser Dependencies**:
```bash
uvx "pdfkb-mcp[marker]"                # Marker parser
uvx "pdfkb-mcp[mineru]"                # MinerU parser
uvx "pdfkb-mcp[docling]"               # Docling parser
uvx "pdfkb-mcp[llm]"                   # LLM parser
uvx "pdfkb-mcp[unstructured_chunker]"  # Unstructured chunker
```
Or via pip/pipx:
```bash
pip install "pdfkb-mcp[marker]" # Marker parser
pip install "pdfkb-mcp[docling-complete]" # Docling with OCR and full features
```
**Development Installation**:
```bash
git clone https://github.com/juanqui/pdfkb-mcp.git
cd pdfkb-mcp
pip install -e ".[dev]"
```
### Complete Environment Variables Reference
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | *required* | OpenAI API key for embeddings |
| `OPENROUTER_API_KEY` | *optional* | Required for LLM parser |
| `KNOWLEDGEBASE_PATH` | `./pdfs` | PDF directory path |
| `CACHE_DIR` | `./.cache` | Cache directory |
| `PDF_PARSER` | `pymupdf4llm` | PDF parser selection |
| `PDF_CHUNKER` | `langchain` | Chunking strategy |
| `CHUNK_SIZE` | `1000` | LangChain chunk size |
| `CHUNK_OVERLAP` | `200` | LangChain chunk overlap |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI model |
| `EMBEDDING_BATCH_SIZE` | `100` | Embedding batch size |
| `VECTOR_SEARCH_K` | `5` | Default search results |
| `FILE_SCAN_INTERVAL` | `60` | File monitoring interval |
| `LOG_LEVEL` | `INFO` | Logging level |
### Parser Comparison Details
| Feature | PyMuPDF4LLM | Marker | MinerU | Docling | LLM |
|---------|-------------|--------|--------|---------|-----|
| **Speed** | Fastest | Medium | Fast (GPU) | Medium | Slowest |
| **Memory** | Lowest | Medium | High | Medium | Lowest |
| **Tables** | Basic | Good | Excellent | **Excellent** | Excellent |
| **Formulas** | Basic | Good | **Excellent** | Good | Excellent |
| **Images** | Basic | Good | Good | **Excellent** | **Excellent** |
| **Setup** | Simple | Simple | Moderate | Simple | Simple |
| **Cost** | Free | Free | Free | Free | API costs |
### Chunking Strategies
**LangChain** (`PDF_CHUNKER=langchain`):
- Header-aware splitting with [`MarkdownHeaderTextSplitter`](src/pdfkb/chunker/chunker_langchain.py)
- Configurable via `CHUNK_SIZE` and `CHUNK_OVERLAP`
- Best for customizable chunking
- Default and installed with base package
**Unstructured** (`PDF_CHUNKER=unstructured`):
- Intelligent semantic chunking with [`unstructured`](src/pdfkb/chunker/chunker_unstructured.py) library
- Zero configuration required
- Install extra: `pip install "pdfkb-mcp[unstructured_chunker]"` to enable
- Best for document structure awareness
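To build intuition for `CHUNK_SIZE` and `CHUNK_OVERLAP`, here is a deliberately simplified fixed-window splitter. The LangChain chunker described above splits on Markdown headers first, and LangChain's text splitters generally prefer natural separators such as blank lines before cutting by length, so treat this as an illustration of the size and overlap arithmetic, not of the chunker pdfkb-mcp actually runs.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters overlapping by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With the defaults (1000 and 200), each new chunk starts 800 characters after the previous one, so a 2,500-character section yields three chunks.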
### First-run notes
- On the first run, the server initializes its caches and the vector store, and logs the selected components:
  - Parser: PyMuPDF4LLM (default)
  - Chunker: LangChain (default)
  - Embedding Model: text-embedding-3-small (default)
- If you select a parser or chunker that isn't installed, the server logs a warning with the exact install command and falls back to the default components instead of exiting.
### Troubleshooting Guide
**API Key Issues**:
1. Verify that the key starts with `sk-`
2. Check account has sufficient credits
3. Test connectivity: `curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models`
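The `curl` check in step 3 can also be run from Python with only the standard library, which is handy on machines without `curl`. This sketch sends the same `GET /v1/models` request; a revoked or mistyped key surfaces as `urllib.error.HTTPError` with status 401. The function names are illustrative.

```python
import json
import urllib.request

MODELS_URL = "https://api.openai.com/v1/models"


def models_request(api_key: str) -> urllib.request.Request:
    """Build the same authenticated GET request as the curl command."""
    return urllib.request.Request(MODELS_URL, headers={"Authorization": f"Bearer {api_key}"})


def check_key(api_key: str) -> int:
    """Return how many models the key can see; raises urllib.error.HTTPError on a bad key."""
    with urllib.request.urlopen(models_request(api_key), timeout=10) as response:
        return len(json.load(response)["data"])


# Usage (makes a network call): check_key(os.environ["OPENAI_API_KEY"])
```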
**Parser Installation Issues**:
1. MinerU: `pip install "mineru[all]"` and verify with `mineru --version`
2. Docling: `pip install docling` for basic features, `pip install "pdfkb-mcp[docling-complete]"` for all features
3. LLM: Requires `OPENROUTER_API_KEY` environment variable
**Performance Optimization**:
1. **Speed**: Use `pymupdf4llm` parser
2. **Memory**: Reduce `EMBEDDING_BATCH_SIZE` and `CHUNK_SIZE`
3. **Quality**: Use `mineru` (GPU) or `docling` (CPU)
4. **Tables**: Use `docling` with `DOCLING_TABLE_MODE=ACCURATE`
For additional support, see implementation details in [`src/pdfkb/main.py`](src/pdfkb/main.py) and [`src/pdfkb/config.py`](src/pdfkb/config.py).
Raw data
{
"_id": null,
"home_page": null,
"name": "pdfkb-mcp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "ai, chroma, embeddings, knowledge-base, mcp, openai, pdf, vector-search",
"author": "PDF Knowledgebase MCP Team",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/a2/49/89d8c244fb447ff1da3013b261f8a11f12a0c0c62851fdb7a633d2feaebe/pdfkb_mcp-0.1.2.tar.gz",
"platform": null,
"description": "# PDF Knowledgebase MCP Server\n\nA Model Context Protocol (MCP) server that enables intelligent document search and retrieval from PDF collections. Built for seamless integration with Claude Desktop, Continue, Cline, and other MCP clients, this server provides semantic search capabilities powered by OpenAI embeddings and ChromaDB vector storage.\n\n## Table of Contents\n\n- [\ud83d\ude80 Quick Start](#-quick-start)\n- [\ud83c\udfd7\ufe0f Architecture Overview](#\ufe0f-architecture-overview)\n- [\ud83c\udfaf Parser Selection Guide](#-parser-selection-guide)\n- [\u2699\ufe0f Configuration](#\ufe0f-configuration)\n- [\ud83d\udda5\ufe0f MCP Client Setup](#\ufe0f-mcp-client-setup)\n- [\ud83d\udcca Performance & Troubleshooting](#-performance--troubleshooting)\n- [\ud83d\udd27 Advanced Configuration](#-advanced-configuration)\n- [\ud83d\udcda Appendix](#-appendix)\n\n## \ud83d\ude80 Quick Start\n\n### Step 1: Install the Server\n\n```bash\nuvx pdfkb-mcp\n```\n\n### Step 2: Configure Your MCP Client\n\n**Claude Desktop** (Most Common):\n\n*Configuration file locations:*\n- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`\n- **Windows**: `%APPDATA%\\Claude\\claude_desktop_config.json`\n- **Linux**: `~/.config/Claude/claude_desktop_config.json`\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents/PDFs\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**VS Code (Native MCP)** - Create `.vscode/mcp.json` in workspace:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"KNOWLEDGEBASE_PATH\": \"${workspaceFolder}/pdfs\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Step 3: Verify Installation\n\n1. 
**Restart your MCP client** completely\n2. **Check for PDF KB tools**: Look for `add_document`, `search_documents`, `list_documents`, `remove_document`\n3. **Test functionality**: Try adding a PDF and searching for content\n\n## \ud83c\udfd7\ufe0f Architecture Overview\n\n### MCP Integration\n\n```raw\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 MCP Client \u2502 \u2502 MCP Client \u2502 \u2502 MCP Client \u2502\n\u2502 (Claude Desktop)\u2502 \u2502(VS Code/Continue)| \u2502 (Other) \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n \u2502 \u2502 \u2502\n \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n \u2502\n \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n \u2502 Model Context \u2502\n \u2502 Protocol (MCP) \u2502\n \u2502 Standard Layer \u2502\n \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n \u2502\n 
\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n \u2502 \u2502 \u2502\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 PDF KB Server \u2502 \u2502 Other MCP \u2502 \u2502 Other MCP \u2502\n\u2502 (This Server) \u2502 \u2502 Server \u2502 \u2502 Server \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n### Available Tools & Resources\n\n**Tools** (Actions your client can perform):\n- [`add_document(path, metadata?)`](src/pdfkb/main.py:278) - Add PDF to knowledgebase\n- [`search_documents(query, limit=5, metadata_filter?)`](src/pdfkb/main.py:345) - Semantic search across PDFs\n- [`list_documents(metadata_filter?)`](src/pdfkb/main.py:422) - List all documents with metadata\n- [`remove_document(document_id)`](src/pdfkb/main.py:488) - Remove document from knowledgebase\n\n**Resources** (Data your client can access):\n- `pdf://{document_id}` - Full document content as JSON\n- `pdf://{document_id}/page/{page_number}` - Specific page content\n- `pdf://list` - List of all documents with metadata\n\n## \ud83c\udfaf Parser Selection Guide\n\n### Decision Tree\n\n```\nDocument Type & Priority?\n\u251c\u2500\u2500 
\ud83c\udfc3 Speed Priority \u2192 PyMuPDF4LLM (fastest processing, low memory)\n\u251c\u2500\u2500 \ud83d\udcda Academic Papers \u2192 MinerU (fast with GPU, excellent formulas)\n\u251c\u2500\u2500 \ud83d\udcca Business Reports \u2192 Docling (medium speed, best tables)\n\u251c\u2500\u2500 \u2696\ufe0f Balanced Quality \u2192 Marker (medium speed, good structure)\n\u2514\u2500\u2500 \ud83c\udfaf Maximum Accuracy \u2192 LLM (slow, vision-based API calls)\n```\n\n### Performance Comparison\n\n| Parser | Processing Speed | Memory | Text Quality | Table Quality | Best For |\n|--------|------------------|--------|--------------|---------------|----------|\n| **PyMuPDF4LLM** | **Fastest** | Low | Good | Basic | Speed priority |\n| **MinerU** | Fast (with GPU) | High | Excellent | Excellent | Scientific papers |\n| **Docling** | Medium | Medium | Excellent | **Excellent** | Business documents |\n| **Marker** | Medium | Medium | Excellent | Good | **Balanced** |\n| **LLM** | Slow | Low | Excellent | Excellent | Maximum accuracy |\n\n*Benchmarks from research studies and technical reports*\n\n## \u2699\ufe0f Configuration\n\n### Tier 1: Basic Configurations (80% of users)\n\n**Default (Recommended)**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDF_PARSER\": \"pymupdf4llm\",\n \"PDF_CHUNKER\": \"langchain\",\n \"EMBEDDING_MODEL\": \"text-embedding-3-small\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Speed Optimized**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDF_PARSER\": \"pymupdf4llm\",\n \"CHUNK_SIZE\": \"800\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Memory Efficient**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n 
\"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDF_PARSER\": \"pymupdf4llm\",\n \"EMBEDDING_BATCH_SIZE\": \"50\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Tier 2: Use Case Specific (15% of users)\n\n**Academic Papers**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDF_PARSER\": \"mineru\",\n \"CHUNK_SIZE\": \"1200\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Business Documents**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDF_PARSER\": \"docling\",\n \"DOCLING_TABLE_MODE\": \"ACCURATE\",\n \"DOCLING_DO_TABLE_STRUCTURE\": \"true\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Multi-language Documents**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDF_PARSER\": \"docling\",\n \"DOCLING_OCR_LANGUAGES\": \"en,fr,de,es\",\n \"DOCLING_DO_OCR\": \"true\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Maximum Quality**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"OPENROUTER_API_KEY\": \"sk-or-v1-abc123def456ghi789...\",\n \"PDF_PARSER\": \"llm\",\n \"LLM_MODEL\": \"anthropic/claude-3.5-sonnet\",\n \"EMBEDDING_MODEL\": \"text-embedding-3-large\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Essential Environment Variables\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `OPENAI_API_KEY` | *required* | OpenAI API key for embeddings |\n| `KNOWLEDGEBASE_PATH` | `./pdfs` | Directory containing PDF files |\n| `CACHE_DIR` | `./.cache` | Cache directory for processing |\n-| 
`PDF_PARSER` | `marker` | Parser: `marker`, `pymupdf4llm`, `mineru`, `docling`, `llm` |\n+| `PDF_PARSER` | `pymupdf4llm` | Parser: `pymupdf4llm` (default), `marker`, `mineru`, `docling`, `llm` |\n-| `CHUNK_SIZE` | `1000` | Target chunk size for LangChain chunker |\n-| `EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI embedding model |\n+| `PDF_CHUNKER` | `langchain` | Chunking strategy: `langchain` (default), `unstructured` |\n+| `CHUNK_SIZE` | `1000` | Target chunk size for LangChain chunker |\n+| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model (use `text-embedding-3-large` for higher recall) |\n\n## \ud83d\udda5\ufe0f MCP Client Setup\n\n### Claude Desktop\n\n**Configuration File Location**:\n- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`\n- **Windows**: `%APPDATA%\\Claude\\claude_desktop_config.json`\n- **Linux**: `~/.config/Claude/claude_desktop_config.json`\n\n**Configuration**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents/PDFs\",\n \"CACHE_DIR\": \"/Users/yourname/Documents/PDFs/.cache\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**Verification**:\n1. Restart Claude Desktop completely\n2. Look for PDF KB tools in the interface\n3. Test with \"Add a document\" or \"Search documents\"\n\n### VS Code with Native MCP Support\n\n**Configuration** (`.vscode/mcp.json` in workspace):\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"KNOWLEDGEBASE_PATH\": \"${workspaceFolder}/pdfs\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Verification**:\n1. Reload VS Code window\n2. Check VS Code's MCP server status in Command Palette\n3. 
Use MCP tools in Copilot Chat\n\n### VS Code with Continue Extension\n\n**Configuration** (`.continue/config.json`):\n```json\n{\n \"models\": [...],\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"KNOWLEDGEBASE_PATH\": \"${workspaceFolder}/pdfs\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Verification**:\n1. Reload VS Code window\n2. Check Continue panel for server connection\n3. Use `@pdfkb` in Continue chat\n\n### Generic MCP Client\n\n**Standard Configuration Template**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"required\",\n \"KNOWLEDGEBASE_PATH\": \"required-absolute-path\",\n \"PDF_PARSER\": \"optional-default-marker\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true,\n \"timeout\": 30000\n }\n }\n}\n```\n\n## \ud83d\udcca Performance & Troubleshooting\n\n### Common Issues\n\n**Server not appearing in MCP client**:\n```json\n// \u274c Wrong: Missing transport\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"]\n }\n }\n}\n\n// \u2705 Correct: Include transport and restart client\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Processing too slow**:\n```json\n// Switch to faster parser\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-key\",\n \"PDF_PARSER\": \"pymupdf4llm\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Memory issues**:\n```json\n// Reduce memory usage\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-key\",\n \"EMBEDDING_BATCH_SIZE\": \"25\",\n \"CHUNK_SIZE\": \"500\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Poor 
table extraction**:\n```json\n// Use table-optimized parser\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-key\",\n \"PDF_PARSER\": \"docling\",\n \"DOCLING_TABLE_MODE\": \"ACCURATE\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Resource Requirements\n\n| Configuration | RAM Usage | Processing Speed | Best For |\n|---------------|-----------|------------------|----------|\n| **Speed** | 2-4 GB | Fastest | Large collections |\n| **Balanced** | 4-6 GB | Medium | Most users |\n| **Quality** | 6-12 GB | Medium-Fast | Accuracy priority |\n| **GPU** | 8-16 GB | Very Fast | High-volume processing |\n\n## \ud83d\udd27 Advanced Configuration\n\n### Parser-Specific Options\n\n**MinerU Configuration**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-key\",\n \"PDF_PARSER\": \"mineru\",\n \"MINERU_LANG\": \"en\",\n \"MINERU_METHOD\": \"auto\",\n \"MINERU_VRAM\": \"16\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**LLM Parser Configuration**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-key\",\n \"OPENROUTER_API_KEY\": \"sk-or-v1-abc123def456ghi789...\",\n \"PDF_PARSER\": \"llm\",\n \"LLM_MODEL\": \"google/gemini-2.5-flash-lite\",\n \"LLM_CONCURRENCY\": \"5\",\n \"LLM_DPI\": \"150\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Performance Tuning\n\n**High-Performance Setup**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"OPENAI_API_KEY\": \"sk-key\",\n \"PDF_PARSER\": \"mineru\",\n \"KNOWLEDGEBASE_PATH\": \"/Volumes/FastSSD/Documents/PDFs\",\n \"CACHE_DIR\": \"/Volumes/FastSSD/Documents/PDFs/.cache\",\n \"EMBEDDING_BATCH_SIZE\": \"200\",\n \"VECTOR_SEARCH_K\": \"15\",\n \"FILE_SCAN_INTERVAL\": \"30\"\n },\n 
\"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Intelligent Caching\n\nThe server uses multi-stage caching:\n- **Parsing Cache**: Stores converted markdown ([`src/pdfkb/intelligent_cache.py:139`](src/pdfkb/intelligent_cache.py:139))\n- **Chunking Cache**: Stores processed chunks\n- **Vector Cache**: ChromaDB embeddings storage\n\n**Cache Invalidation Rules**:\n- Changing `PDF_PARSER` \u2192 Full reset (parsing + chunking + embeddings)\n- Changing `PDF_CHUNKER` \u2192 Partial reset (chunking + embeddings)\n- Changing `EMBEDDING_MODEL` \u2192 Minimal reset (embeddings only)\n\n## \ud83d\udcda Appendix\n\n### Installation Options\n\n**Primary (Recommended)**:\n```bash\nuvx pdfkb-mcp\n```\n\n**With Optional Parser and Chunker Dependencies**:\n```bash\nuvx pdfkb-mcp[marker] # Marker parser\nuvx pdfkb-mcp[mineru] # MinerU parser\nuvx pdfkb-mcp[docling] # Docling parser\nuvx pdfkb-mcp[llm] # LLM parser\nuvx pdfkb-mcp[unstructured_chunker] # Unstructured chunker\n```\n\nOr via pip/pipx:\n```bash\npip install \"pdfkb-mcp[marker]\" # Marker parser\npip install \"pdfkb-mcp[docling-complete]\" # Docling with OCR and full features\n```\n\n**Development Installation**:\n```bash\ngit clone https://github.com/juanqui/pdfkb-mcp.git\ncd pdfkb-mcp\npip install -e \".[dev]\"\n```\n\n### Complete Environment Variables Reference\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `OPENAI_API_KEY` | *required* | OpenAI API key for embeddings |\n| `OPENROUTER_API_KEY` | *optional* | Required for LLM parser |\n| `KNOWLEDGEBASE_PATH` | `./pdfs` | PDF directory path |\n| `CACHE_DIR` | `./.cache` | Cache directory |\n| `PDF_PARSER` | `pymupdf4llm` | PDF parser selection |\n| `PDF_CHUNKER` | `langchain` | Chunking strategy |\n| `CHUNK_SIZE` | `1000` | LangChain chunk size |\n| `CHUNK_OVERLAP` | `200` | LangChain chunk overlap |\n| `EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI model |\n| `EMBEDDING_BATCH_SIZE` | `100` | 
Embedding batch size |\n| `VECTOR_SEARCH_K` | `5` | Default search results |\n| `FILE_SCAN_INTERVAL` | `60` | File monitoring interval |\n| `LOG_LEVEL` | `INFO` | Logging level |\n\n### Parser Comparison Details\n\n| Feature | PyMuPDF4LLM | Marker | MinerU | Docling | LLM |\n|---------|-------------|--------|--------|---------|-----|\n| **Speed** | Fastest | Medium | Fast (GPU) | Medium | Slowest |\n| **Memory** | Lowest | Medium | High | Medium | Lowest |\n| **Tables** | Basic | Good | Excellent | **Excellent** | Excellent |\n| **Formulas** | Basic | Good | **Excellent** | Good | Excellent |\n| **Images** | Basic | Good | Good | **Excellent** | **Excellent** |\n| **Setup** | Simple | Simple | Moderate | Simple | Simple |\n| **Cost** | Free | Free | Free | Free | API costs |\n\n### Chunking Strategies\n\n**LangChain** (`PDF_CHUNKER=langchain`):\n- Header-aware splitting with [`MarkdownHeaderTextSplitter`](src/pdfkb/chunker/chunker_langchain.py)\n- Configurable via `CHUNK_SIZE` and `CHUNK_OVERLAP`\n- Best for customizable chunking\n- Default and installed with base package\n\n**Unstructured** (`PDF_CHUNKER=unstructured`):\n- Intelligent semantic chunking with [`unstructured`](src/pdfkb/chunker/chunker_unstructured.py) library\n- Zero configuration required\n- Install extra: `pip install \"pdfkb-mcp[unstructured_chunker]\"` to enable\n- Best for document structure awareness\n\n### First-run notes\n\n- On the first run, the server initializes caches and vector store and logs selected components:\n - Parser: PyMuPDF4LLM (default)\n - Chunker: LangChain (default)\n - Embedding Model: text-embedding-3-small (default)\n- If you select a parser/chunker that isn\u2019t installed, the server logs a warning with the exact install command and falls back to the default components instead of exiting.\n\n### Troubleshooting Guide\n\n**API Key Issues**:\n1. Verify key format starts with `sk-`\n2. Check account has sufficient credits\n3. 
Test connectivity: `curl -H \"Authorization: Bearer $OPENAI_API_KEY\" https://api.openai.com/v1/models`\n\n**Parser Installation Issues**:\n1. MinerU: `pip install \"mineru[all]\"` and verify with `mineru --version`\n2. Docling: `pip install docling` for basic features, `pip install \"pdfkb-mcp[docling-complete]\"` for all features\n3. LLM: Requires the `OPENROUTER_API_KEY` environment variable\n\n**Performance Optimization**:\n1. **Speed**: Use `pymupdf4llm` parser\n2. **Memory**: Reduce `EMBEDDING_BATCH_SIZE` and `CHUNK_SIZE`\n3. **Quality**: Use `mineru` (GPU) or `docling` (CPU)\n4. **Tables**: Use `docling` with `DOCLING_TABLE_MODE=ACCURATE`\n\nFor additional support, see implementation details in [`src/pdfkb/main.py`](src/pdfkb/main.py) and [`src/pdfkb/config.py`](src/pdfkb/config.py).\n",
"bugtrack_url": null,
"license": null,
"summary": "A Model Context Protocol server for managing PDF documents with vector search capabilities",
"version": "0.1.2",
"project_urls": {
"Documentation": "https://github.com/your-org/pdfkb-mcp#readme",
"Homepage": "https://github.com/your-org/pdfkb-mcp",
"Issues": "https://github.com/your-org/pdfkb-mcp/issues",
"Repository": "https://github.com/your-org/pdfkb-mcp"
},
"split_keywords": [
"ai",
" chroma",
" embeddings",
" knowledge-base",
" mcp",
" openai",
" pdf",
" vector-search"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f9a05d606c3d7208219c8b489617f3ea3839b6dd09ab7e367c50b8ab51934fba",
"md5": "69046b646c538eb2c62e501c1b2e7046",
"sha256": "fc10a5ba20461c8027a2c07d97c883f9b4cb3b5a607516f81f2a09a061f23058"
},
"downloads": -1,
"filename": "pdfkb_mcp-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "69046b646c538eb2c62e501c1b2e7046",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 75093,
"upload_time": "2025-08-07T23:46:57",
"upload_time_iso_8601": "2025-08-07T23:46:57.972596Z",
"url": "https://files.pythonhosted.org/packages/f9/a0/5d606c3d7208219c8b489617f3ea3839b6dd09ab7e367c50b8ab51934fba/pdfkb_mcp-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a24989d8c244fb447ff1da3013b261f8a11f12a0c0c62851fdb7a633d2feaebe",
"md5": "635492fe559f8552c581690fa3ace46d",
"sha256": "c630cd3c064f504feefcb4f3c169757cfe0abccdbfa64342a0c7ebed8a204903"
},
"downloads": -1,
"filename": "pdfkb_mcp-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "635492fe559f8552c581690fa3ace46d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 99810,
"upload_time": "2025-08-07T23:46:59",
"upload_time_iso_8601": "2025-08-07T23:46:59.219005Z",
"url": "https://files.pythonhosted.org/packages/a2/49/89d8c244fb447ff1da3013b261f8a11f12a0c0c62851fdb7a633d2feaebe/pdfkb_mcp-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-07 23:46:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "your-org",
"github_project": "pdfkb-mcp#readme",
"github_not_found": true,
"lcname": "pdfkb-mcp"
}