# PDF Knowledgebase MCP Server
A Model Context Protocol (MCP) server that enables intelligent document search and retrieval from PDF collections. Built for seamless integration with Claude Desktop, Continue, Cline, and other MCP clients, this server provides advanced search capabilities powered by local, OpenAI, or HuggingFace embeddings and ChromaDB vector storage.
## ✨ Major New Features
### 🤖 Document Summarization (NEW!)
**Automatically generate rich document metadata using AI**
- **Local LLM Support**: Use Qwen3, Phi-3, and other local models - no API costs, full privacy
- **Remote LLM Support**: OpenAI-compatible APIs for cloud-based summarization
- **Rich Metadata**: Auto-generates titles, short descriptions, and detailed summaries
- **Smart Content Processing**: Configurable page limits and intelligent content truncation
- **Fallback Handling**: Graceful degradation when summarization fails
### 🌐 Remote Access & Multi-Client Support (NEW!)
**Access your document repository from anywhere**
- **SSE Transport Mode**: Server-Sent Events for real-time remote access
- **HTTP Transport Mode**: RESTful API access for modern MCP clients
- **Multi-Client Architecture**: Share document processing across multiple clients
- **Integrated Web + MCP**: Run both web interface and MCP server concurrently
- **Flexible Deployment**: Local, remote, or hybrid deployment modes
### 🎯 Advanced Search & Intelligence
**Best-in-class document retrieval capabilities**
- **Hybrid Search**: Combines semantic similarity with keyword matching (BM25)
- **Reranking Support**: Qwen3-Reranker models for improved search relevance
- **GGUF Quantized Models**: 50-70% smaller models while maintaining quality
- **Local Embeddings**: Full privacy with HuggingFace models - no API costs
- **Custom Endpoints**: Support for OpenAI-compatible APIs and custom providers
- **Semantic Chunking**: Content-aware chunking for better context preservation
### 📈 Enterprise-Ready Operations
**Production-ready document processing**
- **Non-blocking Operations**: Background processing with graceful startup
- **Intelligent Caching**: Multi-stage caching with selective invalidation
- **Enhanced Monitoring**: Better logging, error handling, and resource management
- **Graceful Shutdown**: Configurable timeouts and proper cleanup
- **Performance Optimized**: Improved memory usage and concurrent processing
## Table of Contents
- [🚀 Quick Start](#-quick-start)
- [🌐 Web Interface](#-web-interface)
- [🏗️ Architecture Overview](#️-architecture-overview)
- [📝 Document Summarization](#-document-summarization)
- [🤖 Embedding Options](#-embedding-options)
- [📄 Markdown Document Support](#-markdown-document-support)
- [🔄 Reranking](#-reranking)
- [🔍 Hybrid Search](#-hybrid-search)
- [🔽 Minimum Chunk Filtering](#-minimum-chunk-filtering)
- [🧩 Semantic Chunking](#-semantic-chunking)
- [🎯 Parser Selection Guide](#-parser-selection-guide)
- [⚙️ Configuration](#️-configuration)
- [🐳 Docker Deployment](#-docker-deployment)
- [🖥️ MCP Client Setup](#️-mcp-client-setup)
- [📊 Performance & Troubleshooting](#-performance--troubleshooting)
- [🔧 Advanced Configuration](#-advanced-configuration)
- [📚 Appendix](#-appendix)
## 🚀 Quick Start
### Step 1: Configure Your MCP Client
**🚀 Option A: Complete Local Setup with Document Summarization (No API Key Required)**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true",
"PDFKB_ENABLE_SUMMARIZER": "true",
"PDFKB_SUMMARIZER_PROVIDER": "local",
"PDFKB_SUMMARIZER_MODEL": "Qwen/Qwen3-4B-Instruct-2507-FP8"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**Option B: Local Embeddings w/ Hybrid Search (No API Key Required)**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**🌐 Option C: Remote/SSE Mode (Accessible from Multiple Clients)**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true"
},
"transport": "sse",
"autoRestart": true
}
}
}
```
**🚀 Option D: Local GGUF Embeddings (Memory Optimized, No API Key Required)**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_LOCAL_EMBEDDING_MODEL": "Qwen/Qwen3-Embedding-0.6B-GGUF",
"PDFKB_GGUF_QUANTIZATION": "Q6_K",
"PDFKB_ENABLE_HYBRID_SEARCH": "true"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**🚀 Option E: Local Embeddings with Reranking (Best Search Quality, No API Key Required)**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true",
"PDFKB_ENABLE_RERANKER": "true",
"PDFKB_RERANKER_MODEL": "Qwen/Qwen3-Reranker-0.6B"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**Option F: OpenAI Embeddings w/ Hybrid Search**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_EMBEDDING_PROVIDER": "openai",
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**🚀 Option G: HuggingFace w/ Custom Provider**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_EMBEDDING_PROVIDER": "huggingface",
"PDFKB_HUGGINGFACE_EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
"PDFKB_HUGGINGFACE_PROVIDER": "nebius",
"HF_TOKEN": "hf_your_token_here",
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**🚀 Option H: Custom OpenAI-Compatible API**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_EMBEDDING_PROVIDER": "openai",
"PDFKB_OPENAI_API_KEY": "your-api-key",
"PDFKB_OPENAI_API_BASE": "https://api.studio.nebius.com/v1/",
"PDFKB_EMBEDDING_MODEL": "text-embedding-3-large",
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true"
},
"transport": "stdio",
"autoRestart": true
}
}
}
### Step 2: Verify Installation
1. **Restart your MCP client** completely
2. **Check for PDF KB tools**: Look for `add_document`, `search_documents`, `list_documents`, `remove_document`
3. **Test functionality**: Try adding a PDF and searching for content
## 🌐 Web Interface
The PDF Knowledgebase includes a modern web interface for easy document management and search. **The web interface is disabled by default and must be explicitly enabled.**
### Server Modes
**1. MCP Only Mode - Stdio Transport** (Default):
```bash
pdfkb-mcp
```
- Runs only the MCP server for integration with Claude Desktop, VS Code, etc.
- Most resource-efficient option
- Best for pure MCP integration
**2. MCP Only Mode - SSE/Remote Transport**:
```bash
# Option A: Environment variable
PDFKB_TRANSPORT=sse pdfkb-mcp
# Option B: Command line flags
pdfkb-mcp --transport sse --sse-port 8000 --sse-host localhost
```
- Runs MCP server in SSE mode for remote access from multiple clients
- MCP server available at http://localhost:8000 (or configured host/port)
- Best for centralized document processing accessible from multiple clients
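As a quick smoke test for SSE mode, you can confirm the endpoint is reachable before configuring a client. A minimal sketch, assuming the default SSE endpoint at `http://localhost:8000/sse` (the same URL used in the MCP client examples later in this README):
```python
import requests  # third-party: pip install requests

# Open the SSE stream and read one event line to confirm the server is up.
with requests.get("http://localhost:8000/sse", stream=True, timeout=10) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:  # skip SSE keep-alive blank lines
            print("received:", line.decode("utf-8", errors="replace"))
            break  # one event is enough to verify connectivity
```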
**3. Integrated Mode** (MCP + Web):
```bash
# Option A: Environment variable
PDFKB_WEB_ENABLE=true pdfkb-mcp
# Option B: Command line flag
pdfkb-mcp --enable-web
```
- Runs both MCP server AND web interface concurrently
- Web interface available at http://localhost:8080
- MCP server runs in stdio mode by default (can be configured to SSE)
- Best of both worlds: API integration + web UI
### Web Interface Features

*Modern web interface showing document collection with search, filtering, and management capabilities*
- **📄 Document Upload**: Drag & drop PDF files or upload via file picker
- **🔍 Semantic Search**: Powerful vector-based search with real-time results
- **📋 Document Management**: List, preview, and manage your PDF collection
- **🔄 Real-time Status**: Live processing updates via WebSocket connections
- **🎯 Chunk Explorer**: View and navigate document chunks for detailed analysis
- **⚙️ System Metrics**: Monitor server performance and resource usage

*Detailed document view showing metadata, chunk analysis, and content preview*
### Quick Web Setup
1. **Install and run**:
```bash
uvx pdfkb-mcp # Installs on first run
PDFKB_WEB_ENABLE=true uvx pdfkb-mcp # Start integrated server
```
2. **Open your browser**: http://localhost:8080
3. **Configure environment** (create `.env` file):
```bash
PDFKB_OPENAI_API_KEY=sk-proj-abc123def456ghi789...
PDFKB_KNOWLEDGEBASE_PATH=/path/to/your/pdfs
PDFKB_WEB_PORT=8080
PDFKB_WEB_HOST=localhost
PDFKB_WEB_ENABLE=true
```
### Web Configuration Options
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `PDFKB_WEB_ENABLE` | `false` | Enable/disable web interface |
| `PDFKB_WEB_PORT` | `8080` | Web server port |
| `PDFKB_WEB_HOST` | `localhost` | Web server host |
| `PDFKB_WEB_CORS_ORIGINS` | `http://localhost:3000,http://127.0.0.1:3000` | CORS allowed origins |
### Command Line Options
The server supports command line arguments:
```bash
# Customize web server port with web interface enabled
pdfkb-mcp --enable-web --port 9000
# Use custom configuration file
pdfkb-mcp --config myconfig.env
# Change log level
pdfkb-mcp --log-level DEBUG
# Enable web interface via command line
pdfkb-mcp --enable-web
```
### API Documentation
When running with web interface enabled, comprehensive API documentation is available at:
- **Swagger UI**: http://localhost:8080/docs
- **ReDoc**: http://localhost:8080/redoc
## 🏗️ Architecture Overview
### MCP Integration
```mermaid
graph TB
subgraph "MCP Clients"
C1[Claude Desktop]
C2[VS Code/Continue]
C3[Other MCP Clients]
end
subgraph "MCP Protocol Layer"
MCP[Model Context Protocol<br/>Standard Layer]
end
subgraph "MCP Servers"
PDFKB[PDF KB Server<br/>This Server]
S1[Other MCP<br/>Server]
S2[Other MCP<br/>Server]
end
C1 --> MCP
C2 --> MCP
C3 --> MCP
MCP --> PDFKB
MCP --> S1
MCP --> S2
classDef client fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef protocol fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef server fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef highlight fill:#c8e6c9,stroke:#1b5e20,stroke-width:3px
class C1,C2,C3 client
class MCP protocol
class S1,S2 server
class PDFKB highlight
```
### Internal Architecture
```mermaid
graph LR
subgraph "Input Layer"
PDF[PDF Files]
WEB[Web Interface<br/>Port 8080]
MCP_IN[MCP Protocol]
end
subgraph "Processing Pipeline"
PARSER[PDF Parser<br/>PyMuPDF/Marker/MinerU]
CHUNKER[Text Chunker<br/>LangChain/Unstructured]
EMBED[Embedding Service<br/>Local/OpenAI]
end
subgraph "Storage Layer"
CACHE[Intelligent Cache<br/>Multi-stage]
VECTOR[Vector Store<br/>ChromaDB]
TEXT[Text Index<br/>Whoosh BM25]
end
subgraph "Search Engine"
HYBRID[Hybrid Search<br/>RRF Fusion]
end
PDF --> PARSER
WEB --> PARSER
MCP_IN --> PARSER
PARSER --> CHUNKER
CHUNKER --> EMBED
EMBED --> CACHE
CACHE --> VECTOR
CACHE --> TEXT
VECTOR --> HYBRID
TEXT --> HYBRID
HYBRID --> WEB
HYBRID --> MCP_IN
classDef input fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
classDef process fill:#fff9c4,stroke:#f57f17,stroke-width:2px
classDef storage fill:#fce4ec,stroke:#880e4f,stroke-width:2px
classDef search fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
class PDF,WEB,MCP_IN input
class PARSER,CHUNKER,EMBED process
class CACHE,VECTOR,TEXT storage
class HYBRID search
```
### Available Tools & Resources
**Tools** (Actions your client can perform):
- [`add_document(path, metadata?)`](src/pdfkb/main.py:278) - Add PDF to knowledgebase
- [`search_documents(query, limit=5, metadata_filter?, search_type?)`](src/pdfkb/main.py:345) - Hybrid search across PDFs (semantic + keyword matching)
- [`list_documents(metadata_filter?)`](src/pdfkb/main.py:422) - List all documents with metadata
- [`remove_document(document_id)`](src/pdfkb/main.py:488) - Remove document from knowledgebase
**Resources** (Data your client can access):
- `pdf://{document_id}` - Full document content as JSON
- `pdf://{document_id}/page/{page_number}` - Specific page content
- `pdf://list` - List of all documents with metadata
## 🤖 Embedding Options
The server supports three embedding providers, each with different trade-offs:
### 1. Local Embeddings (Default)
Run embeddings locally using HuggingFace models, eliminating API costs and keeping your data completely private.
**Features:**
- **Zero API Costs**: No external API charges
- **Complete Privacy**: Documents never leave your machine
- **Hardware Acceleration**: Automatic detection of Metal (macOS), CUDA (NVIDIA), or CPU
- **Smart Caching**: LRU cache for frequently embedded texts
- **Multiple Model Sizes**: Choose based on your hardware capabilities
Local embeddings are **enabled by default**. No configuration needed for basic usage:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs"
}
}
}
}
```
### Supported Models
**🚀 Qwen3-Embedding Series Only**: The server now exclusively supports the Qwen3-Embedding model family, including both standard and quantized GGUF variants for optimized performance.
#### Standard Models
| Model | Size | Dimensions | Max Context | Best For |
|-------|------|------------|-------------|----------|
| **Qwen/Qwen3-Embedding-0.6B** (default) | 1.2GB | 1024 | 32K tokens | Best overall - long docs, fast |
| **Qwen/Qwen3-Embedding-4B** | 8.0GB | 2560 | 32K tokens | High quality, long context |
| **Qwen/Qwen3-Embedding-8B** | 16.0GB | 3584 | 32K tokens | Maximum quality, long context |
#### 🚀 GGUF Quantized Models (Reduced Memory Usage)
| Model | Size | Dimensions | Max Context | Best For |
|-------|------|------------|-------------|----------|
| **Qwen/Qwen3-Embedding-0.6B-GGUF** | 0.6GB | 1024 | 32K tokens | Quantized lightweight, 32K context |
| **Qwen/Qwen3-Embedding-4B-GGUF** | 2.4GB | 2560 | 32K tokens | Quantized high quality, 32K context |
| **Qwen/Qwen3-Embedding-8B-GGUF** | 4.8GB | 3584 | 32K tokens | Quantized maximum quality, 32K context |
Configure your preferred model:
```bash
# Standard models
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B" # Default
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-4B"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-8B"
# GGUF quantized models (reduced memory usage)
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B-GGUF"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-4B-GGUF"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-8B-GGUF"
```
#### 🚀 GGUF Quantization Options
When using GGUF models, you can configure the quantization level to balance between model size and quality:
```bash
# Configure quantization (default: Q6_K)
PDFKB_GGUF_QUANTIZATION="Q6_K" # Default - balanced size/quality
PDFKB_GGUF_QUANTIZATION="Q8_0" # Higher quality, larger size
PDFKB_GGUF_QUANTIZATION="F16" # Highest quality, largest size
PDFKB_GGUF_QUANTIZATION="Q4_K_M" # Smaller size, lower quality
```
**Quantization Recommendations:**
- **Q6_K** (default): Best balance of quality and size
- **Q8_0**: Near-original quality with moderate compression
- **F16**: Original quality, minimal compression
- **Q4_K_M**: Maximum compression, acceptable quality loss
### Hardware Optimization
The server automatically detects and uses the best available hardware:
- **Apple Silicon (M1/M2/M3)**: Uses Metal Performance Shaders (MPS)
- **NVIDIA GPUs**: Uses CUDA acceleration
- **CPU Fallback**: Optimized for multi-core processing
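The server's exact detection logic is internal, but with PyTorch the `auto` resolution typically reduces to a few availability checks. An illustrative sketch:
```python
import torch

def resolve_device(preference: str = "auto") -> str:
    """Resolve 'auto' to the best available backend, in the order listed above."""
    if preference != "auto":
        return preference                  # honor an explicit override
    if torch.backends.mps.is_available():  # Apple Silicon (Metal/MPS)
        return "mps"
    if torch.cuda.is_available():          # NVIDIA GPUs (CUDA)
        return "cuda"
    return "cpu"                           # multi-core CPU fallback

print(resolve_device())  # e.g. "mps" on an M-series Mac
```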
Force a specific device if needed:
```bash
PDFKB_EMBEDDING_DEVICE="mps" # Force Metal/MPS
PDFKB_EMBEDDING_DEVICE="cuda" # Force CUDA
PDFKB_EMBEDDING_DEVICE="cpu" # Force CPU
```
### Configuration Options
```bash
# Embedding provider (local or openai)
PDFKB_EMBEDDING_PROVIDER="local" # Default
# Model selection (Qwen3-Embedding series only)
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B" # Default
# Standard options:
# - "Qwen/Qwen3-Embedding-0.6B" (1.2GB, 1024 dims, default)
# - "Qwen/Qwen3-Embedding-4B" (8GB, 2560 dims, high quality)
# - "Qwen/Qwen3-Embedding-8B" (16GB, 3584 dims, maximum quality)
# GGUF quantized options (reduced memory usage):
# - "Qwen/Qwen3-Embedding-0.6B-GGUF" (0.6GB, 1024 dims)
# - "Qwen/Qwen3-Embedding-4B-GGUF" (2.4GB, 2560 dims)
# - "Qwen/Qwen3-Embedding-8B-GGUF" (4.8GB, 3584 dims)
# GGUF quantization configuration (only used with GGUF models)
PDFKB_GGUF_QUANTIZATION="Q6_K" # Default quantization level
# Available options: Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S
# Performance tuning
PDFKB_LOCAL_EMBEDDING_BATCH_SIZE=32 # Adjust based on memory
PDFKB_EMBEDDING_CACHE_SIZE=10000 # Number of cached embeddings
PDFKB_MAX_SEQUENCE_LENGTH=512 # Maximum text length
# Hardware acceleration
PDFKB_EMBEDDING_DEVICE="auto" # auto, mps, cuda, cpu
PDFKB_USE_MODEL_OPTIMIZATION=true # Enable torch.compile optimization
# Fallback options
PDFKB_FALLBACK_TO_OPENAI=false # Use OpenAI if local fails
```
### 2. OpenAI Embeddings
Use OpenAI's embedding API or **any OpenAI-compatible endpoint** for high-quality embeddings with minimal setup.
**Features:**
- **High Quality**: State-of-the-art embedding models
- **No Local Resources**: Runs entirely in the cloud
- **Fast**: Optimized API with batching support
- **🚀 Custom Endpoints**: Support for OpenAI-compatible APIs like Together, Nebius, etc.
**Standard OpenAI:**
```json
{
"env": {
"PDFKB_EMBEDDING_PROVIDER": "openai",
"PDFKB_OPENAI_API_KEY": "sk-proj-...",
"PDFKB_EMBEDDING_MODEL": "text-embedding-3-large"
}
}
```
**🚀 Custom OpenAI-Compatible Endpoints:**
```json
{
"env": {
"PDFKB_EMBEDDING_PROVIDER": "openai",
"PDFKB_OPENAI_API_KEY": "your-api-key",
"PDFKB_OPENAI_API_BASE": "https://api.studio.nebius.com/v1/",
"PDFKB_EMBEDDING_MODEL": "text-embedding-3-large"
}
}
```
### 3. HuggingFace Embeddings
**🚀 ENHANCED**: Use HuggingFace's Inference API with support for custom providers and thousands of embedding models.
**Features:**
- **🚀 Multiple Providers**: Use HuggingFace directly or third-party providers like Nebius
- **Wide Model Selection**: Access to thousands of embedding models
- **Cost-Effective**: Many free or low-cost options available
- **🚀 Provider Support**: Seamlessly switch between HuggingFace and custom inference providers
**Configuration:**
```json
{
"mcpServers": {
"pdfkb": {
"command": "pdfkb-mcp",
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/your/pdfs",
"PDFKB_EMBEDDING_PROVIDER": "huggingface",
"PDFKB_HUGGINGFACE_EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
"HF_TOKEN": "hf_your_token_here"
}
}
}
}
```
**Advanced Configuration:**
```bash
# Use a specific provider like Nebius
PDFKB_HUGGINGFACE_PROVIDER=nebius
PDFKB_HUGGINGFACE_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B
# Or use HuggingFace directly (auto/default)
PDFKB_HUGGINGFACE_PROVIDER= # Leave empty for auto
```
### Performance Tips
1. **Batch Size**: Larger batches are faster but use more memory
- Apple Silicon: 32-64 recommended
- NVIDIA GPUs: 64-128 recommended
- CPU: 16-32 recommended
2. **Model Selection**: Choose based on your needs
- **Default (Qwen3-0.6B)**: Best for most users - 32K context, fast, 1.2GB
- **GGUF (Qwen3-0.6B-GGUF)**: Memory-optimized version - 32K context, fast, 0.6GB
- **High Quality (Qwen3-4B)**: Better accuracy - 32K context, 8GB
- **GGUF High Quality (Qwen3-4B-GGUF)**: Memory-optimized high quality - 32K context, 2.4GB
- **Maximum Quality (Qwen3-8B)**: Best accuracy - 32K context, 16GB
- **GGUF Maximum Quality (Qwen3-8B-GGUF)**: Memory-optimized maximum quality - 32K context, 4.8GB
3. **GGUF Quantization**: Choose based on memory constraints
- **Q6_K** (default): Best balance of quality and size
- **Q8_0**: Higher quality, larger size
- **F16**: Near-original quality, largest size
- **Q4_K_M**: Smallest size, acceptable quality
4. **Memory Management**: The server automatically handles OOM errors by reducing batch size
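The backoff itself follows a standard pattern: catch the out-of-memory error, halve the batch size, and retry. A minimal sketch, not the server's actual implementation (`embed_batch` is a hypothetical embedding callable):
```python
def embed_with_backoff(texts, embed_batch, batch_size=64):
    """Embed texts, halving the batch size whenever an OOM error is raised."""
    while batch_size >= 1:
        try:
            vectors = []
            for i in range(0, len(texts), batch_size):
                vectors.extend(embed_batch(texts[i:i + batch_size]))
            return vectors
        except RuntimeError:   # PyTorch surfaces OOM as RuntimeError
            batch_size //= 2   # halve and retry with smaller batches
    raise MemoryError("embedding failed even with batch size 1")
```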
## 📄 Markdown Document Support
The server now supports **Markdown documents** (.md, .markdown) alongside PDFs, perfect for:
- Pre-processed documents where you've already extracted clean markdown
- Technical documentation and notes
- Avoiding complex PDF parsing for better quality content
- Faster processing with no conversion overhead
### Features
- **Native Processing**: Markdown files are read directly without conversion
- **Page Boundary Detection**: Automatically splits documents on page markers like `--[PAGE: 142]--`
- **Frontmatter Support**: Automatically extracts YAML/TOML frontmatter metadata
- **Title Extraction**: Intelligently extracts titles from H1 headers or frontmatter
- **Same Pipeline**: Uses the same chunking, embedding, and search infrastructure as PDFs
- **Mixed Collections**: Search across both PDFs and Markdown documents seamlessly
### Usage
Simply add Markdown files the same way you add PDFs:
```python
# In your MCP client
await add_document("/path/to/document.md")
await add_document("/path/to/paper.pdf")
# Search across both types
results = await search_documents("your query")
```
### Configuration
```bash
# Markdown-specific settings
PDFKB_MARKDOWN_PAGE_BOUNDARY_PATTERN="--\\[PAGE:\\s*(\\d+)\\]--" # Regex pattern for page boundaries
PDFKB_MARKDOWN_SPLIT_ON_PAGE_BOUNDARIES=true # Enable page boundary detection
PDFKB_MARKDOWN_PARSE_FRONTMATTER=true # Parse YAML/TOML frontmatter (default: true)
PDFKB_MARKDOWN_EXTRACT_TITLE=true # Extract title from first H1 (default: true)
```
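To see what the default page-boundary pattern matches, here is a small illustrative sketch that splits a markdown string with the regex above:
```python
import re

# Default PDFKB_MARKDOWN_PAGE_BOUNDARY_PATTERN, unescaped for Python.
PAGE_BOUNDARY = re.compile(r"--\[PAGE:\s*(\d+)\]--")

doc = "Intro\n--[PAGE: 1]--\nFirst page text\n--[PAGE: 2]--\nSecond page text"

# Splitting on a pattern with a capturing group keeps the page numbers.
parts = PAGE_BOUNDARY.split(doc)
print(parts)
# ['Intro\n', '1', '\nFirst page text\n', '2', '\nSecond page text']
```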
## 🔄 Reranking
**🚀 NEW**: The server now supports **advanced reranking** using multiple providers to significantly improve search result relevance and quality. Reranking is a post-processing step that re-orders initial search results based on deeper semantic understanding.
### Supported Providers
1. **Local Models**: Qwen3-Reranker models (both standard and GGUF quantized variants)
2. **DeepInfra API**: Qwen3-Reranker-8B via DeepInfra's native API
### How It Works
1. **Initial Search**: Retrieves `limit + reranker_sample_additional` candidates using hybrid/vector/text search
2. **Reranking**: Uses Qwen3-Reranker to deeply analyze query-document relevance and re-score results
3. **Final Results**: Returns the top `limit` results based on reranker scores
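Conceptually this is a standard two-stage retrieve-then-rerank pipeline. The sketch below is illustrative only; `initial_search` and `rerank_score` are hypothetical stand-ins for the server's internal search and Qwen3-Reranker scoring:
```python
def search_with_rerank(query: str, limit: int = 5, sample_additional: int = 5):
    """Two-stage retrieval: over-fetch candidates, re-score, keep the top `limit`."""
    # Stage 1: over-fetch limit + sample_additional candidates.
    candidates = initial_search(query, k=limit + sample_additional)  # hypothetical
    # Stage 2: deeper query-document relevance scoring with the reranker.
    scored = [(rerank_score(query, doc), doc) for doc in candidates]  # hypothetical
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]
```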
### Supported Models
#### Local Models (Qwen3-Reranker Series)
**Standard Models**
| Model | Size | Best For |
|-------|------|----------|
| **Qwen/Qwen3-Reranker-0.6B** (default) | 1.2GB | Lightweight, fast reranking |
| **Qwen/Qwen3-Reranker-4B** | 8.0GB | High quality reranking |
| **Qwen/Qwen3-Reranker-8B** | 16.0GB | Maximum quality reranking |
**🚀 GGUF Quantized Models (Reduced Memory Usage)**
| Model | Size | Best For |
|-------|------|----------|
| **Mungert/Qwen3-Reranker-0.6B-GGUF** | 0.3GB | Quantized lightweight, very fast |
| **Mungert/Qwen3-Reranker-4B-GGUF** | 2.0GB | Quantized high quality |
| **Mungert/Qwen3-Reranker-8B-GGUF** | 4.0GB | Quantized maximum quality |
#### 🚀 DeepInfra Model
| Model | Best For |
|-------|----------|
| **Qwen/Qwen3-Reranker-8B** | High-quality cross-encoder reranking via DeepInfra API |
### Configuration
#### Option 1: Local Reranking (Standard Models)
```bash
# Enable reranking with local models
PDFKB_ENABLE_RERANKER=true
PDFKB_RERANKER_PROVIDER=local # Default
# Choose reranker model
PDFKB_RERANKER_MODEL="Qwen/Qwen3-Reranker-0.6B" # Default
PDFKB_RERANKER_MODEL="Qwen/Qwen3-Reranker-4B" # Higher quality
PDFKB_RERANKER_MODEL="Qwen/Qwen3-Reranker-8B" # Maximum quality
# Configure candidate sampling
PDFKB_RERANKER_SAMPLE_ADDITIONAL=5 # Default: get 5 extra candidates for reranking
# Optional: specify device
PDFKB_RERANKER_DEVICE="mps" # For Apple Silicon
PDFKB_RERANKER_DEVICE="cuda" # For NVIDIA GPUs
PDFKB_RERANKER_DEVICE="cpu" # For CPU-only
```
#### Option 2: GGUF Quantized Local Reranking (Memory Optimized)
```bash
# Enable reranking with GGUF quantized models
PDFKB_ENABLE_RERANKER=true
PDFKB_RERANKER_PROVIDER=local
# Choose GGUF reranker model
PDFKB_RERANKER_MODEL="Mungert/Qwen3-Reranker-0.6B-GGUF" # Smallest
PDFKB_RERANKER_MODEL="Mungert/Qwen3-Reranker-4B-GGUF" # Balanced
PDFKB_RERANKER_MODEL="Mungert/Qwen3-Reranker-8B-GGUF" # Highest quality
# Configure GGUF quantization level
PDFKB_RERANKER_GGUF_QUANTIZATION="Q6_K" # Balanced (recommended)
PDFKB_RERANKER_GGUF_QUANTIZATION="Q8_0" # Higher quality, larger
PDFKB_RERANKER_GGUF_QUANTIZATION="Q4_K_M" # Smaller, lower quality
# Configure candidate sampling
PDFKB_RERANKER_SAMPLE_ADDITIONAL=5 # Default: get 5 extra candidates
```
#### 🚀 Option 3: DeepInfra Reranking (API-based)
```bash
# Enable reranking with DeepInfra
PDFKB_ENABLE_RERANKER=true
PDFKB_RERANKER_PROVIDER=deepinfra
# Set your DeepInfra API key
PDFKB_DEEPINFRA_API_KEY="your-deepinfra-api-key"
# Optional: Choose model (default: Qwen/Qwen3-Reranker-8B)
# Available: Qwen/Qwen3-Reranker-0.6B, Qwen/Qwen3-Reranker-4B, Qwen/Qwen3-Reranker-8B
PDFKB_DEEPINFRA_RERANKER_MODEL="Qwen/Qwen3-Reranker-8B"
# Configure candidate sampling
PDFKB_RERANKER_SAMPLE_ADDITIONAL=8 # Sample 8 extra docs for reranking
```
**About DeepInfra Reranker**:
- Supports three Qwen3-Reranker models:
- **0.6B**: Lightweight model, fastest inference
- **4B**: Balanced model with good quality and speed
- **8B**: Maximum quality model (default)
- Optimized for high-quality cross-encoder relevance scoring
- Pay-per-use pricing model
- Get your API key at https://deepinfra.com
- Note: The API requires equal-length query and document arrays, so the query is duplicated for each document internally
#### Complete Examples
**Local Reranking with GGUF Models**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true",
"PDFKB_ENABLE_RERANKER": "true",
"PDFKB_RERANKER_PROVIDER": "local",
"PDFKB_RERANKER_MODEL": "Mungert/Qwen3-Reranker-4B-GGUF",
"PDFKB_RERANKER_GGUF_QUANTIZATION": "Q6_K",
"PDFKB_RERANKER_SAMPLE_ADDITIONAL": "8",
"PDFKB_LOCAL_EMBEDDING_MODEL": "Qwen/Qwen3-Embedding-0.6B-GGUF",
"PDFKB_GGUF_QUANTIZATION": "Q6_K"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
**🚀 DeepInfra Reranking with Local Embeddings**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[hybrid]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_ENABLE_HYBRID_SEARCH": "true",
"PDFKB_ENABLE_RERANKER": "true",
"PDFKB_RERANKER_PROVIDER": "deepinfra",
"PDFKB_DEEPINFRA_API_KEY": "your-deepinfra-api-key",
"PDFKB_RERANKER_SAMPLE_ADDITIONAL": "8",
"PDFKB_LOCAL_EMBEDDING_MODEL": "Qwen/Qwen3-Embedding-0.6B",
"PDFKB_EMBEDDING_PROVIDER": "local"
},
"transport": "stdio",
"autoRestart": true
}
}
}
```
### Performance Impact
**Search Quality**: Reranking typically improves search relevance by 15-30% by better understanding query intent and document relevance.
**Memory Usage**:
- Local standard models: 1.2GB - 16GB depending on model size
- GGUF quantized: 0.3GB - 4GB depending on model and quantization
- DeepInfra: No local memory usage (API-based)
**Speed**:
- Local models: Adds ~100-500ms per search
- GGUF models: Slightly slower initial load, similar inference
- DeepInfra: Adds ~200-800ms depending on API latency
**Cost**:
- Local models: Free after initial download
- DeepInfra: Pay-per-use based on token usage
### When to Use Reranking
**✅ Recommended for:**
- High-stakes searches where quality matters most
- Complex queries requiring nuanced understanding
- Large document collections with diverse content
- When you have adequate hardware resources
**❌ Skip reranking for:**
- Simple keyword-based searches
- Real-time applications requiring sub-100ms responses
- Limited memory/compute environments
- Very small document collections (<100 documents)
### GGUF Quantization Recommendations
For GGUF reranker models, choose quantization based on your needs:
- **Q6_K** (recommended): Best balance of quality and size
- **Q8_0**: Near-original quality with moderate compression
- **F16**: Original quality, minimal compression
- **Q4_K_M**: Maximum compression, acceptable quality loss
- **Q4_K_S**: Small size, lower quality
- **Q5_K_M**: Medium compression and quality
- **Q5_K_S**: Smaller variant of Q5
## 📝 Document Summarization
The server supports **automatic document summarization** to generate meaningful titles, short descriptions, and detailed summaries for each document. This creates rich metadata that improves document organization and search quality.
### Summary Components
Each processed document can automatically generate:
- **Title**: A descriptive title that captures the document's main subject (max 80 characters)
- **Short Description**: A concise 1-2 sentence summary (max 200 characters)
- **Long Description**: A detailed paragraph explaining content, key points, and findings (max 500 characters)
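Put together, the generated metadata is a small, length-bounded record per document. An illustrative sketch (the field names are assumptions; the character caps come from the list above):
```python
from dataclasses import dataclass

@dataclass
class DocumentSummary:
    """Illustrative shape of the generated metadata (names are assumptions)."""
    title: str              # capped at 80 characters
    short_description: str  # capped at 200 characters
    long_description: str   # capped at 500 characters

    def __post_init__(self):
        # Enforce the documented caps by truncation.
        self.title = self.title[:80]
        self.short_description = self.short_description[:200]
        self.long_description = self.long_description[:500]
```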
### Summarization Options
#### Option 1: Local LLM Summarization
```bash
# Enable summarization with local LLM
PDFKB_ENABLE_SUMMARIZER=true
PDFKB_SUMMARIZER_PROVIDER=local
# Model selection (default: Qwen/Qwen3-4B-Instruct-2507-FP8)
PDFKB_SUMMARIZER_MODEL="Qwen/Qwen3-4B-Instruct-2507-FP8" # Balanced (default)
PDFKB_SUMMARIZER_MODEL="Qwen/Qwen3-1.5B-Instruct" # Lightweight
PDFKB_SUMMARIZER_MODEL="Qwen/Qwen3-8B-Instruct" # High quality
# Hardware configuration
PDFKB_SUMMARIZER_DEVICE="auto" # auto, mps, cuda, cpu
PDFKB_SUMMARIZER_MODEL_CACHE_DIR="~/.cache/pdfkb-mcp/summarizer"
# Content configuration
PDFKB_SUMMARIZER_MAX_PAGES=10 # Number of pages to analyze (default: 10)
```
**About Local Summarization**:
- Uses transformer-based instruction-tuned models locally
- No API costs or external dependencies
- Full privacy - content never leaves your machine
- Supports multiple model sizes for different hardware capabilities
- Configurable page limits to manage processing time
#### Option 2: Remote LLM Summarization (OpenAI-Compatible)
```bash
# Enable summarization with remote API
PDFKB_ENABLE_SUMMARIZER=true
PDFKB_SUMMARIZER_PROVIDER=remote
# API configuration
PDFKB_SUMMARIZER_API_KEY="your-api-key" # Optional, falls back to OPENAI_API_KEY
PDFKB_SUMMARIZER_API_BASE="https://api.openai.com/v1" # Custom API endpoint
PDFKB_SUMMARIZER_MODEL="gpt-4" # Model to use
# Content configuration
PDFKB_SUMMARIZER_MAX_PAGES=10 # Number of pages to analyze
```
**About Remote Summarization**:
- Works with OpenAI API and compatible services
- Supports custom API endpoints for other providers
- Higher quality summaries with advanced models
- Pay-per-use pricing model
- Faster processing for large documents
### Usage Examples
**Local Summarization with Custom Model**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs",
"PDFKB_ENABLE_SUMMARIZER": "true",
"PDFKB_SUMMARIZER_PROVIDER": "local",
"PDFKB_SUMMARIZER_MODEL": "Qwen/Qwen3-4B-Instruct-2507-FP8",
"PDFKB_SUMMARIZER_MAX_PAGES": "15",
"PDFKB_SUMMARIZER_DEVICE": "mps"
}
}
}
}
```
**Remote Summarization with Custom Endpoint**
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs",
"PDFKB_ENABLE_SUMMARIZER": "true",
"PDFKB_SUMMARIZER_PROVIDER": "remote",
"PDFKB_SUMMARIZER_API_KEY": "your-api-key",
"PDFKB_SUMMARIZER_MODEL": "gpt-4",
"PDFKB_SUMMARIZER_MAX_PAGES": "20"
}
}
}
}
```
### Performance Considerations
**Local Models**:
- **Qwen3-1.5B**: ~3GB RAM, fast processing, good quality
- **Qwen3-4B-FP8**: ~8GB RAM, balanced speed/quality (recommended)
- **Qwen3-8B**: ~16GB RAM, highest quality, slower processing
**Remote Models**:
- **GPT-3.5-turbo**: Fast, cost-effective, good quality
- **GPT-4**: Highest quality, more expensive, slower
- **Custom models**: Varies by provider
**Page Limits**:
- More pages = better context but slower processing
- Recommended: 10-20 pages for most documents
- Academic papers: 5-10 pages (focus on abstract/conclusion)
- Technical manuals: 15-25 pages (capture key sections)
### When to Use Summarization
**Recommended for**:
- Large document collections requiring organization
- Research document management
- Content discovery and browsing
- Document metadata enhancement
**Consider disabling for**:
- Very small document collections
- Documents with highly sensitive content (use local if needed)
- Limited processing resources
- Real-time document processing requirements
## 🔍 Hybrid Search
The server now supports **Hybrid Search**, which combines the strengths of semantic similarity search (vector embeddings) with traditional keyword matching (BM25) for improved search quality.
### How It Works
1. **Dual Indexing**: Documents are indexed in both a vector database (ChromaDB) and a full-text search index (Whoosh)
2. **Parallel Search**: Queries execute both semantic and keyword searches simultaneously
3. **Reciprocal Rank Fusion (RRF)**: Results are intelligently merged using RRF algorithm for optimal ranking
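RRF itself is a small formula: each document scores `weight / (k + rank)` for every ranking it appears in, and the summed scores decide the final order. A minimal sketch using the documented defaults (vector weight 0.6, text weight 0.4, k = 60):
```python
def rrf_fuse(vector_ranking, text_ranking, k=60, w_vec=0.6, w_text=0.4):
    """Merge two ranked lists of document IDs via weighted Reciprocal Rank Fusion."""
    scores = {}
    for weight, ranking in ((w_vec, vector_ranking), (w_text, text_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked well by both searches beats one that tops only one list.
print(rrf_fuse(["a", "b", "c"], ["b", "c", "a"]))  # ['b', 'a', 'c']
```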
### Benefits
- **Better Recall**: Finds documents that match exact keywords even if semantically different
- **Improved Precision**: Combines conceptual understanding with keyword relevance
- **Technical Terms**: Excellent for technical documentation, code references, and domain-specific terminology
- **Balanced Results**: Configurable weights let you adjust the balance between semantic and keyword matching
### Configuration
Enable hybrid search by setting:
```bash
PDFKB_ENABLE_HYBRID_SEARCH=true # Enable hybrid search (default: true)
PDFKB_HYBRID_VECTOR_WEIGHT=0.6 # Weight for semantic search (default: 0.6)
PDFKB_HYBRID_TEXT_WEIGHT=0.4 # Weight for keyword search (default: 0.4)
PDFKB_RRF_K=60 # RRF constant (default: 60)
```
### Installation
To use hybrid search, install with the optional dependency:
```bash
pip install "pdfkb-mcp[hybrid]"
```
Or, when using uvx, include the extra in the package spec as shown in the Quick Start examples: `pdfkb-mcp[hybrid]`.
## 🔽 Minimum Chunk Filtering
**NEW**: The server now supports **Minimum Chunk Filtering**, which automatically filters out short, low-information chunks that don't contain enough content to be useful for search and retrieval.
### How It Works
Documents are processed normally through parsing and chunking, then chunks below the configured character threshold are automatically filtered out before indexing and embedding.
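The filter is effectively a single predicate applied to the chunk list after parsing; a sketch:
```python
def filter_chunks(chunks: list[str], min_chunk_size: int = 0) -> list[str]:
    """Drop chunks below the character threshold; 0 disables filtering."""
    if min_chunk_size <= 0:
        return chunks
    return [chunk for chunk in chunks if len(chunk) >= min_chunk_size]
```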
### Benefits
- **Improved Search Quality**: Eliminates noise from short, uninformative chunks
- **Reduced Storage**: Less vector storage and faster search by removing low-value content
- **Better Context**: Search results focus on chunks with substantial, meaningful content
- **Configurable**: Set custom thresholds based on your document types and use case
### Configuration
```bash
# Enable filtering (default: 0 = disabled)
PDFKB_MIN_CHUNK_SIZE=150 # Filter chunks smaller than 150 characters
# Examples for different use cases:
PDFKB_MIN_CHUNK_SIZE=100 # Permissive - keep most content
PDFKB_MIN_CHUNK_SIZE=200 # Stricter - only substantial chunks
PDFKB_MIN_CHUNK_SIZE=0 # Disabled - keep all chunks (default)
```
Or in your MCP client configuration:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-...",
"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs",
"PDFKB_MIN_CHUNK_SIZE": "150"
}
}
}
}
```
### Usage Guidelines
- **Default (0)**: No filtering - keeps all chunks for maximum recall
- **Conservative (100-150)**: Good balance - removes very short chunks while preserving content
- **Aggressive (200+)**: Strict filtering - only keeps substantial chunks with rich content
## 🧩 Semantic Chunking
**NEW**: The server now supports advanced **Semantic Chunking**, which uses embedding similarity to identify natural content boundaries, creating more coherent and contextually complete chunks than traditional methods.
### How It Works
1. **Sentence Embedding**: Each sentence in the document is embedded using your configured embedding model
2. **Similarity Analysis**: Distances between consecutive sentence embeddings are calculated
3. **Breakpoint Detection**: Natural content boundaries are identified where similarity drops significantly
4. **Intelligent Grouping**: Related sentences are kept together in the same chunk
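A stripped-down sketch of the percentile method, assuming sentence embeddings are already computed and L2-normalized (so cosine distance is `1 - dot`):
```python
import numpy as np

def semantic_chunks(sentences, embeddings, percentile=95.0):
    """Split where the gap between consecutive sentences is unusually large."""
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Cosine distance between each consecutive pair of sentence embeddings.
    gaps = [1.0 - float(np.dot(embeddings[i], embeddings[i + 1]))
            for i in range(len(sentences) - 1)]
    threshold = np.percentile(gaps, percentile)  # top (100 - percentile)% of gaps
    chunks, current = [], [sentences[0]]
    for i, gap in enumerate(gaps):
        if gap > threshold:               # large semantic gap: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks
```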
### Benefits
- **40% Better Coherence**: Chunks contain semantically related content
- **Context Preservation**: Important context stays together, reducing information loss
- **Improved Retrieval**: Better search results due to more meaningful chunks
- **Flexible Configuration**: Four different breakpoint detection methods for different document types
### Quick Start
Enable semantic chunking by setting:
```bash
PDFKB_PDF_CHUNKER=semantic
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile # Default
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0 # Default
```
Or in your MCP client configuration:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[semantic]"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs",
"PDFKB_PDF_CHUNKER": "semantic",
"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE": "percentile",
"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT": "95.0"
}
}
}
}
```
### Breakpoint Detection Methods
| Method | Best For | Threshold Range | Description |
|--------|----------|-----------------|-------------|
| **percentile** (default) | General documents | 90-99 | Split at top N% largest semantic gaps |
| **standard_deviation** | Consistent style docs | 2.0-4.0 | Split at mean + N×σ distance |
| **interquartile** | Noisy documents | 1.0-2.0 | Split at mean + N×IQR, robust to outliers |
| **gradient** | Technical/legal docs | 90-99 | Analyze rate of change in similarity |
### Configuration Options
```bash
# Breakpoint detection method
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile # percentile, standard_deviation, interquartile, gradient
# Threshold amount (interpretation depends on type)
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0 # For percentile/gradient: 0-100, for others: positive float
# Context buffer size (sentences to include around breakpoints)
PDFKB_SEMANTIC_CHUNKER_BUFFER_SIZE=1 # Default: 1
# Optional: Fixed number of chunks (overrides threshold-based splitting)
PDFKB_SEMANTIC_CHUNKER_NUMBER_OF_CHUNKS= # Leave empty for dynamic
# Minimum chunk size in characters
PDFKB_SEMANTIC_CHUNKER_MIN_CHUNK_CHARS=100 # Default: 100
# Sentence splitting regex
PDFKB_SEMANTIC_CHUNKER_SENTENCE_SPLIT_REGEX="(?<=[.?!])\\s+" # Default pattern
```
### Tuning Guidelines
1. **For General Documents** (default):
- Use `percentile` with `95.0` threshold
- Good balance between chunk size and coherence
2. **For Technical Documentation**:
- Use `gradient` with `90.0` threshold
- Better at detecting technical section boundaries
3. **For Academic Papers**:
- Use `standard_deviation` with `3.0` threshold
- Maintains paragraph and section integrity
4. **For Mixed Content**:
- Use `interquartile` with `1.5` threshold
- Robust against varying content styles
### Installation
Install with the semantic chunking dependency:
```bash
pip install "pdfkb-mcp[semantic]"
```
Or if using uvx:
```bash
uvx "pdfkb-mcp[semantic]"
```
### Compatibility
- Works with both **local** and **OpenAI** embeddings
- Compatible with all PDF parsers
- Integrates with intelligent caching system
- Falls back to LangChain chunker if dependencies missing
## 🎯 Parser Selection Guide
### Decision Tree
```
Document Type & Priority?
├── 🚀 Speed Priority → PyMuPDF4LLM (fastest processing, low memory)
├── 🎓 Academic Papers → MinerU (GPU-accelerated, excellent formulas/tables)
├── 📊 Business Reports → Docling (accurate tables, structured output)
├── ⚖️ Balanced Quality → Marker (good multilingual, selective OCR)
└── 🎯 Maximum Accuracy → LLM (slow, API costs, complex layouts)
```
### Performance Comparison
| Parser | Processing Speed | Memory | Text Quality | Table Quality | Best For |
|--------|------------------|--------|--------------|---------------|----------|
| **PyMuPDF4LLM** | **Fastest** | Low | Good | Basic-Moderate | RAG pipelines, bulk ingestion |
| **MinerU** | Fast with GPU¹ | ~4GB VRAM² | Excellent | Excellent | Scientific/technical PDFs |
| **Docling** | 0.9-2.5 pages/s³ | 2.5-6GB⁴ | Excellent | **Excellent** | Structured documents, tables |
| **Marker** | ~25 p/s batch⁵ | ~4GB VRAM⁶ | Excellent | Good-Excellent⁷ | Scientific papers, multilingual |
| **LLM** | Slow⁸ | Variable⁹ | Excellent¹⁰ | Excellent | Complex layouts, high-value docs |
**Notes:**
¹ >10,000 tokens/s on RTX 4090 with sglang
² Reported for <1B parameter model
³ CPU benchmarks: 0.92-1.34 p/s (native), 1.57-2.45 p/s (pypdfium)
⁴ 2.42-2.56GB (pypdfium), 6.16-6.20GB (native backend)
⁵ Projected on H100 GPU in batch mode
⁶ Benchmark configuration on NVIDIA A6000
⁷ Enhanced with optional LLM mode for table merging
⁸ Order of magnitude slower than traditional parsers
⁹ Depends on token usage and model size
¹⁰ 98.7-100% accuracy when given clean text
## ⚙️ Configuration
### Tier 1: Basic Configurations (80% of users)
**Default (Recommended)**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "pymupdf4llm",
"PDFKB_PDF_CHUNKER": "langchain",
"PDFKB_EMBEDDING_MODEL": "text-embedding-3-large"
},
"transport": "stdio"
}
}
}
```
**Speed Optimized**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "pymupdf4llm",
"PDFKB_CHUNK_SIZE": "800"
},
"transport": "stdio"
}
}
}
```
**Memory Efficient**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "pymupdf4llm",
"PDFKB_EMBEDDING_BATCH_SIZE": "50"
},
"transport": "stdio"
}
}
}
```
### Tier 2: Use Case Specific (15% of users)
**Academic Papers**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "mineru",
"PDFKB_CHUNK_SIZE": "1200"
},
"transport": "stdio"
}
}
}
```
**Business Documents**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "pymupdf4llm",
"PDFKB_DOCLING_TABLE_MODE": "ACCURATE",
"PDFKB_DOCLING_DO_TABLE_STRUCTURE": "true"
},
"transport": "stdio"
}
}
}
```
**Multi-language Documents**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "docling",
"PDFKB_DOCLING_OCR_LANGUAGES": "en,fr,de,es",
"PDFKB_DOCLING_DO_OCR": "true"
},
"transport": "stdio"
}
}
}
```
**Hybrid Search (NEW - Improved Search Quality)**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_ENABLE_HYBRID_SEARCH": "true",
"PDFKB_HYBRID_VECTOR_WEIGHT": "0.6",
"PDFKB_HYBRID_TEXT_WEIGHT": "0.4"
},
"transport": "stdio"
}
}
}
```
**Semantic Chunking (NEW - Context-Aware Chunking)**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp[semantic]"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_PDF_CHUNKER": "semantic",
"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE": "gradient",
"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT": "90.0",
"PDFKB_ENABLE_HYBRID_SEARCH": "true"
},
"transport": "stdio"
}
}
}
```
**Maximum Quality**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_OPENROUTER_API_KEY": "sk-or-v1-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "llm",
"PDFKB_LLM_MODEL": "anthropic/claude-3.5-sonnet",
"PDFKB_EMBEDDING_MODEL": "text-embedding-3-large"
},
"transport": "stdio"
}
}
}
```
### Essential Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `PDFKB_OPENAI_API_KEY` | *required for openai provider* | OpenAI API key for embeddings (only needed when `PDFKB_EMBEDDING_PROVIDER=openai`) |
| `PDFKB_KNOWLEDGEBASE_PATH` | `./pdfs` | Directory containing PDF files |
| `PDFKB_CACHE_DIR` | `./.cache` | Cache directory for processing |
| `PDFKB_PDF_PARSER` | `pymupdf4llm` | Parser: `pymupdf4llm` (default), `marker`, `mineru`, `docling`, `llm` |
| `PDFKB_PDF_CHUNKER` | `langchain` | Chunking strategy: `langchain` (default), `page`, `unstructured`, `semantic` |
| `PDFKB_CHUNK_SIZE` | `1000` | Target chunk size for LangChain chunker |
| `PDFKB_WEB_ENABLE` | `false` | Enable/disable web interface |
| `PDFKB_WEB_PORT` | `8080` | Web server port |
| `PDFKB_WEB_HOST` | `localhost` | Web server host |
| `PDFKB_WEB_CORS_ORIGINS` | `http://localhost:3000,http://127.0.0.1:3000` | CORS allowed origins (comma-separated) |
| `PDFKB_EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI embedding model (use `text-embedding-3-small` for faster processing) |
| `PDFKB_MIN_CHUNK_SIZE` | `0` | Minimum chunk size in characters (0 = disabled, filters out chunks smaller than this size) |
| `PDFKB_OPENAI_API_BASE` | *optional* | Custom base URL for OpenAI-compatible APIs (e.g., https://api.studio.nebius.com/v1/) |
| `PDFKB_HUGGINGFACE_EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HuggingFace model for embeddings when using huggingface provider |
| `PDFKB_HUGGINGFACE_PROVIDER` | *optional* | HuggingFace provider (e.g., "nebius"), leave empty for default |
| `PDFKB_ENABLE_HYBRID_SEARCH` | `true` | Enable hybrid search combining semantic and keyword matching |
| `PDFKB_HYBRID_VECTOR_WEIGHT` | `0.6` | Weight for semantic search (0-1, must sum to 1 with text weight) |
| `PDFKB_HYBRID_TEXT_WEIGHT` | `0.4` | Weight for keyword/BM25 search (0-1, must sum to 1 with vector weight) |
| `PDFKB_RRF_K` | `60` | Reciprocal Rank Fusion constant (higher = less emphasis on rank differences) |
| `PDFKB_LOCAL_EMBEDDING_MODEL` | `Qwen/Qwen3-Embedding-0.6B` | Local embedding model (Qwen3-Embedding series only) |
| `PDFKB_GGUF_QUANTIZATION` | `Q6_K` | GGUF quantization level (Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S) |
| `PDFKB_ENABLE_RERANKER` | `false` | Enable/disable result reranking for improved search quality |
| `PDFKB_RERANKER_PROVIDER` | `local` | Reranker provider: 'local' or 'deepinfra' |
| `PDFKB_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-0.6B` | Reranker model for local provider |
| `PDFKB_RERANKER_SAMPLE_ADDITIONAL` | `5` | Additional results to sample for reranking |
| `PDFKB_RERANKER_GGUF_QUANTIZATION` | *optional* | GGUF quantization level (Q6_K, Q8_0, etc.) |
| `PDFKB_DEEPINFRA_API_KEY` | *required for deepinfra* | DeepInfra API key for reranking (only needed when `PDFKB_RERANKER_PROVIDER=deepinfra`) |
| `PDFKB_DEEPINFRA_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-8B` | DeepInfra model: 0.6B, 4B, or 8B |
| `PDFKB_ENABLE_SUMMARIZER` | `false` | Enable/disable document summarization |
| `PDFKB_SUMMARIZER_PROVIDER` | `local` | Summarizer provider: 'local' or 'remote' |
| `PDFKB_SUMMARIZER_MODEL` | `Qwen/Qwen3-4B-Instruct-2507-FP8` | Model for summarization |
| `PDFKB_SUMMARIZER_MAX_PAGES` | `10` | Maximum pages to analyze for summarization |
| `PDFKB_SUMMARIZER_DEVICE` | `auto` | Hardware device for local summarizer |
| `PDFKB_SUMMARIZER_MODEL_CACHE_DIR` | `~/.cache/pdfkb-mcp/summarizer` | Cache directory for summarizer models |
| `PDFKB_SUMMARIZER_API_BASE` | *optional* | Custom API base URL for remote summarizer |
| `PDFKB_SUMMARIZER_API_KEY` | *optional* | API key for remote summarizer (fallback to OPENAI_API_KEY) |
## 🐳 Docker Deployment
Deploy pdfkb-mcp using Docker for consistent, scalable, and isolated deployment across any environment.
### Quick Start with Docker
**1. Using Container Run (Local Embeddings - No API Key Required)**:
```bash
# Create directories
mkdir -p ./documents ./cache ./logs
# Run with Podman (preferred)
podman run -d \
--name pdfkb-mcp \
-p 8000:8000 \
-v "$(pwd)/documents:/app/documents:rw" \
-v "$(pwd)/cache:/app/cache" \
-e PDFKB_EMBEDDING_PROVIDER=local \
-e PDFKB_TRANSPORT=http \
pdfkb-mcp:latest
# Or with Docker
docker run -d \
--name pdfkb-mcp \
-p 8000:8000 \
-v "$(pwd)/documents:/app/documents:rw" \
-v "$(pwd)/cache:/app/cache" \
-e PDFKB_EMBEDDING_PROVIDER=local \
-e PDFKB_TRANSPORT=http \
pdfkb-mcp:latest
```
**2. Using Compose (Recommended: Podman)**:
```bash
# 1) Copy the sample file and edit it
cp docker-compose.sample.yml docker-compose.yml
# 2) Edit docker-compose.yml
# - Set the documents volume path to your folder
# - Optionally adjust ports, resources, and any env vars
$EDITOR docker-compose.yml
# 3) Create recommended local directories (if using bind mounts)
mkdir -p ./documents ./cache ./logs
# 4a) Start with Podman (preferred)
podman-compose up -d
# 4b) Or with Docker (if you aren't using Podman)
docker compose up -d
```
> Security note: docker-compose.yml is already in .gitignore. Do not commit API keys. Use the sample file and keep your local docker-compose.yml untracked.
### Docker Compose Configuration
The `docker-compose.sample.yml` provides a comprehensive configuration template with:
- 📋 **All environment variables** documented with examples and default values
- 🔧 **Logical sections** (Core, Embedding, Web Interface, Processing, Advanced AI, etc.)
- 📝 **Multiple configuration examples** for different use cases
- 🔒 **Security best practices** with no committed API keys
- 🎯 **Quick start recommendations** at the bottom of the file
**Key Configuration Areas**:
1. **Documents Volume**: Update the path to your document collection:
```yaml
volumes:
- "/path/to/your/documents:/app/documents:rw" # β CHANGE THIS
```
2. **Embedding Provider**: Choose your preferred option in the environment section:
```yaml
# Local (no API key - recommended for privacy)
PDFKB_EMBEDDING_PROVIDER: "local"
# OpenAI/compatible APIs (requires API key)
# PDFKB_EMBEDDING_PROVIDER: "openai"
# PDFKB_OPENAI_API_KEY: "YOUR-API-KEY-HERE"
```
3. **Resource Limits**: Adjust based on your system:
```yaml
deploy:
resources:
limits:
cpus: '4.0' # β Increase for better performance
memory: 8G # β Increase for large document collections
```
**3. Alternative: Using Environment File**:
For sensitive configuration, create a separate `.env` file:
```bash
# Create .env file for sensitive settings
cat > .env << 'EOF'
PDFKB_OPENAI_API_KEY=sk-proj-your-actual-key-here
PDFKB_EMBEDDING_PROVIDER=openai
PDFKB_DEEPINFRA_API_KEY=your-deepinfra-key
PDFKB_ENABLE_RERANKER=true
EOF
# Reference in docker-compose.yml
# env_file:
# - .env
# Restart with new configuration
podman-compose down && podman-compose up -d
```
### Building from Source
```bash
# Clone the repository
git clone https://github.com/juanqui/pdfkb-mcp.git
cd pdfkb-mcp
# Copy and customize the configuration
cp docker-compose.sample.yml docker-compose.yml
# Edit docker-compose.yml to update volumes and configuration
$EDITOR docker-compose.yml
# Build with Podman (preferred)
podman build -t pdfkb-mcp:latest .
# Or use Podman Compose to build
podman-compose build
# Alternative: Build with Docker
docker build -t pdfkb-mcp:latest .
docker compose build
```
### Container Configuration
#### Volume Mounts
**Required Volumes**:
- **Documents**: `/app/documents` - Mount your PDF/Markdown collection here
- **Cache**: `/app/cache` - Persistent storage for ChromaDB and processing cache
**Optional Volumes**:
- **Logs**: `/app/logs` - Container logs (useful for debugging)
- **Config**: `/app/config` - Custom configuration files
```bash
# Example with all volumes (Podman preferred)
podman run -d \
--name pdfkb-mcp \
-p 8000:8000 -p 8080:8080 \
-v "/path/to/your/documents:/app/documents:rw" \
-v "pdfkb-cache:/app/cache" \
-v "pdfkb-logs:/app/logs" \
-e PDFKB_EMBEDDING_PROVIDER=local \
-e PDFKB_WEB_ENABLE=true \
pdfkb-mcp:latest
# Or with Docker
docker run -d \
--name pdfkb-mcp \
-p 8000:8000 -p 8080:8080 \
-v "/path/to/your/documents:/app/documents:rw" \
-v "pdfkb-cache:/app/cache" \
-v "pdfkb-logs:/app/logs" \
-e PDFKB_EMBEDDING_PROVIDER=local \
-e PDFKB_WEB_ENABLE=true \
pdfkb-mcp:latest
```
#### Port Configuration
- **8000**: MCP HTTP/SSE transport (required for MCP clients)
- **8080**: Web interface (optional, only if `PDFKB_WEB_ENABLE=true`)
#### Environment Variables
**Core Configuration**:
```bash
# Documents and cache
PDFKB_KNOWLEDGEBASE_PATH=/app/documents # Container path (don't change)
PDFKB_CACHE_DIR=/app/cache # Container path (don't change)
PDFKB_LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
# Transport mode
PDFKB_TRANSPORT=http # "http", "sse" (stdio not recommended for containers)
PDFKB_SERVER_HOST=0.0.0.0 # Bind to all interfaces
PDFKB_SERVER_PORT=8000 # Port inside container
# Embedding provider
PDFKB_EMBEDDING_PROVIDER=local # "local", "openai", "huggingface"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B"
# Optional: OpenAI configuration
PDFKB_OPENAI_API_KEY=sk-proj-your-key-here
PDFKB_EMBEDDING_MODEL=text-embedding-3-large
# Optional: Web interface
PDFKB_WEB_ENABLE=false # Enable web UI
PDFKB_WEB_HOST=0.0.0.0 # Web interface host
PDFKB_WEB_PORT=8080 # Web interface port
```
**Performance Configuration**:
```bash
# Processing configuration
PDFKB_PDF_PARSER=pymupdf4llm # Parser selection
PDFKB_PDF_CHUNKER=langchain # Chunking strategy
PDFKB_CHUNK_SIZE=1000 # Chunk size
PDFKB_CHUNK_OVERLAP=200 # Chunk overlap
# Parallel processing (adjust based on container resources)
PDFKB_MAX_PARALLEL_PARSING=1 # Concurrent PDF processing
PDFKB_MAX_PARALLEL_EMBEDDING=1 # Concurrent embedding generation
PDFKB_BACKGROUND_QUEUE_WORKERS=2 # Background workers
# Search configuration
PDFKB_ENABLE_HYBRID_SEARCH=true # Hybrid search (recommended)
PDFKB_ENABLE_RERANKER=false # Result reranking
PDFKB_ENABLE_SUMMARIZER=false # Document summarization
```
### MCP Client Configuration with Docker
#### For Cline (HTTP Transport)
**MCP Settings (`~/.continue/config.json`)**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "curl",
"args": [
"-X", "POST",
"-H", "Content-Type: application/json",
"http://localhost:8000/mcp"
],
"transport": "http"
}
}
}
```
#### For Roo (SSE Transport)
**Set container to SSE mode**:
```bash
# Update docker-compose.yml or add environment variable
PDFKB_TRANSPORT=sse
# Restart container with Podman
podman-compose restart
# Or with Docker
docker compose restart
```
**MCP Settings**:
```json
{
"mcpServers": {
"pdfkb": {
"transport": "sse",
"url": "http://localhost:8000/sse"
}
}
}
```
### Production Configuration
For production deployments, use the comprehensive `docker-compose.sample.yml` as your starting point:
1. **Copy and customize**: `cp docker-compose.sample.yml docker-compose.yml`
2. **Update paths and secrets**: Edit the documents volume and any API keys
3. **Adjust resource limits**: Configure CPU/memory based on your infrastructure
4. **Enable security features**: Review security settings and network configuration
The sample file includes:
- 🔒 **Security hardening** (non-root user, no-new-privileges)
- 📊 **Resource limits** and health checks
- 🌐 **Network isolation**
- 📚 **Comprehensive environment variable documentation**
- 🚀 **Performance optimization examples**
#### Development Configuration
**docker-compose.dev.yml**:
```yaml
version: '3.8'
services:
pdfkb-mcp-dev:
build: .
container_name: pdfkb-mcp-dev
ports:
- "8000:8000"
- "8080:8080"
volumes:
- "./documents:/app/documents:rw"
- "./src:/app/src:ro" # Live source code mounting
- "./dev-cache:/app/cache"
- "./dev-logs:/app/logs"
environment:
- PDFKB_LOG_LEVEL=DEBUG # Debug logging
- PDFKB_WEB_ENABLE=true # Enable web interface
- PDFKB_EMBEDDING_PROVIDER=local # No API costs
env_file:
- .env.dev
```
### Container Management
#### Health Monitoring
```bash
# Check container health (Podman preferred)
podman ps
podman-compose ps
# Or with Docker
docker ps
docker compose ps
# View logs
podman logs pdfkb-mcp # or: docker logs pdfkb-mcp
podman-compose logs -f # or: docker compose logs -f
# Check health endpoint
curl http://localhost:8000/health
# Monitor resource usage
podman stats pdfkb-mcp # or: docker stats pdfkb-mcp
```
#### Container Operations
```bash
# Start/stop container (Podman preferred)
podman-compose up -d
podman-compose down
# Or with Docker
docker compose up -d
docker compose down
# Restart with new configuration
podman-compose restart # or: docker compose restart
# Update container image
podman-compose pull # or: docker compose pull
podman-compose up -d # or: docker compose up -d
# View container details
podman inspect pdfkb-mcp # or: docker inspect pdfkb-mcp
# Execute commands in container
podman exec -it pdfkb-mcp bash # or: docker exec -it pdfkb-mcp bash
```
### Troubleshooting
#### Common Issues
**1. Permission Errors**:
```bash
# Fix volume permissions
sudo chown -R 1001:1001 ./documents ./cache ./logs
# Or use current user
sudo chown -R $(id -u):$(id -g) ./documents ./cache ./logs
```
**2. Port Conflicts**:
```bash
# Check if ports are in use
netstat -tulpn | grep :8000
lsof -i :8000
# Use different ports
podman run -p 8001:8000 -p 8081:8080 pdfkb-mcp:latest # Podman
# or: docker run -p 8001:8000 -p 8081:8080 pdfkb-mcp:latest # Docker
```
**3. Memory Issues**:
```bash
# Check container memory usage
podman stats --no-stream   # or: docker stats --no-stream
```
Then increase memory limits in `docker-compose.yml`:
```yaml
deploy:
  resources:
    limits:
      memory: 8G   # Increase memory
```
**4. Connection Issues**:
```bash
# Test container connectivity
curl http://localhost:8000/health
# Check if container is running
podman ps | grep pdfkb # or: docker ps | grep pdfkb
# Check logs for errors
podman logs pdfkb-mcp --tail 50 # or: docker logs pdfkb-mcp --tail 50
```
#### Debug Mode
```bash
# Run container in debug mode (Podman preferred)
podman run -it \
-p 8000:8000 \
-v "$(pwd)/documents:/app/documents:rw" \
-e PDFKB_LOG_LEVEL=DEBUG \
-e PDFKB_EMBEDDING_PROVIDER=local \
pdfkb-mcp:latest
# Or with Docker
docker run -it \
-p 8000:8000 \
-v "$(pwd)/documents:/app/documents:rw" \
-e PDFKB_LOG_LEVEL=DEBUG \
-e PDFKB_EMBEDDING_PROVIDER=local \
pdfkb-mcp:latest
# Use development compose
podman-compose -f docker-compose.dev.yml up # or: docker compose -f docker-compose.dev.yml up
```
#### Performance Tuning
**For Low-Memory Systems**:
```yaml
environment:
- PDFKB_MAX_PARALLEL_PARSING=1
- PDFKB_MAX_PARALLEL_EMBEDDING=1
- PDFKB_BACKGROUND_QUEUE_WORKERS=1
- PDFKB_CHUNK_SIZE=500 # Smaller chunks
deploy:
resources:
limits:
memory: 2G # Lower memory limit
```
**For High-Performance Systems**:
```yaml
environment:
- PDFKB_MAX_PARALLEL_PARSING=4
- PDFKB_MAX_PARALLEL_EMBEDDING=2
- PDFKB_BACKGROUND_QUEUE_WORKERS=4
- PDFKB_CHUNK_SIZE=1500 # Larger chunks
deploy:
resources:
limits:
memory: 8G # Higher memory limit
cpus: '4.0'
```
### Security Considerations
- **Non-root execution**: Container runs as user `pdfkb` (UID 1001)
- **Read-only root filesystem**: Container filesystem is read-only except for mounted volumes
- **Network isolation**: Use Docker networks for service isolation
- **Resource limits**: Set appropriate CPU/memory limits
- **Secret management**: Use Docker secrets or environment files for API keys
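As a sketch, these hardening settings map onto standard Compose keys roughly as follows (service name, volumes, and limits are illustrative; the shipped `docker-compose.sample.yml` is the authoritative reference):
```yaml
services:
  pdfkb-mcp:
    image: pdfkb-mcp:latest
    user: "1001:1001"              # non-root execution (pdfkb, UID 1001)
    read_only: true                # read-only root filesystem
    security_opt:
      - no-new-privileges:true     # block privilege escalation
    tmpfs:
      - /tmp                       # writable scratch space for a read-only root
    volumes:
      - ./documents:/app/documents:rw
      - ./cache:/app/cache
    deploy:
      resources:
        limits:
          memory: 4G               # set limits appropriate to your workload
    networks:
      - pdfkb-internal
networks:
  pdfkb-internal:                  # dedicated network for service isolation
    driver: bridge
```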
## π₯οΈ MCP Client Setup
### Claude Desktop
**Configuration File Location**:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`
**Configuration**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
"PDFKB_CACHE_DIR": "/Users/yourname/Documents/PDFs/.cache"
},
"transport": "stdio",
"autoRestart": true,
"PDFKB_EMBEDDING_MODEL": "text-embedding-3-small",
}
}
}
```
**Verification**:
1. Restart Claude Desktop completely
2. Look for PDF KB tools in the interface
3. Test with "Add a document" or "Search documents"
### π VS Code with Native MCP Support (SSE Mode)
**Configuration for SSE/Remote Mode** (`.vscode/mcp.json` in workspace):
```json
{
  "mcpServers": {
    "pdfkb": {
      "command": "uvx",
      "args": ["pdfkb-mcp"],
      "env": {
        "PDFKB_KNOWLEDGEBASE_PATH": "/path/to/your/pdfs",
        "PDFKB_TRANSPORT": "sse"
      },
      "transport": "sse",
      "url": "http://localhost:8000",
      "autoRestart": true
    }
  }
}
```
**Verification**:
1. Reload VS Code window
2. Check VS Code's MCP server status in Command Palette
3. Use MCP tools in Copilot Chat
### VS Code with Continue Extension
**Configuration** (`.continue/config.json`):
```json
{
"models": [...],
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
"PDFKB_KNOWLEDGEBASE_PATH": "${workspaceFolder}/pdfs"
},
"transport": "stdio"
}
}
}
```
**Verification**:
1. Reload VS Code window
2. Check Continue panel for server connection
3. Use `@pdfkb` in Continue chat
### Generic MCP Client
**Standard Configuration Template**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "required",
"PDFKB_KNOWLEDGEBASE_PATH": "required-absolute-path",
"PDFKB_PDF_PARSER": "optional-default-pymupdf4llm"
},
"transport": "stdio",
"autoRestart": true,
"timeout": 30000
}
}
}
```
## π Performance & Troubleshooting
### Common Issues
**Server not appearing in MCP client**:
```json
// β Wrong: Missing transport
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"]
}
}
}
// β
Correct: Include transport and restart client
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"transport": "stdio"
}
}
}
```
**System overload when processing multiple PDFs**:
```bash
# Reduce parallel operations to prevent system stress
PDFKB_MAX_PARALLEL_PARSING=1 # Process one PDF at a time
PDFKB_MAX_PARALLEL_EMBEDDING=1 # Embed one document at a time
PDFKB_BACKGROUND_QUEUE_WORKERS=1 # Single background worker
```
**Processing too slow**:
```json
// Switch to faster parser and increase parallelism (if system can handle it)
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-key",
"PDFKB_PDF_PARSER": "pymupdf4llm"
},
"transport": "stdio"
}
}
}
```
**Memory issues**:
```json
// Reduce memory usage
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-key",
"PDFKB_EMBEDDING_BATCH_SIZE": "25",
"PDFKB_CHUNK_SIZE": "500"
},
"transport": "stdio"
}
}
}
```
**Poor table extraction**:
```json
// Use table-optimized parser
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-key",
"PDFKB_PDF_PARSER": "docling",
"PDFKB_DOCLING_TABLE_MODE": "ACCURATE"
},
"transport": "stdio"
}
}
}
```
**π SSE/Remote Mode - Client Connection Issues**:
```json
// β Wrong: Missing URL for SSE transport (client can't connect)
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_TRANSPORT": "sse"
},
"transport": "sse"
}
}
}
// β
Correct: Include URL pointing to SSE server
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs",
"PDFKB_TRANSPORT": "sse",
"PDFKB_SSE_HOST": "localhost",
"PDFKB_SSE_PORT": "8000"
},
"transport": "sse",
"url": "http://localhost:8000"
}
}
}
```
**Tip**: Ensure the SSE server is running first (`pdfkb-mcp --transport sse --sse-port 8000`), then configure the client with the correct URL. Check firewall settings if connecting remotely.
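A quick way to sanity-check the order of operations (assuming the default port and the documented `/health` endpoint):
```bash
# 1. Start the SSE server in the background
pdfkb-mcp --transport sse --sse-host localhost --sse-port 8000 &
sleep 5   # give the server a moment to start
# 2. Confirm it is reachable before pointing any client at it
curl -s http://localhost:8000/health
```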
**π SSE/Remote Mode - Port Conflicts in Integrated Mode**:
```bash
# β Wrong: Web and SSE using same port (will fail to start)
PDFKB_WEB_ENABLE=true PDFKB_WEB_PORT=8000 PDFKB_TRANSPORT=sse PDFKB_SSE_PORT=8000 pdfkb-mcp
# β
Correct: Use different ports for web (8080) and SSE (8000)
PDFKB_WEB_ENABLE=true PDFKB_WEB_PORT=8080 PDFKB_TRANSPORT=sse PDFKB_SSE_PORT=8000 pdfkb-mcp
```
**Tip**: The server validates port conflicts on startup. Web interface runs on `PDFKB_WEB_PORT` (default 8080), SSE MCP runs on `PDFKB_SSE_PORT` (default 8000). Access web at http://localhost:8080 and connect MCP clients to http://localhost:8000.
**π SSE/Remote Mode - Server Not Starting in SSE Mode**:
```bash
# β Wrong: Invalid transport value (server defaults to stdio)
PDFKB_TRANSPORT=remote pdfkb-mcp # 'remote' is invalid
# β
Correct: Use 'sse' for remote transport
PDFKB_TRANSPORT=sse pdfkb-mcp --sse-host 0.0.0.0 --sse-port 8000
# Or via command line flags
pdfkb-mcp --transport sse --sse-host localhost --sse-port 8000
```
**Tip**: Valid transport values are 'stdio' (default) or 'sse'. Check server logs for "Running MCP server in SSE mode on http://host:port" confirmation. Use `--log-level DEBUG` for detailed startup information.
### Resource Requirements
| Configuration | RAM Usage | Processing Speed | Best For |
|---------------|-----------|------------------|----------|
| **Speed** | 2-4 GB | Fastest | Large collections |
| **Balanced** | 4-6 GB | Medium | Most users |
| **Quality** | 6-12 GB | Medium-Fast | Accuracy priority |
| **GPU** | 8-16 GB | Very Fast | High-volume processing |
## π§ Advanced Configuration
### Parser-Specific Options
**MinerU Configuration**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-key",
"PDFKB_PDF_PARSER": "mineru",
"PDFKB_MINERU_LANG": "en",
"PDFKB_MINERU_METHOD": "auto",
"PDFKB_MINERU_VRAM": "16"
},
"transport": "stdio"
}
}
}
```
**LLM Parser Configuration**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-key",
"PDFKB_OPENROUTER_API_KEY": "sk-or-v1-abc123def456ghi789...",
"PDFKB_PDF_PARSER": "llm",
"PDFKB_LLM_MODEL": "google/gemini-2.5-flash-lite",
"PDFKB_LLM_CONCURRENCY": "5",
"PDFKB_LLM_DPI": "150"
},
"transport": "stdio"
}
}
}
```
### Performance Tuning
**Parallel Processing Configuration**:
Control the number of concurrent operations to optimize performance and prevent system overload:
```bash
# Maximum number of PDFs to parse simultaneously
PDFKB_MAX_PARALLEL_PARSING=1 # Default: 1 (conservative to prevent overload)
# Maximum number of documents to embed simultaneously
PDFKB_MAX_PARALLEL_EMBEDDING=1 # Default: 1 (prevents API rate limits)
# Number of background queue workers
PDFKB_BACKGROUND_QUEUE_WORKERS=2 # Default: 2
# Thread pool size for CPU-intensive operations
PDFKB_THREAD_POOL_SIZE=1 # Default: 1
```
**Resource-Optimized Setup** (for low-powered systems):
```json
{
"env": {
"PDFKB_MAX_PARALLEL_PARSING": "1", # Process one PDF at a time
"PDFKB_MAX_PARALLEL_EMBEDDING": "1", # Embed one document at a time
"PDFKB_BACKGROUND_QUEUE_WORKERS": "1", # Single background worker
"PDFKB_THREAD_POOL_SIZE": "1" # Single thread for CPU tasks
}
}
```
**High-Performance Setup** (for powerful machines):
```json
{
"env": {
"PDFKB_MAX_PARALLEL_PARSING": "4", # Parse up to 4 PDFs in parallel
"PDFKB_MAX_PARALLEL_EMBEDDING": "2", # Embed 2 documents simultaneously
"PDFKB_BACKGROUND_QUEUE_WORKERS": "4", # More background workers
"PDFKB_THREAD_POOL_SIZE": "2", # More threads for CPU tasks
"PDFKB_EMBEDDING_BATCH_SIZE": "200", # Larger embedding batches
"PDFKB_VECTOR_SEARCH_K": "15" # More search results
}
}
```
**Complete High-Performance Setup**:
```json
{
"mcpServers": {
"pdfkb": {
"command": "uvx",
"args": ["pdfkb-mcp"],
"env": {
"PDFKB_OPENAI_API_KEY": "sk-key",
"PDFKB_PDF_PARSER": "mineru",
"PDFKB_KNOWLEDGEBASE_PATH": "/Volumes/FastSSD/Documents/PDFs",
"PDFKB_CACHE_DIR": "/Volumes/FastSSD/Documents/PDFs/.cache",
"PDFKB_MAX_PARALLEL_PARSING": "4",
"PDFKB_MAX_PARALLEL_EMBEDDING": "2",
"PDFKB_BACKGROUND_QUEUE_WORKERS": "4",
"PDFKB_THREAD_POOL_SIZE": "2",
"PDFKB_EMBEDDING_BATCH_SIZE": "200",
"PDFKB_VECTOR_SEARCH_K": "15",
"PDFKB_FILE_SCAN_INTERVAL": "30"
},
"transport": "stdio"
}
}
}
```
### Intelligent Caching
The server uses multi-stage caching:
- **Parsing Cache**: Stores converted markdown ([`src/pdfkb/intelligent_cache.py:139`](src/pdfkb/intelligent_cache.py:139))
- **Chunking Cache**: Stores processed chunks
- **Vector Cache**: ChromaDB embeddings storage
**Cache Invalidation Rules**:
- Changing `PDFKB_PDF_PARSER` β Full reset (parsing + chunking + embeddings)
- Changing `PDFKB_PDF_CHUNKER` β Partial reset (chunking + embeddings)
- Changing `PDFKB_EMBEDDING_MODEL` β Minimal reset (embeddings only)
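For example, under these rules (values illustrative):
```bash
# Full reset: changing the parser forces re-parsing, re-chunking, and re-embedding
PDFKB_PDF_PARSER=docling pdfkb-mcp

# Minimal reset: changing only the embedding model reuses cached markdown and chunks
PDFKB_EMBEDDING_MODEL=text-embedding-3-small pdfkb-mcp
```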
## π Appendix
### Installation Options
**Primary (Recommended)**:
```bash
uvx pdfkb-mcp
```
**Web Interface Included**: All installation methods include the web interface. Use these commands:
- `pdfkb-mcp` - MCP server only (default, web disabled)
- `PDFKB_WEB_ENABLE=true pdfkb-mcp` - Integrated MCP + Web server (web enabled)
**With Specific Parser Dependencies**:
```bash
uvx pdfkb-mcp[marker] # Marker parser
uvx pdfkb-mcp[mineru] # MinerU parser
uvx pdfkb-mcp[docling] # Docling parser
uvx pdfkb-mcp[llm] # LLM parser
uvx pdfkb-mcp[semantic] # Semantic chunker (NEW)
uvx pdfkb-mcp[unstructured_chunker] # Unstructured chunker
uvx pdfkb-mcp[web] # Enhanced web features (psutil for metrics)
```
Or via pip/pipx:
```bash
pip install "pdfkb-mcp[marker]"           # Marker parser
pip install "pdfkb-mcp[docling-complete]" # Docling with OCR and full features
pip install "pdfkb-mcp[web]"              # Enhanced web features
```
**Development Installation**:
```bash
git clone https://github.com/juanqui/pdfkb-mcp.git
cd pdfkb-mcp
pip install -e ".[dev]"
```
### Complete Environment Variables Reference
| Variable | Default | Description |
|----------|---------|-------------|
| `PDFKB_OPENAI_API_KEY` | *required for OpenAI provider* | OpenAI API key for embeddings (not needed with local embeddings) |
| `PDFKB_OPENROUTER_API_KEY` | *optional* | Required for LLM parser |
| `PDFKB_KNOWLEDGEBASE_PATH` | `./pdfs` | PDF directory path |
| `PDFKB_CACHE_DIR` | `./.cache` | Cache directory |
| `PDFKB_PDF_PARSER` | `pymupdf4llm` | PDF parser selection |
| `PDFKB_PDF_CHUNKER` | `langchain` | Chunking strategy: `langchain`, `page`, `unstructured`, `semantic` |
| `PDFKB_CHUNK_SIZE` | `1000` | LangChain chunk size |
| `PDFKB_CHUNK_OVERLAP` | `200` | LangChain chunk overlap |
| `PDFKB_MIN_CHUNK_SIZE` | `0` | Minimum chunk size in characters (0 = disabled, filters out chunks smaller than this size) |
| `PDFKB_EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI model |
| `PDFKB_OPENAI_API_BASE` | *optional* | Custom base URL for OpenAI-compatible APIs |
| `PDFKB_HUGGINGFACE_EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HuggingFace model |
| `PDFKB_HUGGINGFACE_PROVIDER` | *optional* | HuggingFace provider (e.g., "nebius") |
| `PDFKB_LOCAL_EMBEDDING_MODEL` | `Qwen/Qwen3-Embedding-0.6B` | Local embedding model (Qwen3-Embedding series only) |
| `PDFKB_GGUF_QUANTIZATION` | `Q6_K` | GGUF quantization level (Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S) |
| `PDFKB_EMBEDDING_DEVICE` | `auto` | Hardware device (auto, mps, cuda, cpu) |
| `PDFKB_USE_MODEL_OPTIMIZATION` | `true` | Enable torch.compile optimization |
| `PDFKB_EMBEDDING_CACHE_SIZE` | `10000` | Number of cached embeddings in LRU cache |
| `PDFKB_MODEL_CACHE_DIR` | `~/.cache/huggingface` | Local model cache directory |
| `PDFKB_ENABLE_RERANKER` | `false` | Enable/disable result reranking |
| `PDFKB_RERANKER_PROVIDER` | `local` | Reranker provider: 'local' or 'deepinfra' |
| `PDFKB_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-0.6B` | Reranker model for local provider |
| `PDFKB_RERANKER_SAMPLE_ADDITIONAL` | `5` | Additional results to sample for reranking |
| `PDFKB_RERANKER_DEVICE` | `auto` | Hardware device for local reranker (auto, mps, cuda, cpu) |
| `PDFKB_RERANKER_MODEL_CACHE_DIR` | `~/.cache/pdfkb-mcp/reranker` | Cache directory for local reranker models |
| `PDFKB_RERANKER_GGUF_QUANTIZATION` | *optional* | GGUF quantization level (Q6_K, Q8_0, etc.) |
| `PDFKB_DEEPINFRA_API_KEY` | *required for deepinfra provider* | DeepInfra API key for reranking |
| `PDFKB_DEEPINFRA_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-8B` | Model: Qwen/Qwen3-Reranker-0.6B, 4B, or 8B |
| `PDFKB_ENABLE_SUMMARIZER` | `false` | Enable/disable document summarization |
| `PDFKB_SUMMARIZER_PROVIDER` | `local` | Summarizer provider: 'local' or 'remote' |
| `PDFKB_SUMMARIZER_MODEL` | `Qwen/Qwen3-4B-Instruct-2507-FP8` | Model for summarization |
| `PDFKB_SUMMARIZER_MAX_PAGES` | `10` | Maximum pages to analyze for summarization |
| `PDFKB_SUMMARIZER_DEVICE` | `auto` | Hardware device for local summarizer |
| `PDFKB_SUMMARIZER_MODEL_CACHE_DIR` | `~/.cache/pdfkb-mcp/summarizer` | Cache directory for summarizer models |
| `PDFKB_SUMMARIZER_API_BASE` | *optional* | Custom API base URL for remote summarizer |
| `PDFKB_SUMMARIZER_API_KEY` | *optional* | API key for remote summarizer |
| `PDFKB_EMBEDDING_BATCH_SIZE` | `100` | Embedding batch size |
| `PDFKB_MAX_PARALLEL_PARSING` | `1` | Max concurrent PDF parsing operations |
| `PDFKB_MAX_PARALLEL_EMBEDDING` | `1` | Max concurrent embedding operations |
| `PDFKB_BACKGROUND_QUEUE_WORKERS` | `2` | Number of background processing workers |
| `PDFKB_THREAD_POOL_SIZE` | `1` | Thread pool size for CPU-intensive tasks |
| `PDFKB_VECTOR_SEARCH_K` | `5` | Default search results |
| `PDFKB_FILE_SCAN_INTERVAL` | `60` | File monitoring interval |
| `PDFKB_LOG_LEVEL` | `INFO` | Logging level |
| `PDFKB_WEB_ENABLE` | `false` | Enable/disable web interface |
| `PDFKB_WEB_PORT` | `8080` | Web server port |
| `PDFKB_WEB_HOST` | `localhost` | Web server host |
| `PDFKB_WEB_CORS_ORIGINS` | `http://localhost:3000,http://127.0.0.1:3000` | CORS allowed origins (comma-separated) |
### Parser Comparison Details
| Feature | PyMuPDF4LLM | Marker | MinerU | Docling | LLM |
|---------|-------------|--------|--------|---------|-----|
| **Speed** | Fastest | Medium | Fast (GPU) | Medium | Slowest |
| **Memory** | Lowest | Medium | High | Medium | Lowest |
| **Tables** | Basic | Good | Excellent | **Excellent** | Excellent |
| **Formulas** | Basic | Good | **Excellent** | Good | Excellent |
| **Images** | Basic | Good | Good | **Excellent** | **Excellent** |
| **Setup** | Simple | Simple | Moderate | Simple | Simple |
| **Cost** | Free | Free | Free | Free | API costs |
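For instance, acting on the table above for a table-heavy business report (flags taken from the configuration reference elsewhere in this document):
```bash
PDFKB_PDF_PARSER=docling PDFKB_DOCLING_TABLE_MODE=ACCURATE pdfkb-mcp
```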
### Chunking Strategies
**LangChain** (`PDFKB_PDF_CHUNKER=langchain`):
- Header-aware splitting with [`MarkdownHeaderTextSplitter`](src/pdfkb/chunker/chunker_langchain.py)
- Configurable via `PDFKB_CHUNK_SIZE` and `PDFKB_CHUNK_OVERLAP`
- Best for customizable chunking
- Default and installed with base package
**Page** (`PDFKB_PDF_CHUNKER=page`) **π NEW**:
- Page-based chunking that preserves document page boundaries
- Works with page-aware parsers that output individual pages
- Supports merging small pages and splitting large ones
- Configurable via `PDFKB_PAGE_CHUNKER_MIN_CHUNK_SIZE` and `PDFKB_PAGE_CHUNKER_MAX_CHUNK_SIZE`
- Best for preserving original document structure and page-level metadata
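A minimal sketch of a page-chunker setup (threshold values are illustrative):
```bash
PDFKB_PDF_CHUNKER=page
PDFKB_PAGE_CHUNKER_MIN_CHUNK_SIZE=200    # merge pages shorter than this
PDFKB_PAGE_CHUNKER_MAX_CHUNK_SIZE=4000   # split pages longer than this
```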
**Semantic** (`PDFKB_PDF_CHUNKER=semantic`):
- Advanced semantic chunking using LangChain's [`SemanticChunker`](src/pdfkb/chunker/chunker_semantic.py)
- Groups semantically related content together using embedding similarity
- Four breakpoint detection methods: percentile, standard_deviation, interquartile, gradient
- Preserves context, with ~40% better chunk coherence reported, improving retrieval quality
- Install extra: `pip install "pdfkb-mcp[semantic]"` to enable
- Configurable via environment variables (see Semantic Chunking section)
- Best for documents requiring high context preservation
**Unstructured** (`PDFKB_PDF_CHUNKER=unstructured`):
- Intelligent semantic chunking with [`unstructured`](src/pdfkb/chunker/chunker_unstructured.py) library
- Zero configuration required
- Install extra: `pip install "pdfkb-mcp[unstructured_chunker]"` to enable
- Best for document structure awareness
### First-run notes
- On the first run, the server initializes its caches and vector store and logs the selected components:
- Parser: PyMuPDF4LLM (default)
- Chunker: LangChain (default)
- Embedding Model: text-embedding-3-large (default)
- If you select a parser/chunker that isnβt installed, the server logs a warning with the exact install command and falls back to the default components instead of exiting.
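For example, if you selected the semantic chunker without its extra installed, the remedy is to install the extra and restart:
```bash
pip install "pdfkb-mcp[semantic]"        # install the missing extra
PDFKB_PDF_CHUNKER=semantic pdfkb-mcp     # restart with the desired chunker
```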
### Troubleshooting Guide
**API Key Issues**:
1. Verify key format starts with `sk-`
2. Check account has sufficient credits
3. Test connectivity: `curl -H "Authorization: Bearer $PDFKB_OPENAI_API_KEY" https://api.openai.com/v1/models`
**Parser Installation Issues**:
1. MinerU: `pip install mineru[all]` and verify `mineru --version`
2. Docling: `pip install docling` for basic, `pip install pdfkb-mcp[docling-complete]` for all features
3. LLM: Requires `PDFKB_OPENROUTER_API_KEY` environment variable
**Performance Optimization**:
1. **Speed**: Use `pymupdf4llm` parser (fastest, low memory footprint)
2. **Memory**: Reduce `PDFKB_EMBEDDING_BATCH_SIZE` and `PDFKB_CHUNK_SIZE`; use pypdfium backend for Docling
3. **Quality**: Use `mineru` with GPU (>10K tokens/s on RTX 4090) or `marker` for balanced quality
4. **Tables**: Use `docling` with `PDFKB_DOCLING_TABLE_MODE=ACCURATE` or `marker` with LLM mode
5. **Batch Processing**: Use `marker` on H100 (~25 pages/s) or `mineru` with sglang acceleration
For additional support, see implementation details in [`src/pdfkb/main.py`](src/pdfkb/main.py) and [`src/pdfkb/config.py`](src/pdfkb/config.py).
Raw data
{
"_id": null,
"home_page": null,
"name": "pdfkb-mcp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "ai, chroma, embeddings, knowledge-base, mcp, openai, pdf, vector-search",
"author": null,
"author_email": "Juan Villa <juanqui@villafam.com>",
"download_url": "https://files.pythonhosted.org/packages/43/c8/5aa9cd05dcd4140736bcf5d01736c05ba89176ae743e6b1023c701c73685/pdfkb_mcp-0.7.0.tar.gz",
"platform": null,
"description": "# PDF Knowledgebase MCP Server\n\nA Model Context Protocol (MCP) server that enables intelligent document search and retrieval from PDF collections. Built for seamless integration with Claude Desktop, Continue, Cline, and other MCP clients, this server provides advanced search capabilities powered by local, OpenAI, or HuggingFace embeddings and ChromaDB vector storage.\n\n## \u2728 Major New Features\n\n### \ud83e\udd16 Document Summarization (NEW!)\n**Automatically generate rich document metadata using AI**\n- **Local LLM Support**: Use Qwen3, Phi-3, and other local models - no API costs, full privacy\n- **Remote LLM Support**: OpenAI-compatible APIs for cloud-based summarization\n- **Rich Metadata**: Auto-generates titles, short descriptions, and detailed summaries\n- **Smart Content Processing**: Configurable page limits and intelligent content truncation\n- **Fallback Handling**: Graceful degradation when summarization fails\n\n### \ud83c\udf10 Remote Access & Multi-Client Support (NEW!)\n**Access your document repository from anywhere**\n- **SSE Transport Mode**: Server-Sent Events for real-time remote access\n- **HTTP Transport Mode**: RESTful API access for modern MCP clients\n- **Multi-Client Architecture**: Share document processing across multiple clients\n- **Integrated Web + MCP**: Run both web interface and MCP server concurrently\n- **Flexible Deployment**: Local, remote, or hybrid deployment modes\n\n### \ud83c\udfaf Advanced Search & Intelligence\n**Best-in-class document retrieval capabilities**\n- **Hybrid Search**: Combines semantic similarity with keyword matching (BM25)\n- **Reranking Support**: Qwen3-Reranker models for improved search relevance\n- **GGUF Quantized Models**: 50-70% smaller models with maintained quality\n- **Local Embeddings**: Full privacy with HuggingFace models - no API costs\n- **Custom Endpoints**: Support for OpenAI-compatible APIs and custom providers\n- **Semantic Chunking**: Content-aware chunking for better context preservation\n\n### \ud83d\udd04 Enterprise-Ready Operations\n**Production-ready document processing**\n- **Non-blocking Operations**: Background processing with graceful startup\n- **Intelligent Caching**: Multi-stage caching with selective invalidation\n- **Enhanced Monitoring**: Better logging, error handling, and resource management\n- **Graceful Shutdown**: Configurable timeouts and proper cleanup\n- **Performance Optimized**: Improved memory usage and concurrent processing\n## Table of Contents\n\n- [\ud83d\ude80 Quick Start](#-quick-start)\n- [\ud83c\udf10 Web Interface](#-web-interface)\n- [\ud83c\udfd7\ufe0f Architecture Overview](#\ufe0f-architecture-overview)\n- [\ud83d\udcdd Document Summarization](#-document-summarization)\n- [\ud83e\udd16 Local Embeddings](#-local-embeddings)\n- [\ud83d\udd04 Reranking](#-reranking)\n- [\ud83d\udd0d Hybrid Search](#-hybrid-search)\n- [\ud83d\udd3d Minimum Chunk Filtering](#-minimum-chunk-filtering)\n- [\ud83e\udde9 Semantic Chunking](#-semantic-chunking)\n- [\ud83c\udfaf Parser Selection Guide](#-parser-selection-guide)\n- [\u2699\ufe0f Configuration](#\ufe0f-configuration)\n- [\ud83d\udda5\ufe0f MCP Client Setup](#\ufe0f-mcp-client-setup)\n- [\ud83d\udcca Performance & Troubleshooting](#-performance--troubleshooting)\n- [\ud83d\udd27 Advanced Configuration](#-advanced-configuration)\n- [\ud83d\udcda Appendix](#-appendix)\n\n## \ud83d\ude80 Quick Start\n\n### Step 1: Configure Your MCP Client\n\n**\ud83c\udd95 Option A: Complete Local Setup with Document Summarization 
(No API Key Required)**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\",\n \"PDFKB_ENABLE_SUMMARIZER\": \"true\",\n \"PDFKB_SUMMARIZER_PROVIDER\": \"local\",\n \"PDFKB_SUMMARIZER_MODEL\": \"Qwen/Qwen3-4B-Instruct-2507-FP8\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**Option B: Local Embeddings w/ Hybrid Search (No API Key Required)**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**\ud83c\udd95 Option B: Remote/SSE Mode (Accessible from Multiple Clients)**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\"\n },\n \"transport\": \"sse\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**\ud83c\udd95 Option A2: Local GGUF Embeddings (Memory Optimized, No API Key Required)**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_LOCAL_EMBEDDING_MODEL\": \"Qwen/Qwen3-Embedding-0.6B-GGUF\",\n \"PDFKB_GGUF_QUANTIZATION\": \"Q6_K\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**\ud83c\udd95 Option A3: Local Embeddings with Reranking (Best Search Quality, No API Key Required)**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\",\n \"PDFKB_ENABLE_RERANKER\": \"true\",\n \"PDFKB_RERANKER_MODEL\": \"Qwen/Qwen3-Reranker-0.6B\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**Option B: OpenAI Embeddings w/ Hybrid Search**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_EMBEDDING_PROVIDER\": \"openai\",\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**\ud83c\udd95 Option C: HuggingFace w/ Custom Provider**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_EMBEDDING_PROVIDER\": \"huggingface\",\n \"PDFKB_HUGGINGFACE_EMBEDDING_MODEL\": \"sentence-transformers/all-MiniLM-L6-v2\",\n \"PDFKB_HUGGINGFACE_PROVIDER\": \"nebius\",\n \"HF_TOKEN\": \"hf_your_token_here\",\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**\ud83c\udd95 Option D: Custom OpenAI-Compatible API**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_EMBEDDING_PROVIDER\": \"openai\",\n \"PDFKB_OPENAI_API_KEY\": \"your-api-key\",\n 
\"PDFKB_OPENAI_API_BASE\": \"https://api.studio.nebius.com/v1/\",\n \"PDFKB_EMBEDDING_MODEL\": \"text-embedding-3-large\",\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n}\n```\n\n### Step 3: Verify Installation\n\n1. **Restart your MCP client** completely\n2. **Check for PDF KB tools**: Look for `add_document`, `search_documents`, `list_documents`, `remove_document`\n3. **Test functionality**: Try adding a PDF and searching for content\n\n## \ud83c\udf10 Web Interface\n\nThe PDF Knowledgebase includes a modern web interface for easy document management and search. **The web interface is disabled by default and must be explicitly enabled.**\n\n\n### Server Modes\n\n**1. MCP Only Mode - Stdio Transport** (Default):\n```bash\npdfkb-mcp\n```\n- Runs only the MCP server for integration with Claude Desktop, VS Code, etc.\n- Most resource-efficient option\n- Best for pure MCP integration\n\n**2. MCP Only Mode - SSE/Remote Transport**:\n```bash\n# Option A: Environment variable\nPDFKB_TRANSPORT=sse pdfkb-mcp\n\n# Option B: Command line flags\npdfkb-mcp --transport sse --sse-port 8000 --sse-host localhost\n```\n- Runs MCP server in SSE mode for remote access from multiple clients\n- MCP server available at http://localhost:8000 (or configured host/port)\n- Best for centralized document processing accessible from multiple clients\n\n**3. Integrated Mode** (MCP + Web):\n```bash\n# Option A: Environment variable\nPDFKB_WEB_ENABLE=true pdfkb-mcp\n\n# Option B: Command line flag\npdfkb-mcp --enable-web\n```\n- Runs both MCP server AND web interface concurrently\n- Web interface available at http://localhost:8080\n- MCP server runs in stdio mode by default (can be configured to SSE)\n- Best of both worlds: API integration + web UI\n### Web Interface Features\n\n\n*Modern web interface showing document collection with search, filtering, and management capabilities*\n\n- **\ud83d\udcc4 Document Upload**: Drag & drop PDF files or upload via file picker\n- **\ud83d\udd0d Semantic Search**: Powerful vector-based search with real-time results\n- **\ud83d\udcca Document Management**: List, preview, and manage your PDF collection\n- **\ud83d\udcc8 Real-time Status**: Live processing updates via WebSocket connections\n- **\ud83c\udfaf Chunk Explorer**: View and navigate document chunks for detailed analysis\n- **\u2699\ufe0f System Metrics**: Monitor server performance and resource usage\n\n\n*Detailed document view showing metadata, chunk analysis, and content preview*\n\n### Quick Web Setup\n\n1. **Install and run**:\n ```bash\n uvx pdfkb-mcp # Install if needed\n PDFKB_WEB_ENABLE=true pdfkb-mcp # Start integrated server\n ```\n\n2. **Open your browser**: http://localhost:8080\n\n3. 
**Configure environment** (create `.env` file):\n ```bash\n PDFKB_OPENAI_API_KEY=sk-proj-abc123def456ghi789...\n PDFKB_KNOWLEDGEBASE_PATH=/path/to/your/pdfs\n PDFKB_WEB_PORT=8080\n PDFKB_WEB_HOST=localhost\n PDFKB_WEB_ENABLE=true\n ```\n\n### Web Configuration Options\n\n| Environment Variable | Default | Description |\n|---------------------|---------|-------------|\n| `PDFKB_WEB_ENABLE` | `false` | Enable/disable web interface |\n| `PDFKB_WEB_PORT` | `8080` | Web server port |\n| `PDFKB_WEB_HOST` | `localhost` | Web server host |\n| `PDFKB_WEB_CORS_ORIGINS` | `http://localhost:3000,http://127.0.0.1:3000` | CORS allowed origins |\n\n### Command Line Options\n\nThe server supports command line arguments:\n\n```bash\n# Customize web server port with web interface enabled\npdfkb-mcp --enable-web --port 9000\n\n# Use custom configuration file\npdfkb-mcp --config myconfig.env\n\n# Change log level\npdfkb-mcp --log-level DEBUG\n\n# Enable web interface via command line\npdfkb-mcp --enable-web\n```\n\n### API Documentation\n\nWhen running with web interface enabled, comprehensive API documentation is available at:\n- **Swagger UI**: http://localhost:8080/docs\n- **ReDoc**: http://localhost:8080/redoc\n\n## \ud83c\udfd7\ufe0f Architecture Overview\n\n### MCP Integration\n\n```mermaid\ngraph TB\n subgraph \"MCP Clients\"\n C1[Claude Desktop]\n C2[VS Code/Continue]\n C3[Other MCP Clients]\n end\n\n subgraph \"MCP Protocol Layer\"\n MCP[Model Context Protocol<br/>Standard Layer]\n end\n\n subgraph \"MCP Servers\"\n PDFKB[PDF KB Server<br/>This Server]\n S1[Other MCP<br/>Server]\n S2[Other MCP<br/>Server]\n end\n\n C1 --> MCP\n C2 --> MCP\n C3 --> MCP\n\n MCP --> PDFKB\n MCP --> S1\n MCP --> S2\n\n classDef client fill:#e1f5fe,stroke:#01579b,stroke-width:2px\n classDef protocol fill:#fff3e0,stroke:#e65100,stroke-width:2px\n classDef server fill:#f3e5f5,stroke:#4a148c,stroke-width:2px\n classDef highlight fill:#c8e6c9,stroke:#1b5e20,stroke-width:3px\n\n class C1,C2,C3 client\n class MCP protocol\n class S1,S2 server\n class PDFKB highlight\n```\n\n### Internal Architecture\n\n```mermaid\ngraph LR\n subgraph \"Input Layer\"\n PDF[PDF Files]\n WEB[Web Interface<br/>Port 8080]\n MCP_IN[MCP Protocol]\n end\n\n subgraph \"Processing Pipeline\"\n PARSER[PDF Parser<br/>PyMuPDF/Marker/MinerU]\n CHUNKER[Text Chunker<br/>LangChain/Unstructured]\n EMBED[Embedding Service<br/>Local/OpenAI]\n end\n\n subgraph \"Storage Layer\"\n CACHE[Intelligent Cache<br/>Multi-stage]\n VECTOR[Vector Store<br/>ChromaDB]\n TEXT[Text Index<br/>Whoosh BM25]\n end\n\n subgraph \"Search Engine\"\n HYBRID[Hybrid Search<br/>RRF Fusion]\n end\n\n PDF --> PARSER\n WEB --> PARSER\n MCP_IN --> PARSER\n\n PARSER --> CHUNKER\n CHUNKER --> EMBED\n\n EMBED --> CACHE\n CACHE --> VECTOR\n CACHE --> TEXT\n\n VECTOR --> HYBRID\n TEXT --> HYBRID\n\n HYBRID --> WEB\n HYBRID --> MCP_IN\n\n classDef input fill:#e3f2fd,stroke:#1565c0,stroke-width:2px\n classDef process fill:#fff9c4,stroke:#f57f17,stroke-width:2px\n classDef storage fill:#fce4ec,stroke:#880e4f,stroke-width:2px\n classDef search fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px\n\n class PDF,WEB,MCP_IN input\n class PARSER,CHUNKER,EMBED process\n class CACHE,VECTOR,TEXT storage\n class HYBRID search\n```\n\n### Available Tools & Resources\n\n**Tools** (Actions your client can perform):\n- [`add_document(path, metadata?)`](src/pdfkb/main.py:278) - Add PDF to knowledgebase\n- [`search_documents(query, limit=5, metadata_filter?, search_type?)`](src/pdfkb/main.py:345) - Hybrid search across PDFs 
(semantic + keyword matching)\n- [`list_documents(metadata_filter?)`](src/pdfkb/main.py:422) - List all documents with metadata\n- [`remove_document(document_id)`](src/pdfkb/main.py:488) - Remove document from knowledgebase\n\n**Resources** (Data your client can access):\n- `pdf://{document_id}` - Full document content as JSON\n- `pdf://{document_id}/page/{page_number}` - Specific page content\n- `pdf://list` - List of all documents with metadata\n\n## \ud83e\udd16 Embedding Options\n\nThe server supports three embedding providers, each with different trade-offs:\n\n### 1. Local Embeddings (Default)\n\nRun embeddings locally using HuggingFace models, eliminating API costs and keeping your data completely private.\n\n**Features:**\n- **Zero API Costs**: No external API charges\n- **Complete Privacy**: Documents never leave your machine\n- **Hardware Acceleration**: Automatic detection of Metal (macOS), CUDA (NVIDIA), or CPU\n- **Smart Caching**: LRU cache for frequently embedded texts\n- **Multiple Model Sizes**: Choose based on your hardware capabilities\n\nLocal embeddings are **enabled by default**. No configuration needed for basic usage:\n\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/pdfs\"\n }\n }\n }\n}\n```\n\n### Supported Models\n\n**\ud83c\udd95 Qwen3-Embedding Series Only**: The server now exclusively supports the Qwen3-Embedding model family, including both standard and quantized GGUF variants for optimized performance.\n\n#### Standard Models\n\n| Model | Size | Dimensions | Max Context | Best For |\n|-------|------|------------|-------------|----------|\n| **Qwen/Qwen3-Embedding-0.6B** (default) | 1.2GB | 1024 | 32K tokens | Best overall - long docs, fast |\n| **Qwen/Qwen3-Embedding-4B** | 8.0GB | 2560 | 32K tokens | High quality, long context |\n| **Qwen/Qwen3-Embedding-8B** | 16.0GB | 3584 | 32K tokens | Maximum quality, long context |\n\n#### \ud83c\udd95 GGUF Quantized Models (Reduced Memory Usage)\n\n| Model | Size | Dimensions | Max Context | Best For |\n|-------|------|------------|-------------|----------|\n| **Qwen/Qwen3-Embedding-0.6B-GGUF** | 0.6GB | 1024 | 32K tokens | Quantized lightweight, 32K context |\n| **Qwen/Qwen3-Embedding-4B-GGUF** | 2.4GB | 2560 | 32K tokens | Quantized high quality, 32K context |\n| **Qwen/Qwen3-Embedding-8B-GGUF** | 4.8GB | 3584 | 32K tokens | Quantized maximum quality, 32K context |\n\nConfigure your preferred model:\n```bash\n# Standard models\nPDFKB_LOCAL_EMBEDDING_MODEL=\"Qwen/Qwen3-Embedding-0.6B\" # Default\nPDFKB_LOCAL_EMBEDDING_MODEL=\"Qwen/Qwen3-Embedding-4B\"\nPDFKB_LOCAL_EMBEDDING_MODEL=\"Qwen/Qwen3-Embedding-8B\"\n\n# GGUF quantized models (reduced memory usage)\nPDFKB_LOCAL_EMBEDDING_MODEL=\"Qwen/Qwen3-Embedding-0.6B-GGUF\"\nPDFKB_LOCAL_EMBEDDING_MODEL=\"Qwen/Qwen3-Embedding-4B-GGUF\"\nPDFKB_LOCAL_EMBEDDING_MODEL=\"Qwen/Qwen3-Embedding-8B-GGUF\"\n```\n\n#### \ud83c\udd95 GGUF Quantization Options\n\nWhen using GGUF models, you can configure the quantization level to balance between model size and quality:\n\n```bash\n# Configure quantization (default: Q6_K)\nPDFKB_GGUF_QUANTIZATION=\"Q6_K\" # Default - balanced size/quality\nPDFKB_GGUF_QUANTIZATION=\"Q8_0\" # Higher quality, larger size\nPDFKB_GGUF_QUANTIZATION=\"F16\" # Highest quality, largest size\nPDFKB_GGUF_QUANTIZATION=\"Q4_K_M\" # Smaller size, lower quality\n```\n\n**Quantization Recommendations:**\n- **Q6_K** (default): Best balance of quality and size\n- 
**Q8_0**: Near-original quality with moderate compression\n- **F16**: Original quality, minimal compression\n- **Q4_K_M**: Maximum compression, acceptable quality loss\n\n### Hardware Optimization\n\nThe server automatically detects and uses the best available hardware:\n\n- **Apple Silicon (M1/M2/M3)**: Uses Metal Performance Shaders (MPS)\n- **NVIDIA GPUs**: Uses CUDA acceleration\n- **CPU Fallback**: Optimized for multi-core processing\n\nForce a specific device if needed:\n```bash\nPDFKB_EMBEDDING_DEVICE=\"mps\" # Force Metal/MPS\nPDFKB_EMBEDDING_DEVICE=\"cuda\" # Force CUDA\nPDFKB_EMBEDDING_DEVICE=\"cpu\" # Force CPU\n```\n\n### Configuration Options\n\n```bash\n# Embedding provider (local or openai)\nPDFKB_EMBEDDING_PROVIDER=\"local\" # Default\n\n# Model selection (Qwen3-Embedding series only)\nPDFKB_LOCAL_EMBEDDING_MODEL=\"Qwen/Qwen3-Embedding-0.6B\" # Default\n# Standard options:\n# - \"Qwen/Qwen3-Embedding-0.6B\" (1.2GB, 1024 dims, default)\n# - \"Qwen/Qwen3-Embedding-4B\" (8GB, 2560 dims, high quality)\n# - \"Qwen/Qwen3-Embedding-8B\" (16GB, 3584 dims, maximum quality)\n# GGUF quantized options (reduced memory usage):\n# - \"Qwen/Qwen3-Embedding-0.6B-GGUF\" (0.6GB, 1024 dims)\n# - \"Qwen/Qwen3-Embedding-4B-GGUF\" (2.4GB, 2560 dims)\n# - \"Qwen/Qwen3-Embedding-8B-GGUF\" (4.8GB, 3584 dims)\n\n# GGUF quantization configuration (only used with GGUF models)\nPDFKB_GGUF_QUANTIZATION=\"Q6_K\" # Default quantization level\n# Available options: Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S\n\n# Performance tuning\nPDFKB_LOCAL_EMBEDDING_BATCH_SIZE=32 # Adjust based on memory\nPDFKB_EMBEDDING_CACHE_SIZE=10000 # Number of cached embeddings\nPDFKB_MAX_SEQUENCE_LENGTH=512 # Maximum text length\n\n# Hardware acceleration\nPDFKB_EMBEDDING_DEVICE=\"auto\" # auto, mps, cuda, cpu\nPDFKB_USE_MODEL_OPTIMIZATION=true # Enable torch.compile optimization\n\n# Fallback options\nPDFKB_FALLBACK_TO_OPENAI=false # Use OpenAI if local fails\n```\n\n### 2. OpenAI Embeddings\n\nUse OpenAI's embedding API or **any OpenAI-compatible endpoint** for high-quality embeddings with minimal setup.\n\n**Features:**\n- **High Quality**: State-of-the-art embedding models\n- **No Local Resources**: Runs entirely in the cloud\n- **Fast**: Optimized API with batching support\n- **\ud83c\udd95 Custom Endpoints**: Support for OpenAI-compatible APIs like Together, Nebius, etc.\n\n**Standard OpenAI:**\n```json\n{\n \"env\": {\n \"PDFKB_EMBEDDING_PROVIDER\": \"openai\",\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-...\",\n \"PDFKB_EMBEDDING_MODEL\": \"text-embedding-3-large\"\n }\n}\n```\n\n**\ud83c\udd95 Custom OpenAI-Compatible Endpoints:**\n```json\n{\n \"env\": {\n \"PDFKB_EMBEDDING_PROVIDER\": \"openai\",\n \"PDFKB_OPENAI_API_KEY\": \"your-api-key\",\n \"PDFKB_OPENAI_API_BASE\": \"https://api.studio.nebius.com/v1/\",\n \"PDFKB_EMBEDDING_MODEL\": \"text-embedding-3-large\"\n }\n}\n```\n\n### 3. 
HuggingFace Embeddings\n\n**\ud83c\udd95 ENHANCED**: Use HuggingFace's Inference API with support for custom providers and thousands of embedding models.\n\n**Features:**\n- **\ud83c\udd95 Multiple Providers**: Use HuggingFace directly or third-party providers like Nebius\n- **Wide Model Selection**: Access to thousands of embedding models\n- **Cost-Effective**: Many free or low-cost options available\n- **\ud83c\udd95 Provider Support**: Seamlessly switch between HuggingFace and custom inference providers\n\n**Configuration:**\n\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"pdfkb-mcp\",\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/your/pdfs\",\n \"PDFKB_EMBEDDING_PROVIDER\": \"huggingface\",\n \"PDFKB_HUGGINGFACE_EMBEDDING_MODEL\": \"sentence-transformers/all-MiniLM-L6-v2\",\n \"HF_TOKEN\": \"hf_your_token_here\"\n }\n }\n }\n}\n```\n\n**Advanced Configuration:**\n\n```bash\n# Use a specific provider like Nebius\nPDFKB_HUGGINGFACE_PROVIDER=nebius\nPDFKB_HUGGINGFACE_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B\n\n# Or use HuggingFace directly (auto/default)\nPDFKB_HUGGINGFACE_PROVIDER= # Leave empty for auto\n```\n\n### Performance Tips\n\n1. **Batch Size**: Larger batches are faster but use more memory\n - Apple Silicon: 32-64 recommended\n - NVIDIA GPUs: 64-128 recommended\n - CPU: 16-32 recommended\n\n2. **Model Selection**: Choose based on your needs\n - **Default (Qwen3-0.6B)**: Best for most users - 32K context, fast, 1.2GB\n - **GGUF (Qwen3-0.6B-GGUF)**: Memory-optimized version - 32K context, fast, 0.6GB\n - **High Quality (Qwen3-4B)**: Better accuracy - 32K context, 8GB\n - **GGUF High Quality (Qwen3-4B-GGUF)**: Memory-optimized high quality - 32K context, 2.4GB\n - **Maximum Quality (Qwen3-8B)**: Best accuracy - 32K context, 16GB\n - **GGUF Maximum Quality (Qwen3-8B-GGUF)**: Memory-optimized maximum quality - 32K context, 4.8GB\n\n3. **GGUF Quantization**: Choose based on memory constraints\n - **Q6_K** (default): Best balance of quality and size\n - **Q8_0**: Higher quality, larger size\n - **F16**: Near-original quality, largest size\n - **Q4_K_M**: Smallest size, acceptable quality\n\n4. 
**Memory Management**: The server automatically handles OOM errors by reducing batch size\n\n## \ud83d\udcdd Markdown Document Support\n\nThe server now supports **Markdown documents** (.md, .markdown) alongside PDFs, perfect for:\n- Pre-processed documents where you've already extracted clean markdown\n- Technical documentation and notes\n- Avoiding complex PDF parsing for better quality content\n- Faster processing with no conversion overhead\n\n### Features\n\n- **Native Processing**: Markdown files are read directly without conversion\n- **Page Boundary Detection**: Automatically splits documents on page markers like `--[PAGE: 142]--`\n- **Frontmatter Support**: Automatically extracts YAML/TOML frontmatter metadata\n- **Title Extraction**: Intelligently extracts titles from H1 headers or frontmatter\n- **Same Pipeline**: Uses the same chunking, embedding, and search infrastructure as PDFs\n- **Mixed Collections**: Search across both PDFs and Markdown documents seamlessly\n\n### Usage\n\nSimply add Markdown files the same way you add PDFs:\n\n```python\n# In your MCP client\nawait add_document(\"/path/to/document.md\")\nawait add_document(\"/path/to/paper.pdf\")\n\n# Search across both types\nresults = await search_documents(\"your query\")\n```\n\n### Configuration\n\n```bash\n# Markdown-specific settings\nPDFKB_MARKDOWN_PAGE_BOUNDARY_PATTERN=\"--\\\\[PAGE:\\\\s*(\\\\d+)\\\\]--\" # Regex pattern for page boundaries\nPDFKB_MARKDOWN_SPLIT_ON_PAGE_BOUNDARIES=true # Enable page boundary detection\nPDFKB_MARKDOWN_PARSE_FRONTMATTER=true # Parse YAML/TOML frontmatter (default: true)\nPDFKB_MARKDOWN_EXTRACT_TITLE=true # Extract title from first H1 (default: true)\n```\n\n## \ud83d\udd04 Reranking\n\n**\ud83c\udd95 NEW**: The server now supports **advanced reranking** using multiple providers to significantly improve search result relevance and quality. Reranking is a post-processing step that re-orders initial search results based on deeper semantic understanding.\n\n### Supported Providers\n\n1. **Local Models**: Qwen3-Reranker models (both standard and GGUF quantized variants)\n2. **DeepInfra API**: Qwen3-Reranker-8B via DeepInfra's native API\n\n### How It Works\n\n1. **Initial Search**: Retrieves `limit + reranker_sample_additional` candidates using hybrid/vector/text search\n2. **Reranking**: Uses Qwen3-Reranker to deeply analyze query-document relevance and re-score results\n3. 
**Final Results**: Returns the top `limit` results based on reranker scores\n\n### Supported Models\n\n#### Local Models (Qwen3-Reranker Series)\n\n**Standard Models**\n| Model | Size | Best For |\n|-------|------|----------|\n| **Qwen/Qwen3-Reranker-0.6B** (default) | 1.2GB | Lightweight, fast reranking |\n| **Qwen/Qwen3-Reranker-4B** | 8.0GB | High quality reranking |\n| **Qwen/Qwen3-Reranker-8B** | 16.0GB | Maximum quality reranking |\n\n**\ud83c\udd95 GGUF Quantized Models (Reduced Memory Usage)**\n| Model | Size | Best For |\n|-------|------|----------|\n| **Mungert/Qwen3-Reranker-0.6B-GGUF** | 0.3GB | Quantized lightweight, very fast |\n| **Mungert/Qwen3-Reranker-4B-GGUF** | 2.0GB | Quantized high quality |\n| **Mungert/Qwen3-Reranker-8B-GGUF** | 4.0GB | Quantized maximum quality |\n\n#### \ud83c\udd95 DeepInfra Model\n\n| Model | Best For |\n|-------|----------|\n| **Qwen/Qwen3-Reranker-8B** | High-quality cross-encoder reranking via DeepInfra API |\n\n### Configuration\n\n#### Option 1: Local Reranking (Standard Models)\n```bash\n# Enable reranking with local models\nPDFKB_ENABLE_RERANKER=true\nPDFKB_RERANKER_PROVIDER=local # Default\n\n# Choose reranker model\nPDFKB_RERANKER_MODEL=\"Qwen/Qwen3-Reranker-0.6B\" # Default\nPDFKB_RERANKER_MODEL=\"Qwen/Qwen3-Reranker-4B\" # Higher quality\nPDFKB_RERANKER_MODEL=\"Qwen/Qwen3-Reranker-8B\" # Maximum quality\n\n# Configure candidate sampling\nPDFKB_RERANKER_SAMPLE_ADDITIONAL=5 # Default: get 5 extra candidates for reranking\n\n# Optional: specify device\nPDFKB_RERANKER_DEVICE=\"mps\" # For Apple Silicon\nPDFKB_RERANKER_DEVICE=\"cuda\" # For NVIDIA GPUs\nPDFKB_RERANKER_DEVICE=\"cpu\" # For CPU-only\n```\n\n#### Option 2: GGUF Quantized Local Reranking (Memory Optimized)\n```bash\n# Enable reranking with GGUF quantized models\nPDFKB_ENABLE_RERANKER=true\nPDFKB_RERANKER_PROVIDER=local\n\n# Choose GGUF reranker model\nPDFKB_RERANKER_MODEL=\"Mungert/Qwen3-Reranker-0.6B-GGUF\" # Smallest\nPDFKB_RERANKER_MODEL=\"Mungert/Qwen3-Reranker-4B-GGUF\" # Balanced\nPDFKB_RERANKER_MODEL=\"Mungert/Qwen3-Reranker-8B-GGUF\" # Highest quality\n\n# Configure GGUF quantization level\nPDFKB_RERANKER_GGUF_QUANTIZATION=\"Q6_K\" # Balanced (recommended)\nPDFKB_RERANKER_GGUF_QUANTIZATION=\"Q8_0\" # Higher quality, larger\nPDFKB_RERANKER_GGUF_QUANTIZATION=\"Q4_K_M\" # Smaller, lower quality\n\n# Configure candidate sampling\nPDFKB_RERANKER_SAMPLE_ADDITIONAL=5 # Default: get 5 extra candidates\n```\n\n#### \ud83c\udd95 Option 3: DeepInfra Reranking (API-based)\n```bash\n# Enable reranking with DeepInfra\nPDFKB_ENABLE_RERANKER=true\nPDFKB_RERANKER_PROVIDER=deepinfra\n\n# Set your DeepInfra API key\nPDFKB_DEEPINFRA_API_KEY=\"your-deepinfra-api-key\"\n\n# Optional: Choose model (default: Qwen/Qwen3-Reranker-8B)\n# Available: Qwen/Qwen3-Reranker-0.6B, Qwen/Qwen3-Reranker-4B, Qwen/Qwen3-Reranker-8B\nPDFKB_DEEPINFRA_RERANKER_MODEL=\"Qwen/Qwen3-Reranker-8B\"\n\n# Configure candidate sampling\nPDFKB_RERANKER_SAMPLE_ADDITIONAL=8 # Sample 8 extra docs for reranking\n```\n\n**About DeepInfra Reranker**:\n- Supports three Qwen3-Reranker models:\n - **0.6B**: Lightweight model, fastest inference\n - **4B**: Balanced model with good quality and speed\n - **8B**: Maximum quality model (default)\n- Optimized for high-quality cross-encoder relevance scoring\n- Pay-per-use pricing model\n- Get your API key at https://deepinfra.com\n- Note: The API requires equal-length query and document arrays, so the query is duplicated for each document internally\n\n#### Complete Examples\n\n**Local 
Reranking with GGUF Models**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\",\n \"PDFKB_ENABLE_RERANKER\": \"true\",\n \"PDFKB_RERANKER_PROVIDER\": \"local\",\n \"PDFKB_RERANKER_MODEL\": \"Mungert/Qwen3-Reranker-4B-GGUF\",\n \"PDFKB_RERANKER_GGUF_QUANTIZATION\": \"Q6_K\",\n \"PDFKB_RERANKER_SAMPLE_ADDITIONAL\": \"8\",\n \"PDFKB_LOCAL_EMBEDDING_MODEL\": \"Qwen/Qwen3-Embedding-0.6B-GGUF\",\n \"PDFKB_GGUF_QUANTIZATION\": \"Q6_K\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**\ud83c\udd95 DeepInfra Reranking with Local Embeddings**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[hybrid]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/Users/yourname/Documents\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\",\n \"PDFKB_ENABLE_RERANKER\": \"true\",\n \"PDFKB_RERANKER_PROVIDER\": \"deepinfra\",\n \"PDFKB_DEEPINFRA_API_KEY\": \"your-deepinfra-api-key\",\n \"PDFKB_RERANKER_SAMPLE_ADDITIONAL\": \"8\",\n \"PDFKB_LOCAL_EMBEDDING_MODEL\": \"Qwen/Qwen3-Embedding-0.6B\",\n \"PDFKB_EMBEDDING_PROVIDER\": \"local\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n### Performance Impact\n\n**Search Quality**: Reranking typically improves search relevance by 15-30% by better understanding query intent and document relevance.\n\n**Memory Usage**:\n- Local standard models: 1.2GB - 16GB depending on model size\n- GGUF quantized: 0.3GB - 4GB depending on model and quantization\n- DeepInfra: No local memory usage (API-based)\n\n**Speed**:\n- Local models: Adds ~100-500ms per search\n- GGUF models: Slightly slower initial load, similar inference\n- DeepInfra: Adds ~200-800ms depending on API latency\n\n**Cost**:\n- Local models: Free after initial download\n- DeepInfra: Pay-per-use based on token usage\n\n### When to Use Reranking\n\n**\u2705 Recommended for:**\n- High-stakes searches where quality matters most\n- Complex queries requiring nuanced understanding\n- Large document collections with diverse content\n- When you have adequate hardware resources\n\n**\u274c Skip reranking for:**\n- Simple keyword-based searches\n- Real-time applications requiring sub-100ms responses\n- Limited memory/compute environments\n- Very small document collections (<100 documents)\n\n### GGUF Quantization Recommendations\n\nFor GGUF reranker models, choose quantization based on your needs:\n\n- **Q6_K** (recommended): Best balance of quality and size\n- **Q8_0**: Near-original quality with moderate compression\n- **F16**: Original quality, minimal compression\n- **Q4_K_M**: Maximum compression, acceptable quality loss\n- **Q4_K_S**: Small size, lower quality\n- **Q5_K_M**: Medium compression and quality\n- **Q5_K_S**: Smaller variant of Q5\n\n## \ud83d\udcdd Document Summarization\n\nThe server supports **automatic document summarization** to generate meaningful titles, short descriptions, and detailed summaries for each document. 
This creates rich metadata that improves document organization and search quality.\n\n### Summary Components\n\nEach processed document can automatically generate:\n- **Title**: A descriptive title that captures the document's main subject (max 80 characters)\n- **Short Description**: A concise 1-2 sentence summary (max 200 characters)\n- **Long Description**: A detailed paragraph explaining content, key points, and findings (max 500 characters)\n\n### Summarization Options\n\n#### Option 1: Local LLM Summarization\n\n```bash\n# Enable summarization with local LLM\nPDFKB_ENABLE_SUMMARIZER=true\nPDFKB_SUMMARIZER_PROVIDER=local\n\n# Model selection (default: Qwen/Qwen3-4B-Instruct-2507-FP8)\nPDFKB_SUMMARIZER_MODEL=\"Qwen/Qwen3-4B-Instruct-2507-FP8\" # Balanced (default)\nPDFKB_SUMMARIZER_MODEL=\"Qwen/Qwen3-1.5B-Instruct\" # Lightweight\nPDFKB_SUMMARIZER_MODEL=\"Qwen/Qwen3-8B-Instruct\" # High quality\n\n# Hardware configuration\nPDFKB_SUMMARIZER_DEVICE=\"auto\" # auto, mps, cuda, cpu\nPDFKB_SUMMARIZER_MODEL_CACHE_DIR=\"~/.cache/pdfkb-mcp/summarizer\"\n\n# Content configuration\nPDFKB_SUMMARIZER_MAX_PAGES=10 # Number of pages to analyze (default: 10)\n```\n\n**About Local Summarization**:\n- Uses transformer-based instruction-tuned models locally\n- No API costs or external dependencies\n- Full privacy - content never leaves your machine\n- Supports multiple model sizes for different hardware capabilities\n- Configurable page limits to manage processing time\n\n#### Option 2: Remote LLM Summarization (OpenAI-Compatible)\n\n```bash\n# Enable summarization with remote API\nPDFKB_ENABLE_SUMMARIZER=true\nPDFKB_SUMMARIZER_PROVIDER=remote\n\n# API configuration\nPDFKB_SUMMARIZER_API_KEY=\"your-api-key\" # Optional, falls back to OPENAI_API_KEY\nPDFKB_SUMMARIZER_API_BASE=\"https://api.openai.com/v1\" # Custom API endpoint\nPDFKB_SUMMARIZER_MODEL=\"gpt-4\" # Model to use\n\n# Content configuration\nPDFKB_SUMMARIZER_MAX_PAGES=10 # Number of pages to analyze\n```\n\n**About Remote Summarization**:\n- Works with OpenAI API and compatible services\n- Supports custom API endpoints for other providers\n- Higher quality summaries with advanced models\n- Pay-per-use pricing model\n- Faster processing for large documents\n\n### Usage Examples\n\n**Local Summarization with Custom Model**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/pdfs\",\n \"PDFKB_ENABLE_SUMMARIZER\": \"true\",\n \"PDFKB_SUMMARIZER_PROVIDER\": \"local\",\n \"PDFKB_SUMMARIZER_MODEL\": \"Qwen/Qwen3-4B-Instruct-2507-FP8\",\n \"PDFKB_SUMMARIZER_MAX_PAGES\": \"15\",\n \"PDFKB_SUMMARIZER_DEVICE\": \"mps\"\n }\n }\n }\n}\n```\n\n**Remote Summarization with Custom Endpoint**\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/pdfs\",\n \"PDFKB_ENABLE_SUMMARIZER\": \"true\",\n \"PDFKB_SUMMARIZER_PROVIDER\": \"remote\",\n \"PDFKB_SUMMARIZER_API_KEY\": \"your-api-key\",\n \"PDFKB_SUMMARIZER_MODEL\": \"gpt-4\",\n \"PDFKB_SUMMARIZER_MAX_PAGES\": \"20\"\n }\n }\n }\n}\n```\n\n### Performance Considerations\n\n**Local Models**:\n- **Qwen3-1.5B**: ~3GB RAM, fast processing, good quality\n- **Qwen3-4B-FP8**: ~8GB RAM, balanced speed/quality (recommended)\n- **Qwen3-8B**: ~16GB RAM, highest quality, slower processing\n\n**Remote Models**:\n- **GPT-3.5-turbo**: Fast, cost-effective, good quality\n- **GPT-4**: Highest quality, more expensive, 
slower\n- **Custom models**: Varies by provider\n\n**Page Limits**:\n- More pages = better context but slower processing\n- Recommended: 10-20 pages for most documents\n- Academic papers: 5-10 pages (focus on abstract/conclusion)\n- Technical manuals: 15-25 pages (capture key sections)\n\n### When to Use Summarization\n\n**Recommended for**:\n- Large document collections requiring organization\n- Research document management\n- Content discovery and browsing\n- Document metadata enhancement\n\n**Consider disabling for**:\n- Very small document collections\n- Documents with highly sensitive content (use local if needed)\n- Limited processing resources\n- Real-time document processing requirements\n\n## \ud83d\udd0d Hybrid Search\n\nThe server now supports **Hybrid Search**, which combines the strengths of semantic similarity search (vector embeddings) with traditional keyword matching (BM25) for improved search quality.\n\n### How It Works\n\n1. **Dual Indexing**: Documents are indexed in both a vector database (ChromaDB) and a full-text search index (Whoosh)\n2. **Parallel Search**: Queries execute both semantic and keyword searches simultaneously\n3. **Reciprocal Rank Fusion (RRF)**: Results are intelligently merged using RRF algorithm for optimal ranking\n\n### Benefits\n\n- **Better Recall**: Finds documents that match exact keywords even if semantically different\n- **Improved Precision**: Combines conceptual understanding with keyword relevance\n- **Technical Terms**: Excellent for technical documentation, code references, and domain-specific terminology\n- **Balanced Results**: Configurable weights let you adjust the balance between semantic and keyword matching\n\n### Configuration\n\nEnable hybrid search by setting:\n```bash\nPDFKB_ENABLE_HYBRID_SEARCH=true # Enable hybrid search (default: true)\nPDFKB_HYBRID_VECTOR_WEIGHT=0.6 # Weight for semantic search (default: 0.6)\nPDFKB_HYBRID_TEXT_WEIGHT=0.4 # Weight for keyword search (default: 0.4)\nPDFKB_RRF_K=60 # RRF constant (default: 60)\n```\n\n### Installation\n\nTo use hybrid search, install with the optional dependency:\n```bash\npip install \"pdfkb-mcp[hybrid]\"\n```\n\nOr if using uvx, it's included by default when hybrid search is enabled.\n\n## \ud83d\udd3d Minimum Chunk Filtering\n\n**NEW**: The server now supports **Minimum Chunk Filtering**, which automatically filters out short, low-information chunks that don't contain enough content to be useful for search and retrieval.\n\n### How It Works\n\nDocuments are processed normally through parsing and chunking, then chunks below the configured character threshold are automatically filtered out before indexing and embedding.\n\n### Benefits\n\n- **Improved Search Quality**: Eliminates noise from short, uninformative chunks\n- **Reduced Storage**: Less vector storage and faster search by removing low-value content\n- **Better Context**: Search results focus on chunks with substantial, meaningful content\n- **Configurable**: Set custom thresholds based on your document types and use case\n\n### Configuration\n\n```bash\n# Enable filtering (default: 0 = disabled)\nPDFKB_MIN_CHUNK_SIZE=150 # Filter chunks smaller than 150 characters\n\n# Examples for different use cases:\nPDFKB_MIN_CHUNK_SIZE=100 # Permissive - keep most content\nPDFKB_MIN_CHUNK_SIZE=200 # Stricter - only substantial chunks\nPDFKB_MIN_CHUNK_SIZE=0 # Disabled - keep all chunks (default)\n```\n\nOr in your MCP client configuration:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": 
[\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-...\",\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/pdfs\",\n \"PDFKB_MIN_CHUNK_SIZE\": \"150\"\n }\n }\n }\n}\n```\n\n### Usage Guidelines\n\n- **Default (0)**: No filtering - keeps all chunks for maximum recall\n- **Conservative (100-150)**: Good balance - removes very short chunks while preserving content\n- **Aggressive (200+)**: Strict filtering - only keeps substantial chunks with rich content\n\n## \ud83e\udde9 Semantic Chunking\n\n**NEW**: The server now supports advanced **Semantic Chunking**, which uses embedding similarity to identify natural content boundaries, creating more coherent and contextually complete chunks than traditional methods.\n\n### How It Works\n\n1. **Sentence Embedding**: Each sentence in the document is embedded using your configured embedding model\n2. **Similarity Analysis**: Distances between consecutive sentence embeddings are calculated\n3. **Breakpoint Detection**: Natural content boundaries are identified where similarity drops significantly\n4. **Intelligent Grouping**: Related sentences are kept together in the same chunk\n\n### Benefits\n\n- **40% Better Coherence**: Chunks contain semantically related content\n- **Context Preservation**: Important context stays together, reducing information loss\n- **Improved Retrieval**: Better search results due to more meaningful chunks\n- **Flexible Configuration**: Four different breakpoint detection methods for different document types\n\n### Quick Start\n\nEnable semantic chunking by setting:\n```bash\nPDFKB_PDF_CHUNKER=semantic\nPDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile # Default\nPDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0 # Default\n```\n\nOr in your MCP client configuration:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[semantic]\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/pdfs\",\n \"PDFKB_PDF_CHUNKER\": \"semantic\",\n \"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE\": \"percentile\",\n \"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT\": \"95.0\"\n }\n }\n }\n}\n```\n\n### Breakpoint Detection Methods\n\n| Method | Best For | Threshold Range | Description |\n|--------|----------|-----------------|-------------|\n| **percentile** (default) | General documents | 90-99 | Split at top N% largest semantic gaps |\n| **standard_deviation** | Consistent style docs | 2.0-4.0 | Split at mean + N\u00d7\u03c3 distance |\n| **interquartile** | Noisy documents | 1.0-2.0 | Split at mean + N\u00d7IQR, robust to outliers |\n| **gradient** | Technical/legal docs | 90-99 | Analyze rate of change in similarity |\n\n### Configuration Options\n\n```bash\n# Breakpoint detection method\nPDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile # percentile, standard_deviation, interquartile, gradient\n\n# Threshold amount (interpretation depends on type)\nPDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0 # For percentile/gradient: 0-100, for others: positive float\n\n# Context buffer size (sentences to include around breakpoints)\nPDFKB_SEMANTIC_CHUNKER_BUFFER_SIZE=1 # Default: 1\n\n# Optional: Fixed number of chunks (overrides threshold-based splitting)\nPDFKB_SEMANTIC_CHUNKER_NUMBER_OF_CHUNKS= # Leave empty for dynamic\n\n# Minimum chunk size in characters\nPDFKB_SEMANTIC_CHUNKER_MIN_CHUNK_CHARS=100 # Default: 100\n\n# Sentence splitting regex\nPDFKB_SEMANTIC_CHUNKER_SENTENCE_SPLIT_REGEX=\"(?<=[.?!])\\\\s+\" # Default pattern\n```\n\n### Tuning Guidelines\n\n1. 
## 🧩 Semantic Chunking

**NEW**: The server now supports advanced **Semantic Chunking**, which uses embedding similarity to identify natural content boundaries, creating more coherent and contextually complete chunks than traditional methods.

### How It Works

1. **Sentence Embedding**: Each sentence in the document is embedded using your configured embedding model
2. **Similarity Analysis**: Distances between consecutive sentence embeddings are calculated
3. **Breakpoint Detection**: Natural content boundaries are identified where similarity drops significantly (see the sketch at the end of this section)
4. **Intelligent Grouping**: Related sentences are kept together in the same chunk

### Benefits

- **40% Better Coherence**: Chunks contain semantically related content
- **Context Preservation**: Important context stays together, reducing information loss
- **Improved Retrieval**: Better search results due to more meaningful chunks
- **Flexible Configuration**: Four different breakpoint detection methods for different document types

### Quick Start

Enable semantic chunking by setting:
```bash
PDFKB_PDF_CHUNKER=semantic
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile  # Default
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0      # Default
```

Or in your MCP client configuration:
```json
{
  "mcpServers": {
    "pdfkb": {
      "command": "uvx",
      "args": ["pdfkb-mcp[semantic]"],
      "env": {
        "PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs",
        "PDFKB_PDF_CHUNKER": "semantic",
        "PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE": "percentile",
        "PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT": "95.0"
      }
    }
  }
}
```

### Breakpoint Detection Methods

| Method | Best For | Threshold Range | Description |
|--------|----------|-----------------|-------------|
| **percentile** (default) | General documents | 90-99 | Split at the top N% largest semantic gaps |
| **standard_deviation** | Consistent-style docs | 2.0-4.0 | Split at mean + N×σ distance |
| **interquartile** | Noisy documents | 1.0-2.0 | Split at mean + N×IQR, robust to outliers |
| **gradient** | Technical/legal docs | 90-99 | Analyze the rate of change in similarity |

### Configuration Options

```bash
# Breakpoint detection method
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile  # percentile, standard_deviation, interquartile, gradient

# Threshold amount (interpretation depends on type)
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0  # For percentile/gradient: 0-100; for others: positive float

# Context buffer size (sentences to include around breakpoints)
PDFKB_SEMANTIC_CHUNKER_BUFFER_SIZE=1  # Default: 1

# Optional: Fixed number of chunks (overrides threshold-based splitting)
PDFKB_SEMANTIC_CHUNKER_NUMBER_OF_CHUNKS=  # Leave empty for dynamic

# Minimum chunk size in characters
PDFKB_SEMANTIC_CHUNKER_MIN_CHUNK_CHARS=100  # Default: 100

# Sentence splitting regex
PDFKB_SEMANTIC_CHUNKER_SENTENCE_SPLIT_REGEX="(?<=[.?!])\\s+"  # Default pattern
```

### Tuning Guidelines

1. **For General Documents** (default):
   - Use `percentile` with a `95.0` threshold
   - Good balance between chunk size and coherence

2. **For Technical Documentation**:
   - Use `gradient` with a `90.0` threshold
   - Better at detecting technical section boundaries

3. **For Academic Papers**:
   - Use `standard_deviation` with a `3.0` threshold
   - Maintains paragraph and section integrity

4. **For Mixed Content**:
   - Use `interquartile` with a `1.5` threshold
   - Robust against varying content styles

### Installation

Install with the semantic chunking dependency:
```bash
pip install "pdfkb-mcp[semantic]"
```

Or if using uvx:
```bash
uvx pdfkb-mcp[semantic]
```

### Compatibility

- Works with both **local** and **OpenAI** embeddings
- Compatible with all PDF parsers
- Integrates with the intelligent caching system
- Falls back to the LangChain chunker if dependencies are missing
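The percentile method is easy to see in miniature. Below is a self-contained sketch that takes sentence embeddings (made-up 2-D vectors here for brevity; real use would come from your configured embedding model), computes cosine distances between consecutive sentences, and splits where the distance exceeds the chosen percentile. This illustrates the technique; the actual implementation lives in LangChain's `SemanticChunker`:

```python
import numpy as np


def semantic_breakpoints(embeddings: np.ndarray, percentile: float = 95.0) -> list[int]:
    """Return sentence indices where a new chunk should start (percentile method)."""
    # Cosine distance between each pair of consecutive sentence embeddings.
    a, b = embeddings[:-1], embeddings[1:]
    cos_sim = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    distances = 1.0 - cos_sim
    # Only the largest ~(100 - percentile)% of gaps become boundaries.
    threshold = np.percentile(distances, percentile)
    return [i + 1 for i, d in enumerate(distances) if d > threshold]


# Toy example: four "sentences" where the third clearly changes topic.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(semantic_breakpoints(emb, percentile=95.0))  # -> [2]
```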
\"800\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Memory Efficient**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_PDF_PARSER\": \"pymupdf4llm\",\n \"PDFKB_EMBEDDING_BATCH_SIZE\": \"50\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Tier 2: Use Case Specific (15% of users)\n\n**Academic Papers**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_PDF_PARSER\": \"mineru\",\n \"PDFKB_CHUNK_SIZE\": \"1200\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Business Documents**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_PDF_PARSER\": \"pymupdf4llm\",\n \"PDFKB_DOCLING_TABLE_MODE\": \"ACCURATE\",\n \"PDFKB_DOCLING_DO_TABLE_STRUCTURE\": \"true\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Multi-language Documents**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_PDF_PARSER\": \"docling\",\n \"PDFKB_DOCLING_OCR_LANGUAGES\": \"en,fr,de,es\",\n \"PDFKB_DOCLING_DO_OCR\": \"true\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Hybrid Search (NEW - Improved Search Quality)**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\",\n \"PDFKB_HYBRID_VECTOR_WEIGHT\": \"0.6\",\n \"PDFKB_HYBRID_TEXT_WEIGHT\": \"0.4\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Semantic Chunking (NEW - Context-Aware Chunking)**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp[semantic]\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_PDF_CHUNKER\": \"semantic\",\n \"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE\": \"gradient\",\n \"PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT\": \"90.0\",\n \"PDFKB_ENABLE_HYBRID_SEARCH\": \"true\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Maximum Quality**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_OPENROUTER_API_KEY\": \"sk-or-v1-abc123def456ghi789...\",\n \"PDFKB_PDF_PARSER\": \"llm\",\n \"PDFKB_LLM_MODEL\": \"anthropic/claude-3.5-sonnet\",\n \"PDFKB_EMBEDDING_MODEL\": \"text-embedding-3-large\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n### Essential Environment Variables\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `PDFKB_OPENAI_API_KEY` | *required* | OpenAI API key for embeddings |\n| `PDFKB_KNOWLEDGEBASE_PATH` | `./pdfs` | Directory containing PDF files |\n| `PDFKB_CACHE_DIR` | `./.cache` | Cache directory for processing |\n| `PDFKB_PDF_PARSER` | `pymupdf4llm` | Parser: `pymupdf4llm` (default), `marker`, `mineru`, `docling`, `llm` |\n| `PDFKB_PDF_CHUNKER` | `langchain` | Chunking strategy: `langchain` (default), `page`, `unstructured`, `semantic` |\n| `PDFKB_CHUNK_SIZE` | `1000` | Target chunk size for LangChain chunker |\n| `PDFKB_WEB_ENABLE` | `false` | 
### Essential Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PDFKB_OPENAI_API_KEY` | *required for OpenAI embeddings* | OpenAI API key (not needed with local embeddings) |
| `PDFKB_KNOWLEDGEBASE_PATH` | `./pdfs` | Directory containing PDF files |
| `PDFKB_CACHE_DIR` | `./.cache` | Cache directory for processing |
| `PDFKB_PDF_PARSER` | `pymupdf4llm` | Parser: `pymupdf4llm` (default), `marker`, `mineru`, `docling`, `llm` |
| `PDFKB_PDF_CHUNKER` | `langchain` | Chunking strategy: `langchain` (default), `page`, `unstructured`, `semantic` |
| `PDFKB_CHUNK_SIZE` | `1000` | Target chunk size for the LangChain chunker |
| `PDFKB_WEB_ENABLE` | `false` | Enable/disable web interface |
| `PDFKB_WEB_PORT` | `8080` | Web server port |
| `PDFKB_WEB_HOST` | `localhost` | Web server host |
| `PDFKB_WEB_CORS_ORIGINS` | `http://localhost:3000,http://127.0.0.1:3000` | CORS allowed origins (comma-separated) |
| `PDFKB_EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI embedding model (use `text-embedding-3-small` for faster processing) |
| `PDFKB_MIN_CHUNK_SIZE` | `0` | Minimum chunk size in characters (0 = disabled; filters out chunks smaller than this) |
| `PDFKB_OPENAI_API_BASE` | *optional* | Custom base URL for OpenAI-compatible APIs (e.g., https://api.studio.nebius.com/v1/) |
| `PDFKB_HUGGINGFACE_EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HuggingFace model for embeddings when using the huggingface provider |
| `PDFKB_HUGGINGFACE_PROVIDER` | *optional* | HuggingFace provider (e.g., "nebius"); leave empty for default |
| `PDFKB_ENABLE_HYBRID_SEARCH` | `true` | Enable hybrid search combining semantic and keyword matching |
| `PDFKB_HYBRID_VECTOR_WEIGHT` | `0.6` | Weight for semantic search (0-1; must sum to 1 with the text weight) |
| `PDFKB_HYBRID_TEXT_WEIGHT` | `0.4` | Weight for keyword/BM25 search (0-1; must sum to 1 with the vector weight) |
| `PDFKB_RRF_K` | `60` | Reciprocal Rank Fusion constant (higher = less emphasis on rank differences) |
| `PDFKB_LOCAL_EMBEDDING_MODEL` | `Qwen/Qwen3-Embedding-0.6B` | Local embedding model (Qwen3-Embedding series only) |
| `PDFKB_GGUF_QUANTIZATION` | `Q6_K` | GGUF quantization level (Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S) |
| `PDFKB_ENABLE_RERANKER` | `false` | Enable/disable result reranking for improved search quality |
| `PDFKB_RERANKER_PROVIDER` | `local` | Reranker provider: 'local' or 'deepinfra' |
| `PDFKB_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-0.6B` | Reranker model for the local provider |
| `PDFKB_RERANKER_SAMPLE_ADDITIONAL` | `5` | Additional results to sample for reranking |
| `PDFKB_RERANKER_GGUF_QUANTIZATION` | *optional* | GGUF quantization level (Q6_K, Q8_0, etc.) |
| `PDFKB_DEEPINFRA_API_KEY` | *required for the DeepInfra reranker* | DeepInfra API key for reranking |
| `PDFKB_DEEPINFRA_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-8B` | DeepInfra model: 0.6B, 4B, or 8B |
| `PDFKB_ENABLE_SUMMARIZER` | `false` | Enable/disable document summarization |
| `PDFKB_SUMMARIZER_PROVIDER` | `local` | Summarizer provider: 'local' or 'remote' |
| `PDFKB_SUMMARIZER_MODEL` | `Qwen/Qwen3-4B-Instruct-2507-FP8` | Model for summarization |
| `PDFKB_SUMMARIZER_MAX_PAGES` | `10` | Maximum pages to analyze for summarization |
| `PDFKB_SUMMARIZER_DEVICE` | `auto` | Hardware device for the local summarizer |
| `PDFKB_SUMMARIZER_MODEL_CACHE_DIR` | `~/.cache/pdfkb-mcp/summarizer` | Cache directory for summarizer models |
| `PDFKB_SUMMARIZER_API_BASE` | *optional* | Custom API base URL for the remote summarizer |
| `PDFKB_SUMMARIZER_API_KEY` | *optional* | API key for the remote summarizer (falls back to OPENAI_API_KEY) |
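All of these settings are plain environment variables, so scripts that wrap the server can read and sanity-check them the same way. A small illustrative sketch (the `env_float` helper is not part of pdfkb-mcp), including the sum-to-1 constraint the table notes for the hybrid weights:

```python
import os


def env_float(name: str, default: float) -> float:
    """Read an environment variable as a float, falling back to the documented default."""
    raw = os.environ.get(name)
    return float(raw) if raw else default


vector_weight = env_float("PDFKB_HYBRID_VECTOR_WEIGHT", 0.6)
text_weight = env_float("PDFKB_HYBRID_TEXT_WEIGHT", 0.4)

# The table above notes the two hybrid weights must sum to 1.
if abs(vector_weight + text_weight - 1.0) > 1e-9:
    raise ValueError(f"hybrid weights must sum to 1, got {vector_weight + text_weight}")

chunk_size = int(os.environ.get("PDFKB_CHUNK_SIZE", "1000"))
print(vector_weight, text_weight, chunk_size)
```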
## 🐳 Docker Deployment

Deploy pdfkb-mcp using Docker for consistent, scalable, and isolated deployment across any environment.

### Quick Start with Docker

**1. Using Container Run (Local Embeddings - No API Key Required)**:
```bash
# Create directories
mkdir -p ./documents ./cache ./logs

# Run with Podman (preferred)
podman run -d \
  --name pdfkb-mcp \
  -p 8000:8000 \
  -v "$(pwd)/documents:/app/documents:rw" \
  -v "$(pwd)/cache:/app/cache" \
  -e PDFKB_EMBEDDING_PROVIDER=local \
  -e PDFKB_TRANSPORT=http \
  pdfkb-mcp:latest

# Or with Docker
docker run -d \
  --name pdfkb-mcp \
  -p 8000:8000 \
  -v "$(pwd)/documents:/app/documents:rw" \
  -v "$(pwd)/cache:/app/cache" \
  -e PDFKB_EMBEDDING_PROVIDER=local \
  -e PDFKB_TRANSPORT=http \
  pdfkb-mcp:latest
```

**2. Using Compose (Recommended: Podman)**:
```bash
# 1) Copy the sample file and edit it
cp docker-compose.sample.yml docker-compose.yml

# 2) Edit docker-compose.yml
#    - Set the documents volume path to your folder
#    - Optionally adjust ports, resources, and any env vars
$EDITOR docker-compose.yml

# 3) Create recommended local directories (if using bind mounts)
mkdir -p ./documents ./cache ./logs

# 4a) Start with Podman (preferred per project rules)
podman-compose up -d

# 4b) Or with Docker (if you aren't using Podman)
docker compose up -d
```
> Security note: docker-compose.yml is already in .gitignore. Do not commit API keys. Use the sample file and keep your local docker-compose.yml untracked.

### Docker Compose Configuration

The `docker-compose.sample.yml` provides a comprehensive configuration template with:

- 📋 **All environment variables** documented with examples and default values
- 🔧 **Logical sections** (Core, Embedding, Web Interface, Processing, Advanced AI, etc.)
- 🚀 **Multiple configuration examples** for different use cases
- 🔒 **Security best practices** with no committed API keys
- 🎯 **Quick start recommendations** at the bottom of the file

**Key Configuration Areas**:

1. **Documents Volume**: Update the path to your document collection:
   ```yaml
   volumes:
     - "/path/to/your/documents:/app/documents:rw"  # ← CHANGE THIS
   ```

2. **Embedding Provider**: Choose your preferred option in the environment section:
   ```yaml
   # Local (no API key - recommended for privacy)
   PDFKB_EMBEDDING_PROVIDER: "local"

   # OpenAI/compatible APIs (requires API key)
   # PDFKB_EMBEDDING_PROVIDER: "openai"
   # PDFKB_OPENAI_API_KEY: "YOUR-API-KEY-HERE"
   ```

3. **Resource Limits**: Adjust based on your system:
   ```yaml
   deploy:
     resources:
       limits:
         cpus: '4.0'  # ← Increase for better performance
         memory: 8G   # ← Increase for large document collections
   ```
**3. Alternative: Using an Environment File**:

For sensitive configuration, create a separate `.env` file:
```bash
# Create .env file for sensitive settings
cat > .env << 'EOF'
PDFKB_OPENAI_API_KEY=sk-proj-your-actual-key-here
PDFKB_EMBEDDING_PROVIDER=openai
PDFKB_DEEPINFRA_API_KEY=your-deepinfra-key
PDFKB_ENABLE_RERANKER=true
EOF

# Reference in docker-compose.yml
# env_file:
#   - .env

# Restart with new configuration
podman-compose down && podman-compose up -d
```

### Building from Source

```bash
# Clone the repository
git clone https://github.com/juanqui/pdfkb-mcp.git
cd pdfkb-mcp

# Copy and customize the configuration
cp docker-compose.sample.yml docker-compose.yml
# Edit docker-compose.yml to update volumes and configuration
$EDITOR docker-compose.yml

# Build with Podman (preferred)
podman build -t pdfkb-mcp:latest .

# Or use Podman Compose to build
podman-compose build

# Alternative: Build with Docker
docker build -t pdfkb-mcp:latest .
docker compose build
```

### Container Configuration

#### Volume Mounts

**Required Volumes**:
- **Documents**: `/app/documents` - Mount your PDF/Markdown collection here
- **Cache**: `/app/cache` - Persistent storage for ChromaDB and the processing cache

**Optional Volumes**:
- **Logs**: `/app/logs` - Container logs (useful for debugging)
- **Config**: `/app/config` - Custom configuration files

```bash
# Example with all volumes (Podman preferred)
podman run -d \
  --name pdfkb-mcp \
  -p 8000:8000 -p 8080:8080 \
  -v "/path/to/your/documents:/app/documents:rw" \
  -v "pdfkb-cache:/app/cache" \
  -v "pdfkb-logs:/app/logs" \
  -e PDFKB_EMBEDDING_PROVIDER=local \
  -e PDFKB_WEB_ENABLE=true \
  pdfkb-mcp:latest

# Or with Docker
docker run -d \
  --name pdfkb-mcp \
  -p 8000:8000 -p 8080:8080 \
  -v "/path/to/your/documents:/app/documents:rw" \
  -v "pdfkb-cache:/app/cache" \
  -v "pdfkb-logs:/app/logs" \
  -e PDFKB_EMBEDDING_PROVIDER=local \
  -e PDFKB_WEB_ENABLE=true \
  pdfkb-mcp:latest
```

#### Port Configuration

- **8000**: MCP HTTP/SSE transport (required for MCP clients)
- **8080**: Web interface (optional, only if `PDFKB_WEB_ENABLE=true`)

#### Environment Variables

**Core Configuration**:
```bash
# Documents and cache
PDFKB_KNOWLEDGEBASE_PATH=/app/documents  # Container path (don't change)
PDFKB_CACHE_DIR=/app/cache               # Container path (don't change)
PDFKB_LOG_LEVEL=INFO                     # DEBUG, INFO, WARNING, ERROR

# Transport mode
PDFKB_TRANSPORT=http       # "http", "sse" (stdio not recommended for containers)
PDFKB_SERVER_HOST=0.0.0.0  # Bind to all interfaces
PDFKB_SERVER_PORT=8000     # Port inside container

# Embedding provider
PDFKB_EMBEDDING_PROVIDER=local  # "local", "openai", "huggingface"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B"

# Optional: OpenAI configuration
PDFKB_OPENAI_API_KEY=sk-proj-your-key-here
PDFKB_EMBEDDING_MODEL=text-embedding-3-large

# Optional: Web interface
PDFKB_WEB_ENABLE=false   # Enable web UI
PDFKB_WEB_HOST=0.0.0.0   # Web interface host
PDFKB_WEB_PORT=8080      # Web interface port
```

**Performance Configuration**:
```bash
# Processing configuration
PDFKB_PDF_PARSER=pymupdf4llm  # Parser selection
PDFKB_PDF_CHUNKER=langchain   # Chunking strategy
PDFKB_CHUNK_SIZE=1000         # Chunk size
PDFKB_CHUNK_OVERLAP=200       # Chunk overlap

# Parallel processing (adjust based on container resources)
PDFKB_MAX_PARALLEL_PARSING=1      # Concurrent PDF processing
PDFKB_MAX_PARALLEL_EMBEDDING=1    # Concurrent embedding generation
PDFKB_BACKGROUND_QUEUE_WORKERS=2  # Background workers

# Search configuration
PDFKB_ENABLE_HYBRID_SEARCH=true  # Hybrid search (recommended)
PDFKB_ENABLE_RERANKER=false      # Result reranking
PDFKB_ENABLE_SUMMARIZER=false    # Document summarization
```
### MCP Client Configuration with Docker

#### For Cline (HTTP Transport)

**MCP Settings (`~/.continue/config.json`)**:
```json
{
  "mcpServers": {
    "pdfkb": {
      "command": "curl",
      "args": [
        "-X", "POST",
        "-H", "Content-Type: application/json",
        "http://localhost:8000/mcp"
      ],
      "transport": "http"
    }
  }
}
```

#### For Roo (SSE Transport)

**Set the container to SSE mode**:
```bash
# Update docker-compose.yml or add an environment variable
PDFKB_TRANSPORT=sse

# Restart the container with Podman
podman-compose restart

# Or with Docker
docker compose restart
```

**MCP Settings**:
```json
{
  "mcpServers": {
    "pdfkb": {
      "transport": "sse",
      "url": "http://localhost:8000/sse"
    }
  }
}
```

### Production Configuration

For production deployments, use the comprehensive `docker-compose.sample.yml` as your starting point:

1. **Copy and customize**: `cp docker-compose.sample.yml docker-compose.yml`
2. **Update paths and secrets**: Edit the documents volume and any API keys
3. **Adjust resource limits**: Configure CPU/memory based on your infrastructure
4. **Enable security features**: Review security settings and network configuration

The sample file includes:
- 🔒 **Security hardening** (non-root user, no-new-privileges)
- 📊 **Resource limits** and health checks
- 🌐 **Network isolation**
- 📋 **Comprehensive environment variable documentation**
- 🚀 **Performance optimization examples**

#### Development Configuration

**docker-compose.dev.yml**:
```yaml
version: '3.8'

services:
  pdfkb-mcp-dev:
    build: .
    container_name: pdfkb-mcp-dev

    ports:
      - "8000:8000"
      - "8080:8080"

    volumes:
      - "./documents:/app/documents:rw"
      - "./src:/app/src:ro"  # Live source code mounting
      - "./dev-cache:/app/cache"
      - "./dev-logs:/app/logs"

    environment:
      - PDFKB_LOG_LEVEL=DEBUG           # Debug logging
      - PDFKB_WEB_ENABLE=true           # Enable web interface
      - PDFKB_EMBEDDING_PROVIDER=local  # No API costs

    env_file:
      - .env.dev
```

### Container Management

#### Health Monitoring

```bash
# Check container health (Podman preferred)
podman ps
podman-compose ps

# Or with Docker
docker ps
docker compose ps

# View logs
podman logs pdfkb-mcp   # or: docker logs pdfkb-mcp
podman-compose logs -f  # or: docker compose logs -f

# Check the health endpoint
curl http://localhost:8000/health

# Monitor resource usage
podman stats pdfkb-mcp  # or: docker stats pdfkb-mcp
```

#### Container Operations

```bash
# Start/stop the container (Podman preferred)
podman-compose up -d
podman-compose down

# Or with Docker
docker compose up -d
docker compose down

# Restart with new configuration
podman-compose restart  # or: docker compose restart

# Update the container image
podman-compose pull     # or: docker compose pull
podman-compose up -d    # or: docker compose up -d

# View container details
podman inspect pdfkb-mcp  # or: docker inspect pdfkb-mcp

# Execute commands in the container
podman exec -it pdfkb-mcp bash  # or: docker exec -it pdfkb-mcp bash
```

### Troubleshooting

#### Common Issues
**1. Permission Errors**:
```bash
# Fix volume permissions
sudo chown -R 1001:1001 ./documents ./cache ./logs

# Or use the current user
sudo chown -R $(id -u):$(id -g) ./documents ./cache ./logs
```

**2. Port Conflicts**:
```bash
# Check if the ports are in use
netstat -tulpn | grep :8000
lsof -i :8000

# Use different ports
podman run -p 8001:8000 -p 8081:8080 pdfkb-mcp:latest
# or: docker run -p 8001:8000 -p 8081:8080 pdfkb-mcp:latest
```

**3. Memory Issues**:
```bash
# Check container memory usage
podman stats --no-stream  # or: docker stats --no-stream
```
Then raise the limits in docker-compose.yml:
```yaml
deploy:
  resources:
    limits:
      memory: 8G  # Increase memory
```

**4. Connection Issues**:
```bash
# Test container connectivity
curl http://localhost:8000/health

# Check if the container is running
podman ps | grep pdfkb  # or: docker ps | grep pdfkb

# Check the logs for errors
podman logs pdfkb-mcp --tail 50  # or: docker logs pdfkb-mcp --tail 50
```

#### Debug Mode

```bash
# Run the container in debug mode (Podman preferred)
podman run -it \
  -p 8000:8000 \
  -v "$(pwd)/documents:/app/documents:rw" \
  -e PDFKB_LOG_LEVEL=DEBUG \
  -e PDFKB_EMBEDDING_PROVIDER=local \
  pdfkb-mcp:latest

# Or with Docker
docker run -it \
  -p 8000:8000 \
  -v "$(pwd)/documents:/app/documents:rw" \
  -e PDFKB_LOG_LEVEL=DEBUG \
  -e PDFKB_EMBEDDING_PROVIDER=local \
  pdfkb-mcp:latest

# Use the development compose file
podman-compose -f docker-compose.dev.yml up  # or: docker compose -f docker-compose.dev.yml up
```

#### Performance Tuning

**For Low-Memory Systems**:
```yaml
environment:
  - PDFKB_MAX_PARALLEL_PARSING=1
  - PDFKB_MAX_PARALLEL_EMBEDDING=1
  - PDFKB_BACKGROUND_QUEUE_WORKERS=1
  - PDFKB_CHUNK_SIZE=500  # Smaller chunks
deploy:
  resources:
    limits:
      memory: 2G  # Lower memory limit
```

**For High-Performance Systems**:
```yaml
environment:
  - PDFKB_MAX_PARALLEL_PARSING=4
  - PDFKB_MAX_PARALLEL_EMBEDDING=2
  - PDFKB_BACKGROUND_QUEUE_WORKERS=4
  - PDFKB_CHUNK_SIZE=1500  # Larger chunks
deploy:
  resources:
    limits:
      memory: 8G  # Higher memory limit
      cpus: '4.0'
```

### Security Considerations

- **Non-root execution**: The container runs as user `pdfkb` (UID 1001)
- **Read-only root filesystem**: The container filesystem is read-only except for mounted volumes
- **Network isolation**: Use Docker networks for service isolation
- **Resource limits**: Set appropriate CPU/memory limits
- **Secret management**: Use Docker secrets or environment files for API keys

## 🖥️ MCP Client Setup

### Claude Desktop

**Configuration File Location**:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`

**Configuration**:
```json
{
  "mcpServers": {
    "pdfkb": {
      "command": "uvx",
      "args": ["pdfkb-mcp"],
      "env": {
        "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...",
        "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
        "PDFKB_CACHE_DIR": "/Users/yourname/Documents/PDFs/.cache",
        "PDFKB_EMBEDDING_MODEL": "text-embedding-3-small"
      },
      "transport": "stdio",
      "autoRestart": true
    }
  }
}
```

**Verification**:
1. Restart Claude Desktop completely
2. Look for PDF KB tools in the interface
3. Test with "Add a document" or "Search documents"
Test with \"Add a document\" or \"Search documents\"\n\n### \ud83c\udd95 VS Code with Native MCP Support (SSE Mode)\n\n**Configuration for SSE/Remote Mode** (`.vscode/mcp.json` in workspace):\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/your/pdfs\"\n },\n \"transport\": \"sse\",\n \"autoRestart\": true\n }\n }\n}\n```\n\n**Verification**:\n1. Reload VS Code window\n2. Check VS Code's MCP server status in Command Palette\n3. Use MCP tools in Copilot Chat\n\n### VS Code with Continue Extension\n\n**Configuration** (`.continue/config.json`):\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"${workspaceFolder}/pdfs\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Verification**:\n1. Reload VS Code window\n2. Check VS Code's MCP server status in Command Palette\n3. Use MCP tools in Copilot Chat\n\n### VS Code with Continue Extension\n\n**Configuration** (`.continue/config.json`):\n```json\n{\n \"models\": [...],\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-proj-abc123def456ghi789...\",\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"${workspaceFolder}/pdfs\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Verification**:\n1. Reload VS Code window\n2. Check Continue panel for server connection\n3. Use `@pdfkb` in Continue chat\n\n### Generic MCP Client\n\n**Standard Configuration Template**:\n```json\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"required\",\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"required-absolute-path\",\n \"PDFKB_PDF_PARSER\": \"optional-default-pymupdf4llm\"\n },\n \"transport\": \"stdio\",\n \"autoRestart\": true,\n \"timeout\": 30000\n }\n }\n}\n```\n\n## \ud83d\udcca Performance & Troubleshooting\n\n### Common Issues\n\n**Server not appearing in MCP client**:\n```json\n// \u274c Wrong: Missing transport\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"]\n }\n }\n}\n\n// \u2705 Correct: Include transport and restart client\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**System overload when processing multiple PDFs**:\n```bash\n# Reduce parallel operations to prevent system stress\nPDFKB_MAX_PARALLEL_PARSING=1 # Process one PDF at a time\nPDFKB_MAX_PARALLEL_EMBEDDING=1 # Embed one document at a time\nPDFKB_BACKGROUND_QUEUE_WORKERS=1 # Single background worker\n```\n\n**Processing too slow**:\n```json\n// Switch to faster parser and increase parallelism (if system can handle it)\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-key\",\n \"PDFKB_PDF_PARSER\": \"pymupdf4llm\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Memory issues**:\n```json\n// Reduce memory usage\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-key\",\n \"PDFKB_EMBEDDING_BATCH_SIZE\": \"25\",\n \"PDFKB_CHUNK_SIZE\": \"500\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**Poor table extraction**:\n```json\n// Use table-optimized parser\n{\n \"mcpServers\": {\n 
\"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_OPENAI_API_KEY\": \"sk-key\",\n \"PDFKB_PDF_PARSER\": \"docling\",\n \"PDFKB_DOCLING_TABLE_MODE\": \"ACCURATE\"\n },\n \"transport\": \"stdio\"\n }\n }\n}\n```\n\n**\ud83c\udd95 SSE/Remote Mode - Client Connection Issues**:\n```json\n// \u274c Wrong: Missing URL for SSE transport (client can't connect)\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_TRANSPORT\": \"sse\"\n },\n \"transport\": \"sse\"\n }\n }\n}\n\n// \u2705 Correct: Include URL pointing to SSE server\n{\n \"mcpServers\": {\n \"pdfkb\": {\n \"command\": \"uvx\",\n \"args\": [\"pdfkb-mcp\"],\n \"env\": {\n \"PDFKB_KNOWLEDGEBASE_PATH\": \"/path/to/pdfs\",\n \"PDFKB_TRANSPORT\": \"sse\",\n \"PDFKB_SSE_HOST\": \"localhost\",\n \"PDFKB_SSE_PORT\": \"8000\"\n },\n \"transport\": \"sse\",\n \"url\": \"http://localhost:8000\"\n }\n }\n}\n```\n**Tip**: Ensure the SSE server is running first (`pdfkb-mcp --transport sse --sse-port 8000`), then configure the client with the correct URL. Check firewall settings if connecting remotely.\n\n**\ud83c\udd95 SSE/Remote Mode - Port Conflicts in Integrated Mode**:\n```bash\n# \u274c Wrong: Web and SSE using same port (will fail to start)\nPDFKB_WEB_ENABLE=true PDFKB_WEB_PORT=8000 PDFKB_TRANSPORT=sse PDFKB_SSE_PORT=8000 pdfkb-mcp\n\n# \u2705 Correct: Use different ports for web (8080) and SSE (8000)\nPDFKB_WEB_ENABLE=true PDFKB_WEB_PORT=8080 PDFKB_TRANSPORT=sse PDFKB_SSE_PORT=8000 pdfkb-mcp\n```\n**Tip**: The server validates port conflicts on startup. Web interface runs on `PDFKB_WEB_PORT` (default 8080), SSE MCP runs on `PDFKB_SSE_PORT` (default 8000). Access web at http://localhost:8080 and connect MCP clients to http://localhost:8000.\n\n**\ud83c\udd95 SSE/Remote Mode - Server Not Starting in SSE Mode**:\n```bash\n# \u274c Wrong: Invalid transport value (server defaults to stdio)\nPDFKB_TRANSPORT=remote pdfkb-mcp # 'remote' is invalid\n\n# \u2705 Correct: Use 'sse' for remote transport\nPDFKB_TRANSPORT=sse pdfkb-mcp --sse-host 0.0.0.0 --sse-port 8000\n\n# Or via command line flags\npdfkb-mcp --transport sse --sse-host localhost --sse-port 8000\n```\n**Tip**: Valid transport values are 'stdio' (default) or 'sse'. Check server logs for \"Running MCP server in SSE mode on http://host:port\" confirmation. 
### Resource Requirements

| Configuration | RAM Usage | Processing Speed | Best For |
|---------------|-----------|------------------|----------|
| **Speed** | 2-4 GB | Fastest | Large collections |
| **Balanced** | 4-6 GB | Medium | Most users |
| **Quality** | 6-12 GB | Medium-Fast | Accuracy priority |
| **GPU** | 8-16 GB | Very Fast | High-volume processing |

## 🔧 Advanced Configuration

### Parser-Specific Options

**MinerU Configuration**:
```json
{
  "mcpServers": {
    "pdfkb": {
      "command": "uvx",
      "args": ["pdfkb-mcp"],
      "env": {
        "PDFKB_OPENAI_API_KEY": "sk-key",
        "PDFKB_PDF_PARSER": "mineru",
        "PDFKB_MINERU_LANG": "en",
        "PDFKB_MINERU_METHOD": "auto",
        "PDFKB_MINERU_VRAM": "16"
      },
      "transport": "stdio"
    }
  }
}
```

**LLM Parser Configuration**:
```json
{
  "mcpServers": {
    "pdfkb": {
      "command": "uvx",
      "args": ["pdfkb-mcp"],
      "env": {
        "PDFKB_OPENAI_API_KEY": "sk-key",
        "PDFKB_OPENROUTER_API_KEY": "sk-or-v1-abc123def456ghi789...",
        "PDFKB_PDF_PARSER": "llm",
        "PDFKB_LLM_MODEL": "google/gemini-2.5-flash-lite",
        "PDFKB_LLM_CONCURRENCY": "5",
        "PDFKB_LLM_DPI": "150"
      },
      "transport": "stdio"
    }
  }
}
```

### Performance Tuning

**Parallel Processing Configuration**:

Control the number of concurrent operations to optimize performance and prevent system overload (a sketch of this throttling pattern follows the examples below):

```bash
# Maximum number of PDFs to parse simultaneously
PDFKB_MAX_PARALLEL_PARSING=1  # Default: 1 (conservative to prevent overload)

# Maximum number of documents to embed simultaneously
PDFKB_MAX_PARALLEL_EMBEDDING=1  # Default: 1 (prevents API rate limits)

# Number of background queue workers
PDFKB_BACKGROUND_QUEUE_WORKERS=2  # Default: 2

# Thread pool size for CPU-intensive operations
PDFKB_THREAD_POOL_SIZE=1  # Default: 1
```

**Resource-Optimized Setup** (for low-powered systems):
```json
{
  "env": {
    "PDFKB_MAX_PARALLEL_PARSING": "1",
    "PDFKB_MAX_PARALLEL_EMBEDDING": "1",
    "PDFKB_BACKGROUND_QUEUE_WORKERS": "1",
    "PDFKB_THREAD_POOL_SIZE": "1"
  }
}
```
This processes one PDF at a time, embeds one document at a time, and uses a single background worker and a single CPU thread.

**High-Performance Setup** (for powerful machines):
```json
{
  "env": {
    "PDFKB_MAX_PARALLEL_PARSING": "4",
    "PDFKB_MAX_PARALLEL_EMBEDDING": "2",
    "PDFKB_BACKGROUND_QUEUE_WORKERS": "4",
    "PDFKB_THREAD_POOL_SIZE": "2",
    "PDFKB_EMBEDDING_BATCH_SIZE": "200",
    "PDFKB_VECTOR_SEARCH_K": "15"
  }
}
```
This parses up to 4 PDFs in parallel, embeds 2 documents simultaneously, runs more background workers and CPU threads, sends larger embedding batches, and returns more search results.

**Complete High-Performance Setup**:
```json
{
  "mcpServers": {
    "pdfkb": {
      "command": "uvx",
      "args": ["pdfkb-mcp"],
      "env": {
        "PDFKB_OPENAI_API_KEY": "sk-key",
        "PDFKB_PDF_PARSER": "mineru",
        "PDFKB_KNOWLEDGEBASE_PATH": "/Volumes/FastSSD/Documents/PDFs",
        "PDFKB_CACHE_DIR": "/Volumes/FastSSD/Documents/PDFs/.cache",
        "PDFKB_MAX_PARALLEL_PARSING": "4",
        "PDFKB_MAX_PARALLEL_EMBEDDING": "2",
        "PDFKB_BACKGROUND_QUEUE_WORKERS": "4",
        "PDFKB_THREAD_POOL_SIZE": "2",
        "PDFKB_EMBEDDING_BATCH_SIZE": "200",
        "PDFKB_VECTOR_SEARCH_K": "15",
        "PDFKB_FILE_SCAN_INTERVAL": "30"
      },
      "transport": "stdio"
    }
  }
}
```
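The parallelism caps above are classic semaphore throttling. As a rough illustration (an assumed pattern, not the server's actual code), this is how a `PDFKB_MAX_PARALLEL_PARSING`-style limit is typically enforced with asyncio:

```python
import asyncio

MAX_PARALLEL_PARSING = 1  # mirrors PDFKB_MAX_PARALLEL_PARSING


async def parse_pdf(path: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for real parsing work
    return f"parsed {path}"


async def parse_all(paths: list[str]) -> list[str]:
    semaphore = asyncio.Semaphore(MAX_PARALLEL_PARSING)

    async def throttled(path: str) -> str:
        # At most MAX_PARALLEL_PARSING coroutines run parse_pdf at once;
        # the rest wait here until a slot frees up.
        async with semaphore:
            return await parse_pdf(path)

    return await asyncio.gather(*(throttled(p) for p in paths))


print(asyncio.run(parse_all(["a.pdf", "b.pdf", "c.pdf"])))
```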
### Intelligent Caching

The server uses multi-stage caching:
- **Parsing Cache**: Stores converted markdown ([`src/pdfkb/intelligent_cache.py:139`](src/pdfkb/intelligent_cache.py:139))
- **Chunking Cache**: Stores processed chunks
- **Vector Cache**: ChromaDB embeddings storage

**Cache Invalidation Rules** (sketched below):
- Changing `PDFKB_PDF_PARSER` → Full reset (parsing + chunking + embeddings)
- Changing `PDFKB_PDF_CHUNKER` → Partial reset (chunking + embeddings)
- Changing `PDFKB_EMBEDDING_MODEL` → Minimal reset (embeddings only)
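These rules follow from the pipeline order (parse → chunk → embed): a change at any stage invalidates that stage and everything downstream. Here is a compact sketch of that selection logic; it is illustrative only, and the real implementation lives in `src/pdfkb/intelligent_cache.py`:

```python
def stages_to_invalidate(old: dict, new: dict) -> list[str]:
    """Pick cache stages to reset; a change invalidates its stage and all later ones."""
    pipeline = [
        ("PDFKB_PDF_PARSER", "parsing"),
        ("PDFKB_PDF_CHUNKER", "chunking"),
        ("PDFKB_EMBEDDING_MODEL", "embeddings"),
    ]
    for i, (env_var, _stage) in enumerate(pipeline):
        if old.get(env_var) != new.get(env_var):
            return [stage for _, stage in pipeline[i:]]
    return []


# Swapping the chunker keeps parsed markdown but redoes chunks and embeddings.
print(stages_to_invalidate(
    {"PDFKB_PDF_CHUNKER": "langchain"},
    {"PDFKB_PDF_CHUNKER": "semantic"},
))  # -> ['chunking', 'embeddings']
```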
## 📚 Appendix

### Installation Options

**Primary (Recommended)**:
```bash
uvx pdfkb-mcp
```

**Web Interface Included**: All installation methods include the web interface. Use these commands:
- `pdfkb-mcp` - MCP server only (default, web disabled)
- `PDFKB_WEB_ENABLE=true pdfkb-mcp` - Integrated MCP + Web server (web enabled)

**With Specific Parser Dependencies**:
```bash
uvx pdfkb-mcp[marker]                # Marker parser
uvx pdfkb-mcp[mineru]                # MinerU parser
uvx pdfkb-mcp[docling]               # Docling parser
uvx pdfkb-mcp[llm]                   # LLM parser
uvx pdfkb-mcp[semantic]              # Semantic chunker (NEW)
uvx pdfkb-mcp[unstructured_chunker]  # Unstructured chunker
uvx pdfkb-mcp[web]                   # Enhanced web features (psutil for metrics)
```

Or via pip/pipx:
```bash
pip install "pdfkb-mcp[marker]"            # Marker parser
pip install "pdfkb-mcp[web]"               # Enhanced web features
pip install "pdfkb-mcp[docling-complete]"  # Docling with OCR and full features
```

**Development Installation**:
```bash
git clone https://github.com/juanqui/pdfkb-mcp.git
cd pdfkb-mcp
pip install -e ".[dev]"
```

### Complete Environment Variables Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `PDFKB_OPENAI_API_KEY` | *required for OpenAI embeddings* | OpenAI API key for embeddings |
| `PDFKB_OPENROUTER_API_KEY` | *optional* | Required for the LLM parser |
| `PDFKB_KNOWLEDGEBASE_PATH` | `./pdfs` | PDF directory path |
| `PDFKB_CACHE_DIR` | `./.cache` | Cache directory |
| `PDFKB_PDF_PARSER` | `pymupdf4llm` | PDF parser selection |
| `PDFKB_PDF_CHUNKER` | `langchain` | Chunking strategy: `langchain`, `page`, `unstructured`, `semantic` |
| `PDFKB_CHUNK_SIZE` | `1000` | LangChain chunk size |
| `PDFKB_CHUNK_OVERLAP` | `200` | LangChain chunk overlap |
| `PDFKB_MIN_CHUNK_SIZE` | `0` | Minimum chunk size in characters (0 = disabled; filters out chunks smaller than this) |
| `PDFKB_EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI model |
| `PDFKB_OPENAI_API_BASE` | *optional* | Custom base URL for OpenAI-compatible APIs |
| `PDFKB_HUGGINGFACE_EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HuggingFace model |
| `PDFKB_HUGGINGFACE_PROVIDER` | *optional* | HuggingFace provider (e.g., "nebius") |
| `PDFKB_LOCAL_EMBEDDING_MODEL` | `Qwen/Qwen3-Embedding-0.6B` | Local embedding model (Qwen3-Embedding series only) |
| `PDFKB_GGUF_QUANTIZATION` | `Q6_K` | GGUF quantization level (Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S) |
| `PDFKB_EMBEDDING_DEVICE` | `auto` | Hardware device (auto, mps, cuda, cpu) |
| `PDFKB_USE_MODEL_OPTIMIZATION` | `true` | Enable torch.compile optimization |
| `PDFKB_EMBEDDING_CACHE_SIZE` | `10000` | Number of cached embeddings in the LRU cache |
| `PDFKB_MODEL_CACHE_DIR` | `~/.cache/huggingface` | Local model cache directory |
| `PDFKB_ENABLE_RERANKER` | `false` | Enable/disable result reranking |
| `PDFKB_RERANKER_PROVIDER` | `local` | Reranker provider: 'local' or 'deepinfra' |
| `PDFKB_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-0.6B` | Reranker model for the local provider |
| `PDFKB_RERANKER_SAMPLE_ADDITIONAL` | `5` | Additional results to sample for reranking |
| `PDFKB_RERANKER_DEVICE` | `auto` | Hardware device for the local reranker (auto, mps, cuda, cpu) |
| `PDFKB_RERANKER_MODEL_CACHE_DIR` | `~/.cache/pdfkb-mcp/reranker` | Cache directory for local reranker models |
| `PDFKB_RERANKER_GGUF_QUANTIZATION` | *optional* | GGUF quantization level (Q6_K, Q8_0, etc.) |
| `PDFKB_DEEPINFRA_API_KEY` | *required for the DeepInfra reranker* | DeepInfra API key for reranking |
| `PDFKB_DEEPINFRA_RERANKER_MODEL` | `Qwen/Qwen3-Reranker-8B` | Model: Qwen/Qwen3-Reranker-0.6B, 4B, or 8B |
| `PDFKB_ENABLE_SUMMARIZER` | `false` | Enable/disable document summarization |
| `PDFKB_SUMMARIZER_PROVIDER` | `local` | Summarizer provider: 'local' or 'remote' |
| `PDFKB_SUMMARIZER_MODEL` | `Qwen/Qwen3-4B-Instruct-2507-FP8` | Model for summarization |
| `PDFKB_SUMMARIZER_MAX_PAGES` | `10` | Maximum pages to analyze for summarization |
| `PDFKB_SUMMARIZER_DEVICE` | `auto` | Hardware device for the local summarizer |
| `PDFKB_SUMMARIZER_MODEL_CACHE_DIR` | `~/.cache/pdfkb-mcp/summarizer` | Cache directory for summarizer models |
| `PDFKB_SUMMARIZER_API_BASE` | *optional* | Custom API base URL for the remote summarizer |
| `PDFKB_SUMMARIZER_API_KEY` | *optional* | API key for the remote summarizer |
| `PDFKB_EMBEDDING_BATCH_SIZE` | `100` | Embedding batch size |
| `PDFKB_MAX_PARALLEL_PARSING` | `1` | Max concurrent PDF parsing operations |
| `PDFKB_MAX_PARALLEL_EMBEDDING` | `1` | Max concurrent embedding operations |
| `PDFKB_BACKGROUND_QUEUE_WORKERS` | `2` | Number of background processing workers |
| `PDFKB_THREAD_POOL_SIZE` | `1` | Thread pool size for CPU-intensive tasks |
| `PDFKB_VECTOR_SEARCH_K` | `5` | Default number of search results |
| `PDFKB_FILE_SCAN_INTERVAL` | `60` | File monitoring interval |
| `PDFKB_LOG_LEVEL` | `INFO` | Logging level |
| `PDFKB_WEB_ENABLE` | `false` | Enable/disable web interface |
| `PDFKB_WEB_PORT` | `8080` | Web server port |
| `PDFKB_WEB_HOST` | `localhost` | Web server host |
| `PDFKB_WEB_CORS_ORIGINS` | `http://localhost:3000,http://127.0.0.1:3000` | CORS allowed origins (comma-separated) |

### Parser Comparison Details

| Feature | PyMuPDF4LLM | Marker | MinerU | Docling | LLM |
|---------|-------------|--------|--------|---------|-----|
| **Speed** | Fastest | Medium | Fast (GPU) | Medium | Slowest |
| **Memory** | Lowest | Medium | High | Medium | Lowest |
| **Tables** | Basic | Good | Excellent | **Excellent** | Excellent |
| **Formulas** | Basic | Good | **Excellent** | Good | Excellent |
| **Images** | Basic | Good | Good | **Excellent** | **Excellent** |
| **Setup** | Simple | Simple | Moderate | Simple | Simple |
| **Cost** | Free | Free | Free | Free | API costs |

### Chunking Strategies

**LangChain** (`PDFKB_PDF_CHUNKER=langchain`):
- Header-aware splitting with [`MarkdownHeaderTextSplitter`](src/pdfkb/chunker/chunker_langchain.py)
- Configurable via `PDFKB_CHUNK_SIZE` and `PDFKB_CHUNK_OVERLAP`
- Best for customizable chunking
- Default and installed with the base package

**Page** (`PDFKB_PDF_CHUNKER=page`) **🆕 NEW**:
- Page-based chunking that preserves document page boundaries
- Works with page-aware parsers that output individual pages
- Supports merging small pages and splitting large ones (sketched below)
- Configurable via `PDFKB_PAGE_CHUNKER_MIN_CHUNK_SIZE` and `PDFKB_PAGE_CHUNKER_MAX_CHUNK_SIZE`
- Best for preserving original document structure and page-level metadata
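As a rough sketch of the merge/split behavior, under assumed semantics for the two size variables above (this is not the server's exact algorithm): pages shorter than the minimum are merged into a neighbor, and anything longer than the maximum is split.

```python
def page_chunks(pages: list[str], min_size: int = 200, max_size: int = 2000) -> list[str]:
    """Merge undersized pages forward, then split oversized ones."""
    merged: list[str] = []
    for page in pages:
        # Merge into the previous chunk while it is still below the minimum,
        # instead of emitting a tiny standalone chunk.
        if merged and len(merged[-1]) < min_size:
            merged[-1] += "\n\n" + page
        else:
            merged.append(page)
    chunks: list[str] = []
    for text in merged:
        # Split anything longer than max_size into max_size-sized pieces.
        chunks.extend(text[i : i + max_size] for i in range(0, len(text), max_size))
    return chunks


# A 5-char page merges into the long page, which then splits in two: 3 chunks total.
print(len(page_chunks(["short", "x" * 3000, "another page"], min_size=200, max_size=2000)))
```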
**Semantic** (`PDFKB_PDF_CHUNKER=semantic`):
- Advanced semantic chunking using LangChain's [`SemanticChunker`](src/pdfkb/chunker/chunker_semantic.py)
- Groups semantically related content together using embedding similarity
- Four breakpoint detection methods: percentile, standard_deviation, interquartile, gradient
- Preserves context and improves retrieval quality by 40%
- Install the extra to enable: `pip install "pdfkb-mcp[semantic]"`
- Configurable via environment variables (see the Semantic Chunking section)
- Best for documents requiring high context preservation

**Unstructured** (`PDFKB_PDF_CHUNKER=unstructured`):
- Intelligent semantic chunking with the [`unstructured`](src/pdfkb/chunker/chunker_unstructured.py) library
- Zero configuration required
- Install the extra to enable: `pip install "pdfkb-mcp[unstructured_chunker]"`
- Best for document structure awareness

### First-run notes

- On the first run, the server initializes its caches and vector store and logs the selected components:
  - Parser: PyMuPDF4LLM (default)
  - Chunker: LangChain (default)
  - Embedding Model: text-embedding-3-large (default)
- If you select a parser/chunker that isn't installed, the server logs a warning with the exact install command and falls back to the default components instead of exiting.

### Troubleshooting Guide

**API Key Issues**:
1. Verify the key format starts with `sk-`
2. Check that the account has sufficient credits
3. Test connectivity: `curl -H "Authorization: Bearer $PDFKB_OPENAI_API_KEY" https://api.openai.com/v1/models`

**Parser Installation Issues**:
1. MinerU: `pip install mineru[all]` and verify with `mineru --version`
2. Docling: `pip install docling` for the basics, `pip install pdfkb-mcp[docling-complete]` for all features
3. LLM: Requires the `PDFKB_OPENROUTER_API_KEY` environment variable

**Performance Optimization**:
1. **Speed**: Use the `pymupdf4llm` parser (fastest, low memory footprint)
2. **Memory**: Reduce `PDFKB_EMBEDDING_BATCH_SIZE` and `PDFKB_CHUNK_SIZE`; use the pypdfium backend for Docling
3. **Quality**: Use `mineru` with a GPU (>10K tokens/s on an RTX 4090) or `marker` for balanced quality
4. **Tables**: Use `docling` with `PDFKB_DOCLING_TABLE_MODE=ACCURATE` or `marker` with LLM mode
5. **Batch Processing**: Use `marker` on an H100 (~25 pages/s) or `mineru` with sglang acceleration

For additional support, see implementation details in [`src/pdfkb/main.py`](src/pdfkb/main.py) and [`src/pdfkb/config.py`](src/pdfkb/config.py).
"bugtrack_url": null,
"license": null,
"summary": "A Model Context Protocol server for managing PDF documents with vector search capabilities",
"version": "0.7.0",
"project_urls": {
"Documentation": "https://github.com/juanqui/pdfkb-mcp#readme",
"Homepage": "https://github.com/juanqui/pdfkb-mcp",
"Issues": "https://github.com/juanqui/pdfkb-mcp/issues",
"Repository": "https://github.com/juanqui/pdfkb-mcp"
},
"split_keywords": [
"ai",
" chroma",
" embeddings",
" knowledge-base",
" mcp",
" openai",
" pdf",
" vector-search"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "a1a7435239edb97c1c5f3a3c07f494619b59f07599c50637b808c8c51189c36d",
"md5": "375a096dd8c614f79f99ab7b8a5e4b64",
"sha256": "468f58312459586188cdf0d3933ad0d1209be499051092b1b91dcf53bafc6e6d"
},
"downloads": -1,
"filename": "pdfkb_mcp-0.7.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "375a096dd8c614f79f99ab7b8a5e4b64",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 248090,
"upload_time": "2025-09-13T02:20:39",
"upload_time_iso_8601": "2025-09-13T02:20:39.922100Z",
"url": "https://files.pythonhosted.org/packages/a1/a7/435239edb97c1c5f3a3c07f494619b59f07599c50637b808c8c51189c36d/pdfkb_mcp-0.7.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "43c85aa9cd05dcd4140736bcf5d01736c05ba89176ae743e6b1023c701c73685",
"md5": "253ee194f2cc6a67f1a8914f8210aade",
"sha256": "5ac90858457f818a47ab79003e38d3c58354a825af245e1ff18d796b5795809c"
},
"downloads": -1,
"filename": "pdfkb_mcp-0.7.0.tar.gz",
"has_sig": false,
"md5_digest": "253ee194f2cc6a67f1a8914f8210aade",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 688311,
"upload_time": "2025-09-13T02:20:41",
"upload_time_iso_8601": "2025-09-13T02:20:41.857702Z",
"url": "https://files.pythonhosted.org/packages/43/c8/5aa9cd05dcd4140736bcf5d01736c05ba89176ae743e6b1023c701c73685/pdfkb_mcp-0.7.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-13 02:20:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "juanqui",
"github_project": "pdfkb-mcp#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pdfkb-mcp"
}