<div align="center">
<h1>RAG MCP Server</h1>
<p>
<a href="https://pypi.org/project/rag-mcp-server/"><img src="https://img.shields.io/pypi/v/rag-mcp-server" alt="PyPI"></a>
<a href="LICENSE"><img src="https://img.shields.io/github/license/yourusername/rag-mcp-server" alt="License"></a>
</p>
</div>
A Model Context Protocol (MCP) server for Retrieval-Augmented Generation (RAG) operations. This server provides tools for building and querying vector-based knowledge bases from document collections, enabling semantic search and document retrieval capabilities.
- [Features](#features)
- [Architecture](#architecture)
- [Installation](#installation)
- [Setup](#setup)
  - [Find the MCP settings file for the client](#find-the-mcp-settings-file-for-the-client)
    - [Claude Desktop](#claude-desktop)
    - [Claude Code](#claude-code)
    - [Cursor](#cursor)
    - [Cline](#cline)
    - [Windsurf](#windsurf)
    - [Any other client](#any-other-client)
  - [Set up the MCP server](#set-up-the-mcp-server)
  - [Variant: Manual setup with uvx](#variant-manual-setup-with-uvx)
- [Usage Examples](#usage-examples)
  - [Sample LLM Queries](#sample-llm-queries)
  - [Command Line Examples](#command-line-examples)
- [MCP Tools](#mcp-tools)
- [Technical Details](#technical-details)
- [Configuration Examples](#configuration-examples)
- [Error Handling](#error-handling)
- [Performance Considerations](#performance-considerations)
- [Development](#development)
- [Troubleshooting](#troubleshooting)
- [Help and Resources](#help-and-resources)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgments](#acknowledgments)
## Features
- **Document Processing**: Supports multiple file formats (.txt, .pdf) with automatic text extraction
- **Intelligent Chunking**: Configurable text chunking with overlap to preserve context
- **Vector Embeddings**: Uses SentenceTransformers for high-quality text embeddings
- **Semantic Search**: FAISS-powered similarity search for fast and accurate retrieval
- **Incremental Updates**: Smart document tracking to only process new or changed files
- **Persistent Storage**: SQLite-based document store for metadata and change tracking
- **Flexible Configuration**: Customizable embedding models, chunk sizes, and search parameters
## Architecture
```
rag-mcp-server/
├── src/rag_mcp_server/
│   ├── server.py                 # Main MCP server implementation
│   └── core/
│       ├── document_processor.py # Document loading and chunking
│       ├── embedding_service.py  # Text embedding generation
│       ├── faiss_index.py        # Vector similarity search
│       └── document_store.py     # Document metadata storage
```
## Installation
### Using uvx (Recommended)
```bash
# Run directly with uvx (bundled with uv); no separate install step needed
uvx rag-mcp-server
```
### Using pip
```bash
pip install rag-mcp-server
```
### From source
```bash
git clone https://github.com/tungetti/rag-mcp-server.git
cd rag-mcp-server
pip install -e .
```
## Setup
The easiest way to run the MCP server is with `uvx`, but manual setup is also available.
### Find the MCP settings file for the client
#### Claude Desktop
1. [Install Claude Desktop](https://claude.ai/download) as needed
2. Open the config file by opening the Claude Desktop app, going into its Settings, opening the 'Developer' tab, and clicking the 'Edit Config' button
3. Follow the 'Set up the MCP server' steps below
#### Claude Code
1. Install [Claude Code](https://docs.anthropic.com/en/docs/claude-code/getting-started) as needed
2. Run the following command to add the RAG server:
```bash
claude mcp add rag -- uvx rag-mcp-server
```
Or manually add with custom configuration:
```bash
claude mcp add-json rag '{"command":"uvx","args":["rag-mcp-server","--knowledge-base","/path/to/your/docs","--embedding-model","all-MiniLM-L6-v2","--chunk-size","1000","--chunk-overlap","200"]}'
```
#### Cursor
1. [Install Cursor](https://www.cursor.com/downloads) as needed
2. Open the config file by opening Cursor, going into 'Cursor Settings' (not the normal VSCode IDE settings), opening the 'MCP' tab, and clicking the 'Add new global MCP server' button
3. Follow the 'Set up the MCP server' steps below
#### Cline
1. [Install Cline](https://cline.bot/) in your IDE as needed
2. Open the config file by opening your IDE, opening the Cline sidebar, clicking the 'MCP Servers' icon button that is second from left at the top, opening the 'Installed' tab, and clicking the 'Configure MCP Servers' button
3. Follow the 'Set up the MCP server' steps below
#### Windsurf
1. [Install Windsurf](https://windsurf.com/download) as needed
2. Open the config file by opening Windsurf, going into 'Windsurf Settings' (not the normal VSCode IDE settings), opening the 'Cascade' tab, and clicking the 'View raw config' button in the 'Model Context Protocol (MCP) Servers' section
3. Follow the 'Set up the MCP server' steps below
#### Any other client
1. Find the MCP settings file, usually something like `[client]_mcp_config.json`
2. Follow the 'Set up the MCP server' steps below
### Set up the MCP server
1. [Install uv](https://docs.astral.sh/uv/getting-started/installation/) as needed (uvx comes bundled with uv)
2. Add the following to your MCP setup:
**Basic Configuration:**
```json
{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": ["rag-mcp-server"]
    }
  }
}
```
**Full Configuration with All Parameters:**
```json
{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": [
        "rag-mcp-server",
        "--knowledge-base", "/path/to/your/documents",
        "--embedding-model", "ibm-granite/granite-embedding-278m-multilingual",
        "--chunk-size", "500",
        "--chunk-overlap", "200",
        "--top-k", "7",
        "--verbose"
      ]
    }
  }
}
```
### Variant: Manual setup with uvx
If you prefer to run the server manually or need a specific Python version:
```bash
# Run with default settings
uvx rag-mcp-server
# Run with all parameters specified
uvx rag-mcp-server \
--knowledge-base /path/to/documents \
--embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
--chunk-size 500 \
--chunk-overlap 200 \
--top-k 7 \
--verbose
# Run from source directory
uvx --from . rag-mcp-server \
--knowledge-base /home/user/documents \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 800 \
--chunk-overlap 100 \
--top-k 5
```
## Usage Examples
### Sample LLM Queries
Here are example queries you can use with your LLM to interact with the RAG server:
**Initialize a knowledge base with custom parameters:**
```
Initialize the knowledge base with:
- knowledge_base_path: "/home/user/research_papers"
- embedding_model: "ibm-granite/granite-embedding-278m-multilingual"
- chunk_size: 300
- chunk_overlap: 50
```
**Search with specific parameters:**
```
Search for "machine learning optimization techniques" in the knowledge base at "/home/user/research_papers" and return the top 10 results with similarity scores.
```
**Initialize with high-quality embeddings:**
```
Set up a knowledge base at "/data/technical_docs" using the "all-mpnet-base-v2" model with chunk_size of 1000 and chunk_overlap of 400 for better context preservation.
```
**Refresh and get statistics:**
```
Refresh the knowledge base at "/home/user/documents" to include any new files, then show me the statistics including total documents, chunks, and current configuration.
```
**List and search documents:**
```
List all documents in the knowledge base, then search for information about "API authentication" and show me the top 5 most relevant chunks.
```
**Complex workflow example:**
```
1. Initialize a knowledge base at "/home/user/project_docs" with embedding_model "all-MiniLM-L6-v2", chunk_size 800, and chunk_overlap 150
2. Show me the statistics
3. Search for "database optimization strategies"
4. List all documents that were processed
```
**Multilingual search example:**
```
Initialize the knowledge base at "/docs/international" using the multilingual model "ibm-granite/granite-embedding-278m-multilingual", then search for "machine learning" in multiple languages and show the top 7 results.
```
### Command Line Examples
**High-Quality Configuration for Research:**
```bash
uvx rag-mcp-server \
--knowledge-base /home/tommasomariaungetti/RAG \
--embedding-model "all-mpnet-base-v2" \
--chunk-size 1000 \
--chunk-overlap 400 \
--top-k 10 \
--verbose
```
**Fast Processing for Large Document Sets:**
```bash
uvx rag-mcp-server \
--knowledge-base /data/large_corpus \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 2000 \
--chunk-overlap 100 \
--top-k 5
```
**Multilingual Document Processing:**
```bash
uvx rag-mcp-server \
--knowledge-base /docs/multilingual \
--embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
--chunk-size 500 \
--chunk-overlap 200 \
--top-k 7
```
**Running from Source with Custom Settings:**
```bash
uvx --from . rag-mcp-server \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 800 \
--chunk-overlap 100 \
--top-k 5 \
--knowledge-base /home/tommasomariaungetti/RAG
```
## MCP Tools
The following tools are available:
### 1. initialize_knowledge_base
Initialize a knowledge base from a directory of documents.
**Parameters:**
- `knowledge_base_path` (optional): Path to document directory - defaults to server config
- `embedding_model` (optional): Model name for embeddings - defaults to "ibm-granite/granite-embedding-278m-multilingual"
- `chunk_size` (optional): Maximum chunk size in characters - defaults to 500
- `chunk_overlap` (optional): Chunk overlap size in characters - defaults to 200
**Example Tool Call:**
```json
{
  "tool": "initialize_knowledge_base",
  "arguments": {
    "knowledge_base_path": "/path/to/docs",
    "embedding_model": "all-mpnet-base-v2",
    "chunk_size": 1000,
    "chunk_overlap": 200
  }
}
```
**Example LLM Query:**
> "Initialize a knowledge base from /home/user/documents using the all-mpnet-base-v2 embedding model with 1000 character chunks and 200 character overlap"
### 2. semantic_search
Perform semantic search on the knowledge base.
**Parameters:**
- `query`: Search query text
- `knowledge_base_path` (optional): Path to knowledge base - defaults to current KB
- `top_k` (optional): Number of results to return - defaults to 7
- `include_scores` (optional): Include similarity scores - defaults to false
**Example Tool Call:**
```json
{
  "tool": "semantic_search",
  "arguments": {
    "query": "How to implement RAG systems?",
    "knowledge_base_path": "/path/to/docs",
    "top_k": 5,
    "include_scores": true
  }
}
```
**Example LLM Query:**
> "Search for 'machine learning optimization techniques' and show me the top 5 results with similarity scores"
### 3. refresh_knowledge_base
Update the knowledge base with new or changed documents.
**Parameters:**
- `knowledge_base_path` (optional): Path to knowledge base - defaults to current KB
**Example Tool Call:**
```json
{
  "tool": "refresh_knowledge_base",
  "arguments": {
    "knowledge_base_path": "/path/to/docs"
  }
}
```
**Example LLM Query:**
> "Refresh the knowledge base to include any new or modified documents"
### 4. get_knowledge_base_stats
Get detailed statistics about the knowledge base.
**Parameters:**
- `knowledge_base_path` (optional): Path to knowledge base - defaults to current KB
**Example Tool Call:**
```json
{
  "tool": "get_knowledge_base_stats",
  "arguments": {
    "knowledge_base_path": "/path/to/docs"
  }
}
```
**Example LLM Query:**
> "Show me the statistics for the knowledge base including document count, chunk information, and current configuration"
### 5. list_documents
List all documents in the knowledge base with metadata.
**Parameters:**
- `knowledge_base_path` (optional): Path to knowledge base - defaults to current KB
**Example Tool Call:**
```json
{
  "tool": "list_documents",
  "arguments": {
    "knowledge_base_path": "/path/to/docs"
  }
}
```
**Example LLM Query:**
> "List all documents in the knowledge base with their chunk counts and metadata"
## Technical Details
### Document Processing
The server processes documents in a three-stage pipeline:
1. **File Detection**: Scans directories for supported file types
2. **Content Extraction**:
- Plain text files: Direct UTF-8/Latin-1 reading
- PDF files: PyMuPDF-based text extraction
3. **Text Chunking** (a sketch follows this list):
- Splits documents into manageable chunks
- Preserves word boundaries
- Maintains context with configurable overlap
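
To make the chunking step concrete, here is a minimal sketch of overlap chunking on word boundaries. The function name and exact boundary rules are illustrative assumptions, not the package's actual code:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters, each
    overlapping the previous one by roughly chunk_overlap characters,
    without cutting words in half."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Back up to the last space so we don't split a word mid-way.
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        # The next chunk starts chunk_overlap characters before this one ended,
        # so consecutive chunks share context.
        start = max(end - chunk_overlap, start + 1)
    return chunks
```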
### Embedding Generation
- **Default Model**: `ibm-granite/granite-embedding-278m-multilingual`
- **Batch Processing**: Efficient batch encoding for large document sets
- **Fallback Support**: Automatic fallback to `all-MiniLM-L6-v2` if the primary model fails to load (sketched below)
- **Progress Tracking**: Visual progress bars for large operations
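
A minimal sketch of the load-with-fallback idea, using the standard `sentence-transformers` API (the helper itself is illustrative, not the package's own `embedding_service`):

```python
from sentence_transformers import SentenceTransformer

PRIMARY = "ibm-granite/granite-embedding-278m-multilingual"
FALLBACK = "all-MiniLM-L6-v2"

def load_model(name: str = PRIMARY) -> SentenceTransformer:
    """Load the requested model, falling back to a small default on failure."""
    try:
        return SentenceTransformer(name)
    except Exception:
        # Primary model unavailable (no network, no local cache, bad name).
        return SentenceTransformer(FALLBACK)

model = load_model()
# Batch encoding with a progress bar, as described above.
embeddings = model.encode(
    ["first chunk", "second chunk"],
    batch_size=32,
    show_progress_bar=True,
)
```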
### Vector Search
- **Index Type**: FAISS IndexFlatIP (Inner Product)
- **Similarity Metric**: Cosine similarity, via L2 normalization (see the example below)
- **Performance**: Scales to millions of documents
- **Accuracy**: Exact nearest neighbor search
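
The cosine-via-normalization trick is standard FAISS usage: L2-normalize both the stored vectors and the query, and the inner product computed by `IndexFlatIP` equals cosine similarity. A self-contained example (the dimension and random data are made up for illustration):

```python
import faiss
import numpy as np

dim = 384  # e.g. the output size of all-MiniLM-L6-v2
vectors = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(vectors)           # in place; rows become unit length
index = faiss.IndexFlatIP(dim)        # exact inner-product index
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 7)  # top-7; scores are cosine similarities
```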
### Document Store
- **Storage**: SQLite database
- **Tracking**: File hash, modification time, chunk count
- **Incremental Updates**: Only processes changed files (illustrated below)
- **Location**: Stored alongside knowledge base documents
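
A hypothetical sketch of hash-based change detection with SQLite; the table layout and database file name are assumptions for illustration, not the store's real schema:

```python
import hashlib
import sqlite3
from pathlib import Path

# Assumed location: a hidden database file next to the documents.
db = sqlite3.connect("/path/to/knowledge_base/.rag_store.db")
db.execute(
    """CREATE TABLE IF NOT EXISTS documents
       (path TEXT PRIMARY KEY, sha256 TEXT, mtime REAL, chunks INTEGER)"""
)

def needs_reindex(path: Path) -> bool:
    """Re-process a file only when its content hash has changed."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    row = db.execute(
        "SELECT sha256 FROM documents WHERE path = ?", (str(path),)
    ).fetchone()
    return row is None or row[0] != digest
```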
## Configuration Examples
### MCP Client Configurations
**Basic Configuration (Claude Desktop/Cursor/Cline):**
```json
{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": ["rag-mcp-server"]
    }
  }
}
```
**Full Configuration with All Parameters:**
```json
{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": [
        "rag-mcp-server",
        "--knowledge-base", "/path/to/documents",
        "--embedding-model", "ibm-granite/granite-embedding-278m-multilingual",
        "--chunk-size", "500",
        "--chunk-overlap", "200",
        "--top-k", "7",
        "--verbose"
      ]
    }
  }
}
```
**Multiple Knowledge Base Configuration:**
```json
{
  "mcpServers": {
    "rag-technical": {
      "command": "uvx",
      "args": [
        "rag-mcp-server",
        "--knowledge-base", "/docs/technical",
        "--embedding-model", "all-mpnet-base-v2",
        "--chunk-size", "1000",
        "--chunk-overlap", "400"
      ]
    },
    "rag-research": {
      "command": "uvx",
      "args": [
        "rag-mcp-server",
        "--knowledge-base", "/docs/research",
        "--embedding-model", "all-MiniLM-L6-v2",
        "--chunk-size", "500",
        "--chunk-overlap", "100"
      ]
    }
  }
}
```
### Command Line Examples
**High-Quality Configuration for Research:**
```bash
uvx rag-mcp-server \
--knowledge-base /path/to/research/docs \
--embedding-model "all-mpnet-base-v2" \
--chunk-size 1000 \
--chunk-overlap 400 \
--top-k 10
```
**Fast Processing Configuration:**
```bash
uvx rag-mcp-server \
--knowledge-base /path/to/large/corpus \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 2000 \
--chunk-overlap 100 \
--top-k 5
```
**Multilingual Configuration:**
```bash
uvx rag-mcp-server \
--knowledge-base /path/to/multilingual/docs \
--embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
--chunk-size 500 \
--chunk-overlap 200 \
--top-k 7
```
**Development Configuration with Verbose Logging:**
```bash
uvx --from . rag-mcp-server \
--knowledge-base ./test_documents \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 300 \
--chunk-overlap 50 \
--top-k 3 \
--verbose
```
## Error Handling
The server implements comprehensive error handling:
- **File Access Errors**: Graceful handling of permission issues
- **Encoding Errors**: Automatic encoding detection and fallback (see the sketch below)
- **Model Loading Errors**: Fallback to default models
- **Database Errors**: Transaction rollback and recovery
- **Search Errors**: Informative error messages
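
The encoding fallback, for example, can be as simple as trying UTF-8 first and retrying with Latin-1, which maps every byte to a character and therefore never raises `UnicodeDecodeError` (a sketch, not the server's exact code):

```python
from pathlib import Path

def read_text_lenient(path: Path) -> str:
    """Read a text file, tolerating non-UTF-8 content."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Latin-1 decodes any byte sequence, so this cannot fail.
        return path.read_text(encoding="latin-1")
```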
## Performance Considerations
### Memory Usage
- Embeddings are stored in memory for fast search
- Approximate memory: `num_chunks × embedding_dimension × 4 bytes`
- Example: 10,000 chunks × 384 dimensions × 4 bytes ≈ 15 MB (see the helper below)
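
A quick helper for this back-of-the-envelope estimate (float32 embeddings assumed, as in the formula above):

```python
def index_memory_mb(num_chunks: int, embedding_dim: int) -> float:
    """Approximate in-memory index size in megabytes (4 bytes per float32)."""
    return num_chunks * embedding_dim * 4 / 1e6

print(index_memory_mb(10_000, 384))  # 15.36 -> roughly 15 MB
```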
### Processing Speed
- Document processing: ~100-500 docs/minute (depending on size)
- Embedding generation: ~50-200 chunks/second (model dependent)
- Search latency: <10ms for 100K documents
### Optimization Tips
1. Use smaller embedding models for faster processing
2. Increase chunk size for fewer chunks (may reduce accuracy)
3. Decrease overlap for faster processing (may lose context)
4. Use SSD storage for document store database
## Development
### Running Tests
```bash
pytest tests/
```
### Code Formatting
```bash
black src/
isort src/
```
### Type Checking
```bash
mypy src/
```
## Troubleshooting
### Common Issues
1. **"No knowledge base path provided"**
- Solution: Either provide path in tool call or use `--knowledge-base` flag
2. **"Model mismatch detected"**
- Solution: This is a warning; the system will use the closest available model
3. **"Failed to initialize embedding model"**
- Solution: Check internet connection or use a locally cached model
4. **"No documents found in knowledge base"**
- Solution: Ensure directory contains .txt or .pdf files
### Debug Mode
Enable verbose logging for troubleshooting:
```bash
uvx rag-mcp-server --verbose
```
## Help and Resources
- [GitHub Repository](https://github.com/tungetti/rag-mcp-server)
- [PyPI Package](https://pypi.org/project/rag-mcp-server/)
- [MCP Documentation](https://modelcontextprotocol.io)
- [Issue Tracker](https://github.com/tungetti/rag-mcp-server/issues)
- Email: [tommaso.ungetti@outlook.com](mailto:tommaso.ungetti@outlook.com)
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
## License
MIT License - see LICENSE file for details.
## Acknowledgments
- Built on the [Model Context Protocol (MCP)](https://modelcontextprotocol.io)
- Powered by [Sentence Transformers](https://www.sbert.net/)
- Vector search by [FAISS](https://github.com/facebookresearch/faiss)
- PDF processing by [PyMuPDF](https://pymupdf.readthedocs.io/)