# Thoth
MCP server providing persistent codebase memory with semantic search for AI assistants.
<p align="center">
<a href="https://pypi.org/project/mcp-server-thoth/">
<img src="https://img.shields.io/pypi/v/mcp-server-thoth.svg" alt="PyPI">
</a>
<a href="https://github.com/braininahat/thoth/blob/main/LICENSE">
<img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License">
</a>
<a href="https://pypi.org/project/mcp-server-thoth/">
<img src="https://img.shields.io/pypi/pyversions/mcp-server-thoth.svg" alt="Python Versions">
</a>
</p>
## Overview
Thoth indexes code repositories using AST parsing and provides tools for symbol lookup, cross-repository navigation, and architecture visualization. v0.2.0 added semantic search using local embeddings, v0.3.0 introduced **development memory** to track and learn from all coding attempts, and v0.4.0 brings **architectural separation**: the embedding service runs outside the MCP server, so the server starts instantly.
The index persists in `~/.thoth/`, giving Claude and other MCP-compatible assistants memory across conversations.
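As a rough illustration of what AST-based indexing extracts (not Thoth's actual indexer, which may handle more languages and node types), Python's standard `ast` module can list a file's functions, classes, methods, and imports with their line numbers. `Symbol` and `extract_symbols` are hypothetical names chosen for this sketch.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    # Hypothetical record shape, loosely modelled on the `symbols` table
    name: str
    kind: str                     # "function", "class", "method", or "import"
    line: int
    parent: Optional[str] = None  # enclosing class for methods


def extract_symbols(source: str) -> list[Symbol]:
    """Collect top-level functions/classes, their methods, and imports."""
    tree = ast.parse(source)
    symbols: list[Symbol] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(Symbol(node.name, "function", node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(Symbol(node.name, "class", node.lineno))
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    symbols.append(Symbol(item.name, "method", item.lineno, parent=node.name))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                symbols.append(Symbol(alias.name, "import", node.lineno))
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            for alias in node.names:
                symbols.append(Symbol(f"{module}.{alias.name}", "import", node.lineno))
    return symbols


example = """import os
class Repo:
    def index(self):
        pass
"""
for symbol in extract_symbols(example):
    print(symbol)
```

Methods carry their enclosing class as `parent`, mirroring the parent relationships kept in the `symbols` table.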
## Features
- 🚀 **Instant Startup**: MCP server starts in <1 second with separated embedding service (v0.4.0)
- 🔍 **Semantic Search**: Find code using natural language queries with local embeddings
- 🧠 **Persistent Memory**: Code understanding persists between conversations
- 📝 **Development Memory**: Track all coding attempts and learn from failures
- 🔗 **Cross-Repository**: Navigate dependencies across multiple related repositories
- 📊 **Visualizations**: Generate architecture diagrams and dependency graphs
- ⚡ **Fast Indexing**: AST-based parsing with incremental updates
- 🎯 **Precise Navigation**: Jump to exact definitions, find all callers
- 🔧 **Local-First**: All processing happens locally, no cloud dependencies
## Installation
### Requirements
- Python 3.10-3.12 (Python 3.13 is not yet supported because some dependencies do not support it)
- For semantic search: ~1.2GB of disk space for the embedding model
### Quick Start
```bash
# From a clone of the repository, install Thoth and its dependencies
uv sync

# Initialize (sets up database and starts embedding server)
uv run thoth-cli init

# Source environment variables
source ~/.thoth/env

# Index your first repository
uv run thoth-cli index myproject /path/to/repo

# Register with Claude Code (for Claude Desktop, see the configuration below)
claude mcp add thoth -s user -- uvx --python 3.12 mcp-server-thoth
```
That's it! The `init` command automatically:
- Creates the database
- Starts the Text Embeddings Inference (TEI) server for high-quality semantic search
- Sets up environment variables
- Verifies the installation
### Architecture
Since v0.4.0, Thoth splits its work across separate services so the MCP server itself stays lightweight:
- **MCP Server**: Lightweight, starts in <1 second (previously 30+ seconds)
- **TEI Server**: Handles embeddings (Qwen3-Embedding-0.6B model)
- **ChromaDB Server**: Vector storage as a dedicated service
### Manual Setup (Advanced)
If you prefer to manage services manually:
```bash
# Initialize without starting services
uv run thoth-cli init --no-start-services
# Start TEI server manually
./scripts/run_tei_server.sh
# Set environment variables
export THOTH_EMBEDDING_SERVER_URL=http://localhost:8765
# Check status
uv run thoth-cli status
```
### First-Time Setup
Before using Thoth with Claude, run the initialization:
```bash
thoth-cli init
```
This will:
- ✅ Set up the database
- ✅ Create necessary directories
- ✅ Verify the installation
### Claude Desktop
Add Thoth to the Claude Desktop configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/claude/claude_desktop_config.json`
#### Configuration
```json
{
  "mcpServers": {
    "thoth": {
      "command": "uvx",
      "args": ["--python", "3.12", "mcp-server-thoth"]
    }
  }
}
```
To index repositories, either:
1. Use the CLI: `thoth-cli index myrepo /path/to/repo`
2. Use the `index_repository` tool from within Claude
### Command Line
```bash
# Install globally
uv tool install --python 3.12 mcp-server-thoth
# Initialize Thoth (first time only)
thoth-cli init
# Index a repository
thoth-cli index myproject /path/to/repo
# Search symbols
thoth-cli search "database connection"
# List indexed repositories
thoth-cli list
# Start MCP server
mcp-server-thoth
```
## Tools
### Core Tools
- `find_definition` - Locate symbol definitions
- `get_file_structure` - Extract functions, classes, imports from a file
- `search_symbols` - Search symbols by name pattern
- `get_callers` - Find callers of a function
- `get_repositories` - List indexed repositories
- `index_repository` - Index a new repository
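Conceptually, `get_callers` answers a query over the `calls` table (caller → callee) described under Storage Backend. The sketch below uses an in-memory SQLite database with a hypothetical two-column schema (the real table likely records files and locations too). It uses a recursive CTE so the same query can also return indirect callers up to `max_depth` hops away.

```python
import sqlite3

# Hypothetical minimal schema; Thoth's real `calls` table likely stores more columns.
CALLS_SCHEMA = "CREATE TABLE calls (caller TEXT NOT NULL, callee TEXT NOT NULL)"

# Walk caller edges backwards from the target, stopping after :max_depth hops.
CALLERS_QUERY = """
WITH RECURSIVE callers(name, depth) AS (
    SELECT caller, 1 FROM calls WHERE callee = :target
    UNION
    SELECT c.caller, callers.depth + 1
    FROM calls AS c JOIN callers ON c.callee = callers.name
    WHERE callers.depth < :max_depth
)
SELECT name, MIN(depth) FROM callers GROUP BY name ORDER BY MIN(depth), name
"""


def get_callers(conn: sqlite3.Connection, target: str, max_depth: int = 1) -> list[tuple[str, int]]:
    """Return (caller, hops) pairs; max_depth=1 yields direct callers only."""
    rows = conn.execute(CALLERS_QUERY, {"target": target, "max_depth": max_depth})
    return [(name, depth) for name, depth in rows]


conn = sqlite3.connect(":memory:")
conn.execute(CALLS_SCHEMA)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("main", "connect_db"), ("handler", "connect_db"), ("server", "handler")],
)
print(get_callers(conn, "connect_db", max_depth=2))
```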
### Semantic Search (v0.2.0+)
- `search_semantic` - Natural language code search using embeddings
  - Example: "function that handles user authentication"
  - Returns relevant symbols ranked by semantic similarity
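Ranking by semantic similarity typically means comparing the query's embedding with each symbol's embedding using cosine similarity, then sorting by score. The toy three-dimensional vectors and symbol names below are made up and stand in for real model output. In Thoth, a vector store (ChromaDB) runs this search at scale, so treat this only as an illustration of the scoring idea.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], symbol_vecs: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k symbols by cosine similarity to the query, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in symbol_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings; real ones come from the embedding model and have far more dimensions.
symbols = {
    "auth.login_user": [0.9, 0.1, 0.0],
    "auth.verify_token": [0.8, 0.3, 0.1],
    "db.connect": [0.0, 0.2, 0.9],
}
print(rank([1.0, 0.2, 0.0], symbols, top_k=2))
```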
### Development Memory (v0.3.0+)
- `start_dev_session` - Start tracking development attempts
  - Persists across Claude conversations
  - Links attempts to specific tasks
- `track_attempt` - Record coding attempts (edit, test, refactor)
  - Automatically captures errors and solutions
  - Builds knowledge base of what works/fails
- `check_approach` - See if an approach has been tried before
  - Learn from past attempts
  - Avoid repeating mistakes
- `analyze_failure` - Get insights from past failures
  - Find solutions to similar problems
  - See common error patterns
- `analyze_patterns` - Analyze failure patterns
  - Identify problematic files
  - Get suggestions based on history
### Visualization Tools
- `generate_module_diagram` - Generate Mermaid dependency diagrams
- `generate_system_architecture` - Visualize cross-repository relationships
- `trace_api_flow` - Trace client-server communication paths
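To show the kind of output `generate_module_diagram` produces, the sketch below turns (importer, imported) module pairs, the sort of data held in the `imports` table, into a Mermaid flowchart. The function names, the example module names, and the node-ID scheme (non-word characters replaced with underscores, original name kept as the label) are assumptions, not Thoth's exact format.

```python
import re


def node_id(module: str) -> str:
    """Turn a dotted module name into a Mermaid-safe node ID (word characters only)."""
    return re.sub(r"\W", "_", module)


def module_diagram(edges: list[tuple[str, str]]) -> str:
    """Render (importer, imported) pairs as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for importer, imported in sorted(set(edges)):  # dedupe and keep output stable
        lines.append(f'    {node_id(importer)}["{importer}"] --> {node_id(imported)}["{imported}"]')
    return "\n".join(lines)


# Hypothetical module names for illustration
print(module_diagram([
    ("thoth.server", "thoth.storage"),
    ("thoth.server", "thoth.embeddings"),
    ("thoth.storage", "sqlite3"),
]))
```

Paste the printed text into any Mermaid renderer to view the graph.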
## Architecture
### Storage Backend
Thoth uses a hybrid storage approach:
- **SQLite** (`~/.thoth/index.db`): Source of truth for structured data
  - `symbols` - Functions, classes, methods with location and parent relationships
  - `imports` - Import statements with cross-repository resolution
  - `calls` - Function call graph (caller → callee mapping)
  - `files` - File metadata and content hashes for incremental updates
  - `development_sessions` - Track coding sessions across Claude conversations
  - `development_attempts` - Record all edit/test/refactor attempts
  - `failure_patterns` - Identify common failure patterns
  - `learned_solutions` - Store successful solutions for reuse
- **ChromaDB** (`~/.thoth/chroma/`): Vector storage for semantic search
  - Stores embeddings for all indexed symbols
  - Enables natural language queries
- **NetworkX**: In-memory graph for fast relationship traversal
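Content hashes in the `files` table are what keep incremental updates cheap: a file is re-parsed only when its hash changes. Here is a minimal sketch of that check, assuming SHA-256 and a plain dictionary of previously stored hashes (Thoth's hash function and storage layout may differ; `plan_reindex` is a hypothetical name).

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of the file's bytes (the hash choice is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_reindex(root: Path, stored: dict[str, str],
                 pattern: str = "*.py") -> tuple[list[str], list[str], dict[str, str]]:
    """Compare files under root against previously stored hashes.

    Returns (changed_or_new, deleted, current_hashes); paths are POSIX-style
    and relative to root. Files whose hash is unchanged are skipped entirely.
    """
    current = {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in root.rglob(pattern)
        if p.is_file()
    }
    changed = sorted(rel for rel, digest in current.items() if stored.get(rel) != digest)
    deleted = sorted(rel for rel in stored if rel not in current)
    return changed, deleted, current
```

After re-indexing `changed` and purging symbols for `deleted`, persist the returned hashes as the new baseline.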
### Embedding Model
Semantic search uses **Qwen3-Embedding-0.6B** via vLLM:
- Lightweight (600M parameters, ~1.2GB on disk)
- Code-aware embeddings with instruction support
- Fast inference with GPU acceleration (optional)
- Falls back to TF-IDF when vLLM is unavailable
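When no embedding model is available, a TF-IDF fallback scores symbols by the terms they share with the query, weighting rare terms more heavily. This self-contained sketch shows the technique. The identifier-splitting tokenizer, the smoothed IDF formula (the one scikit-learn's `TfidfVectorizer` uses by default), and the function names are assumptions, not details of Thoth's fallback.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split identifiers such as send_update or subscribeToChanges into lower-case words."""
    return [w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)]


def tfidf_vectors(docs: dict[str, str]) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    """Build sparse TF-IDF vectors for each document plus the IDF table."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = {name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()}
    return vectors, idf


def cosine_sparse(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tfidf_search(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank documents by TF-IDF cosine similarity to the query; zero scores are dropped."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the corpus carry no signal, so they are ignored
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(name, cosine_sparse(q_vec, vec)) for name, vec in vectors.items()]
    return sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)[:top_k]
```

Unlike embeddings, TF-IDF only matches shared words, so "dashboard update" finds `send_update` but would miss a synonym such as `refresh_view`.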
## Performance
- **Indexing**: ~10K symbols/minute
- **Semantic Search**: <100ms for typical queries
- **Memory**: ~2GB for model + ~100MB per 100K symbols
- **Accuracy**: 0.7-0.9 relevance scores for code search
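These figures can be turned into a rough budget for your own repositories by applying them linearly. The hypothetical `estimate` helper below does exactly that and nothing more: it treats the ~2GB model footprint as a fixed cost and assumes indexing time and index memory scale linearly with symbol count, which real hardware and codebases will not match exactly.

```python
def estimate(
    symbols: int,
    symbols_per_minute: float = 10_000,  # "Indexing: ~10K symbols/minute"
    model_mb: float = 2_000,             # "~2GB for model"
    mb_per_100k_symbols: float = 100,    # "~100MB per 100K symbols"
) -> dict[str, float]:
    """Linear back-of-the-envelope estimate from the Performance figures above."""
    return {
        "index_minutes": symbols / symbols_per_minute,
        "memory_mb": model_mb + mb_per_100k_symbols * symbols / 100_000,
    }


print(estimate(250_000))
```

For a 250K-symbol monorepo, this gives about 25 minutes of indexing and roughly 2,250MB of memory.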
## Advanced Usage
### Pre-indexing Large Repositories
For large monorepos, pre-index before adding to Claude:
```bash
thoth-cli index myrepo /path/to/large-repo
```
### Using Redis Cache (Optional)
For improved performance with multiple users:
```bash
# Install with Redis support
uv tool install "mcp-server-thoth[cache]"
# Requires Redis server running locally
```
### Dashboard (Coming Soon)
A separate `thoth-dashboard` package will provide:
- Web UI for exploring indexed code
- Interactive dependency graphs
- Real-time search interface
## Development
```bash
git clone https://github.com/braininahat/thoth
cd thoth
uv pip install -e ".[dev]"
# Run tests
pytest
# Type checking
mypy thoth
```
## Token Efficiency
Thoth dramatically reduces the tokens needed for code navigation:
- **Without Thoth**: Multiple searches + reading entire files = ~50K tokens
- **With Thoth**: Semantic search + precise results = ~2K tokens
Example:
```
User: "How does the dashboard update in real-time?"
Without Thoth:
- grep "dashboard" → 50 results
- grep "update" → 200 results
- Read 10+ files to understand
With Thoth semantic search:
- Returns: WebSocketHandler.send_update(), Dashboard.subscribe_to_changes(), etc.
- Ranked by relevance
```
## Troubleshooting
### Python Version Issues
If you see errors about `xformers` or build failures:
```bash
# Ensure Python 3.12 is used
uvx --python 3.12 mcp-server-thoth
```
### GPU Memory
For systems with limited GPU memory:
- Embeddings are automatically moved to CPU after computation
- Set `CUDA_VISIBLE_DEVICES=-1` to force CPU-only mode
### Model Download
First run downloads the embedding model (~1.2GB). Use `thoth-cli init` to pre-download:
```bash
# Download model before using with Claude
thoth-cli init
# Or skip model download (disables semantic search)
thoth-cli init --skip-model
```
### MCP Timeouts
If tools time out in Claude, run `thoth-cli init` first to pre-download the model; the embedding model takes time to load on first use.
## License
MIT
## Contributing
Contributions welcome! Please check the [issues](https://github.com/braininahat/thoth/issues) page.
## Acknowledgments
- [MCP](https://modelcontextprotocol.io/) by Anthropic
- [vLLM](https://github.com/vllm-project/vllm) for fast inference
- [Qwen](https://github.com/QwenLM/Qwen) for lightweight embeddings