# GitScribe 📜

> *Scribing knowledge from the Git universe*

GitScribe is a Model Context Protocol (MCP) server that scrapes Git-based documentation and makes it searchable through Retrieval-Augmented Generation (RAG). It helps code assistants and developers extract, process, and retrieve information from documentation websites, GitHub repositories, and other Git-based resources, so they can find what they need while building applications.
## ✨ Features
- **🌐 Universal Git Support**: Works with GitHub, GitLab, Bitbucket, and Azure DevOps
- **🧠 Intelligent RAG System**: ChromaDB + Sentence Transformers for semantic search
- **📄 Multi-Format Parsing**: Markdown, HTML, reStructuredText, and source code files
- **⚡ High Performance**: Async scraping with intelligent rate limiting
- **🔧 MCP Integration**: Full Model Context Protocol compliance for AI assistants
- **📊 Rich CLI**: Command-line interface for testing and management
- **🎯 Smart Filtering**: Automatic content filtering and relevance scoring
## 🚀 Quick Start
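### Installation
GitScribe requires Python 3.10 or later.
```bash
# Install from PyPI (recommended)
pip install gitscribe-mcp

# For development, start from a clone of the repository
git clone https://github.com/akhilthomas236/gitscribe
cd gitscribe
# then install with uv (recommended for development)
uv sync
# or as an editable install with pip
pip install -e .
# or install the dependencies manually
pip install -r requirements-gitscribe.txt
```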
### Verify Installation
```bash
# Check that the CLI is installed
gitscribe-mcp --help
# Check that the server subcommand is available (prints its options; does not start the server)
gitscribe-mcp server --help
```
### Basic Usage
#### 1. Start the MCP Server
```bash
# Start the server for use with AI assistants
gitscribe-mcp server
# Or, from a development checkout, run it through uv
uv run gitscribe-mcp server
```
#### 2. Scrape Documentation
```bash
# Scrape Python documentation
gitscribe-mcp scrape https://docs.python.org --depth 2 --output python_docs.json
# Scrape a GitHub repository
gitscribe-mcp scrape https://github.com/microsoft/vscode --formats md html rst
```
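The `--depth` option (documented for the MCP tool as the maximum crawling depth) limits how far the scraper follows links from the start URL. The sketch below illustrates a depth-limited, rate-limited crawl of that general kind; it is not GitScribe's scraper. The in-memory `SITE` link graph and the `fetch_links` stand-in replace real HTTP requests and HTML parsing, the `crawl` parameters are hypothetical counterparts of `GITSCRIBE_MAX_DEPTH`, `GITSCRIBE_MAX_PAGES`, `GITSCRIBE_REQUEST_DELAY`, and `GITSCRIBE_CONCURRENT_REQUESTS` from the Configuration section, and counting depth as link hops from the start page is an assumption.

```python
import asyncio

# Hypothetical in-memory "site": each page maps to the links found on it.
# A real scraper would download pages over HTTP and parse links from the HTML.
SITE: dict[str, list[str]] = {
    "/": ["/tutorial", "/library"],
    "/tutorial": ["/tutorial/intro", "/"],
    "/library": ["/library/asyncio"],
    "/tutorial/intro": ["/tutorial/intro/details"],
    "/library/asyncio": [],
    "/tutorial/intro/details": [],
}


async def fetch_links(url: str) -> list[str]:
    """Stand-in for an HTTP fetch plus link extraction."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return SITE.get(url, [])


async def crawl(
    start: str,
    max_depth: int = 3,
    max_pages: int = 100,
    request_delay: float = 1.0,
    concurrent_requests: int = 5,
) -> list[str]:
    """Breadth-first crawl returning URLs within max_depth link hops, capped at max_pages."""
    semaphore = asyncio.Semaphore(concurrent_requests)  # caps in-flight requests
    visited: list[str] = [start]
    seen: set[str] = {start}
    frontier = [start]

    async def polite_fetch(url: str) -> list[str]:
        async with semaphore:
            links = await fetch_links(url)
            # Hold the slot for request_delay seconds to space requests out.
            await asyncio.sleep(request_delay)
            return links

    for _ in range(max_depth):
        if not frontier:
            break
        # Fetch all pages at the current depth concurrently, bounded by the semaphore.
        results = await asyncio.gather(*(polite_fetch(url) for url in frontier))
        next_frontier: list[str] = []
        for links in results:
            for link in links:
                if link not in seen and len(visited) < max_pages:
                    seen.add(link)
                    visited.append(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return visited
```

For example, `asyncio.run(crawl("/", max_depth=2, request_delay=0))` stops two hops from `/` and never reaches `/tutorial/intro/details`. The semaphore bounds concurrency while the sleep inside it spaces requests out; together they approximate what the concurrency and delay settings describe.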
#### 3. Index Documents
```bash
# Index scraped documents into the RAG system
gitscribe-mcp index python_docs.json
```
#### 4. Search Documentation
```bash
# Search indexed documentation
gitscribe-mcp search "async await python examples"
gitscribe-mcp search "VSCode extension API" --limit 5
```
#### 5. Analyze Repositories
```bash
# Get repository information and structure
gitscribe-mcp repo-info https://github.com/microsoft/vscode
```
## 🤖 Using GitScribe as an MCP Server
GitScribe is designed to work as a Model Context Protocol (MCP) server with AI assistants like Claude Desktop. Once installed and configured, you can interact with it naturally through your AI assistant.
### Example Interactions
**Scraping Documentation:**
```
"Can you scrape the FastAPI documentation and index it for me?"
```
**Searching for Information:**
```
"Search the indexed documentation for examples of async database operations"
```
**Getting Code Examples:**
```
"Show me code examples for implementing JWT authentication in Python"
```
**Repository Analysis:**
```
"Analyze the structure of the React repository and tell me about its testing setup"
```
## 📋 MCP Tools
When configured as an MCP server, GitScribe provides the following tools to AI assistants:
### `scrape_documentation`
Scrape and index documentation from a Git repository or website.
**Parameters:**
- `url` (string, required): Repository or documentation URL
- `depth` (integer, optional): Maximum crawling depth (default: 3)
- `formats` (array, optional): Document formats to include (for example `md`, `html`, `rst`)
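MCP servers advertise each tool's arguments to clients as a JSON Schema in the tool's `inputSchema` field (returned from `tools/list`). The dictionary below is a hypothetical reconstruction of such a schema for `scrape_documentation`, built only from the parameter list above; GitScribe's actual schema may differ, for example in which format names it accepts. The `check_args` helper is likewise illustrative and checks only required keys, unknown keys, and basic types.

```python
from typing import Any

# Hypothetical inputSchema for scrape_documentation, derived from this README's
# parameter list. The schema GitScribe actually exposes may differ.
SCRAPE_DOCUMENTATION_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "url": {"type": "string", "description": "Repository or documentation URL"},
        "depth": {"type": "integer", "description": "Maximum crawling depth", "default": 3},
        "formats": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Document formats to include, e.g. md, html, rst",
        },
    },
    "required": ["url"],
}

# JSON Schema type names mapped to the Python types that json.loads produces.
_JSON_TYPES: dict[str, type] = {"string": str, "integer": int, "array": list, "object": dict}


def check_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = [f"missing required argument: {name}" for name in schema["required"] if name not in args]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems
```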
### `search_documentation`
Search indexed documentation using semantic search.
**Parameters:**
- `query` (string, required): Natural language search query
- `limit` (integer, optional): Maximum number of results (default: 10)
- `filter` (object, optional): Filter criteria (language, framework, etc.)
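Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. Your assistant (or the MCP Inspector) normally builds these messages for you, after an `initialize` handshake; the sketch below only shows what such a request for `search_documentation` could look like. The argument names come from the parameter list above, but the keys inside `filter` (here `language`) are an assumption, since this README names language and framework only as examples.

```python
import json
from typing import Any


def make_tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request that asks an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps without indent emits a single line, which suits the stdio
    # transport, where each message is one newline-terminated line.
    return json.dumps(message)


# Semantic search capped at five results. The "language" key inside "filter"
# is an assumption, not a documented GitScribe filter name.
request = make_tool_call(
    1,
    "search_documentation",
    {"query": "async database operations", "limit": 5, "filter": {"language": "python"}},
)
print(request)
```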
### `get_code_examples`
Extract code examples related to a specific topic.
**Parameters:**
- `topic` (string, required): Programming topic or concept
- `language` (string, optional): Programming language filter
- `framework` (string, optional): Framework or library filter
## 🛠️ Configuration
GitScribe can be configured through environment variables:
```bash
# Server settings
export GITSCRIBE_DEBUG=true
export GITSCRIBE_MAX_DEPTH=3
export GITSCRIBE_MAX_PAGES=100
# RAG system settings
export GITSCRIBE_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
export GITSCRIBE_CHUNK_SIZE=1000
export GITSCRIBE_CHROMA_DIR="./chroma_db"
# Rate limiting
export GITSCRIBE_REQUEST_DELAY=1.0
export GITSCRIBE_CONCURRENT_REQUESTS=5
# Git platform authentication (optional)
export GITHUB_TOKEN="your_github_token"
export GITLAB_TOKEN="your_gitlab_token"
```
## 📖 Claude Desktop Integration
To use GitScribe as an MCP server with Claude Desktop, you need to configure it in your Claude Desktop settings.
### Prerequisites
First, install the package from PyPI:
```bash
pip install gitscribe-mcp
```
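### Configuration
Add one of the following configurations to your Claude Desktop config file:
- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

If you set `GITSCRIBE_CHROMA_DIR`, prefer an absolute path: a relative path such as `./chroma_db` is typically resolved against the server process's working directory, which may not be the directory you expect when Claude Desktop launches the server.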
#### Using the PyPI Package (Recommended)
```json
{
  "mcpServers": {
    "gitscribe": {
      "command": "gitscribe-mcp",
      "args": ["server"],
      "env": {
        "GITSCRIBE_DEBUG": "false",
        "GITSCRIBE_MAX_DEPTH": "3",
        "GITSCRIBE_CHROMA_DIR": "./chroma_db"
      }
    }
  }
}
```
#### Using uvx (Alternative)
```json
{
  "mcpServers": {
    "gitscribe": {
      "command": "uvx",
      "args": ["gitscribe-mcp", "server"],
      "env": {
        "GITSCRIBE_DEBUG": "false"
      }
    }
  }
}
```
#### Local Development Configuration
```json
{
  "mcpServers": {
    "gitscribe": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/gitscribe",
        "run",
        "gitscribe-mcp",
        "server"
      ],
      "env": {
        "GITSCRIBE_DEBUG": "true"
      }
    }
  }
}
```
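Instead of editing the file by hand, you can merge an entry into `claude_desktop_config.json` with a short script. The sketch below is a hypothetical helper, not part of GitScribe: it loads the existing file if there is one, adds or replaces the `gitscribe` entry under `mcpServers`, and leaves other servers and top-level keys untouched. The default entry mirrors the PyPI-package example; pass the `uvx` or development entry instead if you prefer. The script rewrites the file, so back it up first.

```python
import copy
import json
from pathlib import Path
from typing import Any

# Mirrors the "Using the PyPI Package" example above.
GITSCRIBE_ENTRY: dict[str, Any] = {
    "command": "gitscribe-mcp",
    "args": ["server"],
    "env": {
        "GITSCRIBE_DEBUG": "false",
        "GITSCRIBE_MAX_DEPTH": "3",
        "GITSCRIBE_CHROMA_DIR": "./chroma_db",
    },
}


def add_server(
    config_path: Path,
    name: str = "gitscribe",
    entry: dict[str, Any] = GITSCRIBE_ENTRY,
) -> dict[str, Any]:
    """Add or replace one server under "mcpServers", keeping all other keys as they were."""
    config: dict[str, Any] = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # treat an empty file like a missing one
            config = json.loads(text)
    # Copy the entry so edits to the returned dict cannot mutate the template.
    config.setdefault("mcpServers", {})[name] = copy.deepcopy(entry)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS, for example, call `add_server(Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json")`, then restart Claude Desktop so it picks up the change.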
### Verification
After adding the configuration:
1. Restart Claude Desktop.
2. Start a new conversation.
3. Check that GitScribe appears among the available MCP servers and tools.
4. Try a prompt such as: "Can you scrape the Python documentation and help me find examples of async/await?"
## 🧪 Development
### Building and Publishing
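1. Sync dependencies:
   ```bash
   uv sync
   ```
2. Build the package:
   ```bash
   uv build
   ```
3. Publish to PyPI (requires PyPI credentials, for example an API token passed with `--token` or the `UV_PUBLISH_TOKEN` environment variable):
   ```bash
   uv publish
   ```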
### Debugging
Use the [MCP Inspector](https://github.com/modelcontextprotocol/inspector) for debugging:
```bash
# Debug the PyPI package
npx @modelcontextprotocol/inspector gitscribe-mcp server
# Debug local development version
npx @modelcontextprotocol/inspector uv --directory /path/to/gitscribe run gitscribe-mcp server
```
### Testing
```bash
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=gitscribe
# Run specific tests
uv run pytest tests/test_scraper.py
```
## 📚 Supported Formats
- **Documentation**: Markdown (`.md`), HTML (`.html`), reStructuredText (`.rst`)
- **Code Files**: Python (`.py`), JavaScript (`.js`), TypeScript (`.ts`), Java (`.java`), C++ (`.cpp`), Go (`.go`), Rust (`.rs`)
- **Configuration**: JSON, YAML, TOML
- **Web Content**: Dynamic HTML pages, static sites
## 🏗️ Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   MCP Client    │───▶│   MCP Server    │───▶│   Web Scraper   │
│ (Code Assistant)│    │   (GitScribe)   │    │ (Beautiful Soup)│
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   RAG System    │
                       │ - ChromaDB      │
                       │ - Embeddings    │
                       │ - Search        │
                       └─────────────────┘
```
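The RAG System box follows the usual retrieval pattern: split documents into chunks, turn each chunk into a vector, and rank chunks by similarity to the query's vector. The stdlib-only sketch below walks through those steps. To stay self-contained it swaps in a toy bag-of-words embedding and an in-memory list where GitScribe uses Sentence Transformers and ChromaDB, so its scores are lexical rather than semantic. The `chunk_text`, `embed`, `cosine`, and `search` names are hypothetical, and chunking by characters only mirrors the idea behind `GITSCRIBE_CHUNK_SIZE`, whose unit this README does not state.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous window.
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter[str]:
    """Toy stand-in for a Sentence Transformers embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity of two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[str], limit: int = 10) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the best `limit` matches.

    A vector store such as ChromaDB typically uses an index instead of scoring every chunk.
    """
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

A typical flow is `chunks = [c for doc in documents for c in chunk_text(doc)]` followed by `search("async await", chunks, limit=5)`. Replacing `embed` with a real sentence-embedding model is what turns this keyword overlap into semantic search.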
## 📄 License
This project is licensed under the MIT License.
## 🙏 Acknowledgments
- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) for HTML parsing
- [ChromaDB](https://www.trychroma.com/) for vector database capabilities
- [Sentence Transformers](https://www.sbert.net/) for embeddings
- [Model Context Protocol](https://modelcontextprotocol.io/) for AI assistant integration

---

**GitScribe** - Making documentation accessible to AI assistants, one commit at a time! 🚀