| Field | Value |
| --- | --- |
| Name | diffchunk |
| Version | 0.1.7 |
| Summary | MCP server for navigating large diff files with intelligent chunking |
| Upload time | 2025-07-19 13:02:36 |
| Home page | None |
| Maintainer | None |
| Docs URL | None |
| Author | None |
| Requires Python | >=3.10 |
| License | MIT |
| Keywords | ai-tools, diff, llm, mcp, parsing |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |
# diffchunk
[CI](https://github.com/peteretelej/diffchunk/actions/workflows/ci.yml) · [Coverage](https://codecov.io/gh/peteretelej/diffchunk) · [PyPI](https://pypi.org/project/diffchunk/) · [Python](https://www.python.org/downloads/) · [License: MIT](https://opensource.org/licenses/MIT) · [Ruff](https://github.com/astral-sh/ruff) · [uv](https://github.com/astral-sh/uv)
An MCP (Model Context Protocol) server that lets LLMs navigate large diff files efficiently. Instead of reading an entire diff sequentially, an LLM can jump directly to the relevant changes using pattern-based navigation.
## Problem
Large diffs exceed LLM context limits and waste tokens on irrelevant changes. A diff of 50k+ lines can't be processed in one pass, and splitting it by hand loses the relationships between files.
## Solution
diffchunk provides four MCP navigation tools:
- `load_diff` - Parse diff file with custom settings (optional)
- `list_chunks` - Show chunk overview with file mappings (auto-loads)
- `get_chunk` - Retrieve specific chunk content (auto-loads)
- `find_chunks_for_files` - Locate chunks by file patterns (auto-loads)
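The tools above all work in terms of numbered chunks mapped to files. As a rough illustration of that idea only (not diffchunk's actual algorithm; `Chunk` and `chunk_file_sections` are hypothetical names), per-file diff sections can be packed greedily into chunks capped at a line limit such as the default `max_chunk_lines` of 1000:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One numbered slice of a diff and the files it covers (hypothetical model)."""

    number: int  # 1-based, like the chunk numbers passed to get_chunk
    files: list[str] = field(default_factory=list)
    lines: list[str] = field(default_factory=list)


def chunk_file_sections(
    sections: list[tuple[str, list[str]]], max_chunk_lines: int = 1000
) -> list[Chunk]:
    """Greedily pack per-file diff sections into chunks of at most max_chunk_lines.

    Each file's section is kept whole, so a section longer than max_chunk_lines
    ends up alone in its own oversized chunk instead of being split.
    """
    chunks: list[Chunk] = []
    current = Chunk(number=1)
    for path, lines in sections:
        # Close the current chunk if this file would push it past the limit.
        if current.files and len(current.lines) + len(lines) > max_chunk_lines:
            chunks.append(current)
            current = Chunk(number=current.number + 1)
        current.files.append(path)
        current.lines.extend(lines)
    if current.files:
        chunks.append(current)
    return chunks
```

Keeping each file whole inside one chunk keeps the chunk-to-file mapping simple; the trade-off in this sketch is that a single very large file can exceed the limit.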
## Setup
**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/), a fast Python package manager that provides the `uvx` command.
Add to your MCP client configuration:
```json
{
  "mcpServers": {
    "diffchunk": {
      "command": "uvx",
      "args": ["--from", "diffchunk", "diffchunk-mcp"]
    }
  }
}
```
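If your client's configuration already lists other servers, the diffchunk entry has to be merged in rather than replacing the whole file. A minimal sketch, assuming the client uses the top-level `mcpServers` object shown above (where that file lives varies by client and is not covered here); `add_diffchunk_server` is a hypothetical helper:

```python
import copy


def add_diffchunk_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the diffchunk entry added.

    Other servers already listed under "mcpServers" are left untouched.
    """
    merged = copy.deepcopy(config)  # never mutate the caller's config
    merged.setdefault("mcpServers", {})["diffchunk"] = {
        "command": "uvx",
        "args": ["--from", "diffchunk", "diffchunk-mcp"],
    }
    return merged
```

In practice you would read the client's config with `json.load`, pass it through this function, and write the result back with `json.dump`.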
## Usage
With diffchunk configured, your AI assistant can work through changesets too large for its context window, which previously caused failures in Cline, Roo Code, Cursor, and other tools.
### Using with AI Assistant
Once configured, your AI assistant can analyze large commits, branches, or diffs using diffchunk.
Here are some example use cases:
**Branch comparisons:**
- _"Review all changes in develop not in the main branch for any bugs"_
- _"Tell me about all the changes I have yet to merge"_
- _"What new features were added to the staging branch?"_
- _"Summarize all changes to this repo in the last 2 weeks"_
**Code review:**
- _"Use diffchunk to check my feature branch for security vulnerabilities"_
- _"Use diffchunk to find any breaking changes before I merge to production"_
- _"Use diffchunk to review this large refactor for potential issues"_
**Change analysis:**
- _"Use diffchunk to show me all database migrations that need to be run"_
- _"Use diffchunk to find what API changes might affect our mobile app"_
- _"Use diffchunk to analyze all new dependencies added recently"_
**Direct file analysis:**
- _"Use diffchunk to analyze the diff at /tmp/changes.diff and find any bugs"_
- _"Create a diff of my uncommitted changes and review it"_
- _"Compare my local branch with origin and highlight conflicts"_
### Tip: AI Assistant Rules
Add the following to your AI assistant's custom instructions so it uses diffchunk automatically:
```
When reviewing large changesets or git commits, use diffchunk to handle large diff files.
Create temporary diff files and tracking files as needed and clean up after analysis.
```
## How It Works
When you ask your AI assistant to analyze changes, it typically uses diffchunk's tools in this order:
1. **Creates the diff file** (e.g., `git diff main..develop > /tmp/changes.diff`) based on your question
2. **Uses `list_chunks`** to get an overview of the diff structure and total scope
3. **Uses `find_chunks_for_files`** to locate relevant sections when you ask about specific file types
4. **Uses `get_chunk`** to examine specific sections without loading the entire diff into context
5. **Tracks progress systematically** through large changesets, analyzing chunk by chunk
6. **Cleans up temporary files** after completing the analysis
This lets your AI assistant work through diffs far larger than its context window and analyze them thoroughly without losing context.
### Tool Usage Patterns
**Overview first:**
```python
list_chunks("/tmp/changes.diff")
# → 5 chunks across 12 files, 3,847 total lines
```
**Target specific files:**
```python
find_chunks_for_files("/tmp/changes.diff", "*.py")
# → [1, 3, 5] - Python file chunks
get_chunk("/tmp/changes.diff", 1)
# → Content of first Python chunk
```
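To make the pattern lookup concrete, here is a rough sketch of what a call like `find_chunks_for_files(path, "*.py")` has to compute, assuming shell-style globs and a precomputed chunk-to-files mapping. `find_chunks` and the mapping are hypothetical, and diffchunk's actual matching rules may differ:

```python
from fnmatch import fnmatch


def find_chunks(chunk_files: dict[int, list[str]], pattern: str) -> list[int]:
    """Return the sorted numbers of chunks containing at least one file matching pattern."""
    return sorted(
        number
        for number, files in chunk_files.items()
        if any(fnmatch(path, pattern) for path in files)
    )
```

Note that in `fnmatch` the `*` wildcard also matches `/`, so `*.py` catches files in subdirectories such as `src/app.py`.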
**Systematic analysis:**
```python
# Process each chunk in sequence
get_chunk("/tmp/changes.diff", 1)
get_chunk("/tmp/changes.diff", 2)
# ... continue through all chunks
```
## Configuration
### Path Requirements
- **Absolute paths only**: e.g. `/home/user/project/changes.diff`
- **Cross-platform**: both Windows (`C:\path`) and Unix (`/path`) paths are accepted
- **Home expansion**: a leading `~` is expanded, so `~/project/changes.diff` also works
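Taken together, the rules above amount to: expand `~`, then reject anything that is still relative. A minimal sketch of that check; `resolve_diff_path` is a hypothetical helper, and diffchunk's own validation and error messages may differ:

```python
import os


def resolve_diff_path(path: str) -> str:
    r"""Expand a leading ~ and insist that the result is an absolute path.

    os.path.isabs follows the host platform's rules, so C:\path counts as
    absolute on Windows and /path on Unix.
    """
    expanded = os.path.expanduser(path)
    if not os.path.isabs(expanded):
        raise ValueError(f"diff path must be absolute or start with ~: {path!r}")
    return os.path.normpath(expanded)
```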
### Auto-Loading Defaults
If you skip `load_diff`, the other three tools load the diff automatically with these defaults:
- `max_chunk_lines`: 1000
- `skip_trivial`: true (skips whitespace-only changes)
- `skip_generated`: true (skips lock files and build artifacts)
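A rough sketch of what the two skip filters could check. `GENERATED_NAMES` and `GENERATED_SUFFIXES` are assumed example lists (this README does not give diffchunk's real rules), and both helpers are hypothetical:

```python
# Assumed examples of generated files; not diffchunk's documented list.
GENERATED_NAMES = {"package-lock.json", "yarn.lock", "poetry.lock", "uv.lock", "Cargo.lock"}
GENERATED_SUFFIXES = (".min.js", ".min.css")


def is_generated(path: str) -> bool:
    """True if a diff path (forward slashes) looks like a lock file or minified bundle."""
    name = path.rsplit("/", 1)[-1]
    return name in GENERATED_NAMES or name.endswith(GENERATED_SUFFIXES)


def _squash(lines: list[str]) -> str:
    """Concatenate lines with every whitespace character removed."""
    return "".join("".join(line.split()) for line in lines)


def is_whitespace_only(hunk_lines: list[str]) -> bool:
    """True if the -/+ lines of one file's hunks differ only in whitespace.

    Expects hunk body lines (and optional @@ headers), not the ---/+++ file headers.
    """
    removed = [line[1:] for line in hunk_lines if line.startswith("-")]
    added = [line[1:] for line in hunk_lines if line.startswith("+")]
    return _squash(removed) == _squash(added)
```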
### Custom Settings
Use `load_diff` for non-default behavior:
```python
load_diff(
    "/tmp/large.diff",
    max_chunk_lines=2000,
    include_patterns="*.py,*.js",
    exclude_patterns="*test*",
)
```
## Supported Formats
- Git diff output (`git diff`, `git show`)
- Unified diff format (`diff -u`)
- Multiple files in single diff
- Binary file change indicators
## Performance
- Efficiently handles 100k+ line diffs
- Memory-efficient streaming
- Auto-reload on file changes
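The README does not say how file changes are detected for auto-reload. One common approach, shown here as an assumption rather than diffchunk's documented behavior, is to cache the parsed result keyed on the file's modification time and size and re-parse when either changes. `ReloadingCache` is a hypothetical name:

```python
import os
from collections.abc import Callable
from typing import Generic, TypeVar

T = TypeVar("T")


class ReloadingCache(Generic[T]):
    """Cache one parsed result per file path and re-parse when the file changes.

    "Changed" means the modification time or the size differs from the last load.
    """

    def __init__(self, parse: Callable[[str], T]) -> None:
        self._parse = parse
        self._entries: dict[str, tuple[tuple[int, int], T]] = {}

    def get(self, path: str) -> T:
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        entry = self._entries.get(path)
        if entry is None or entry[0] != stamp:
            entry = (stamp, self._parse(path))  # first load, or file changed on disk
            self._entries[path] = entry
        return entry[1]
```

Checking size as well as mtime helps on filesystems with coarse timestamp resolution, where a quick rewrite may not change the mtime.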
## Documentation
- [Design](docs/design.md) - Architecture and implementation details
- [Contributing](docs/CONTRIBUTING.md) - Development setup and workflows
## License
[MIT](./LICENSE)
Raw data
{
"_id": null,
"home_page": null,
"name": "diffchunk",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "Peter Etelej <peter@etelej.com>",
"keywords": "ai-tools, diff, llm, mcp, parsing",
"author": null,
"author_email": "Peter Etelej <peter@etelej.com>",
"download_url": "https://files.pythonhosted.org/packages/6c/7f/5dc86e82306f2259b963c1166a5684da5d68c1790b39e94d594050d04ad7/diffchunk-0.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "MCP server for navigating large diff files with intelligent chunking",
"version": "0.1.7",
"project_urls": {
"Documentation": "https://github.com/peteretelej/diffchunk#readme",
"Homepage": "https://github.com/peteretelej/diffchunk",
"Issues": "https://github.com/peteretelej/diffchunk/issues",
"Repository": "https://github.com/peteretelej/diffchunk.git"
},
"split_keywords": [
"ai-tools",
" diff",
" llm",
" mcp",
" parsing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "fa8fbca3d525a7b44d1350627ab0c2edcd993e96cf5207bd2cfd596b01642809",
"md5": "72dcf53fa4cceacb9b06b74bc49d6791",
"sha256": "f2f53f53f0fc302233a5d3bc4841d9eba4b4c1946b6871ed19bbcd388143d63a"
},
"downloads": -1,
"filename": "diffchunk-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "72dcf53fa4cceacb9b06b74bc49d6791",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 15607,
"upload_time": "2025-07-19T13:02:35",
"upload_time_iso_8601": "2025-07-19T13:02:35.010685Z",
"url": "https://files.pythonhosted.org/packages/fa/8f/bca3d525a7b44d1350627ab0c2edcd993e96cf5207bd2cfd596b01642809/diffchunk-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6c7f5dc86e82306f2259b963c1166a5684da5d68c1790b39e94d594050d04ad7",
"md5": "baecc41d4da9b054e151fb8a293542ee",
"sha256": "c8705e6ba31c5c97465092d7e27b8bec321594e7246448e2ca254afb6c3f3728"
},
"downloads": -1,
"filename": "diffchunk-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "baecc41d4da9b054e151fb8a293542ee",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 5705568,
"upload_time": "2025-07-19T13:02:36",
"upload_time_iso_8601": "2025-07-19T13:02:36.003100Z",
"url": "https://files.pythonhosted.org/packages/6c/7f/5dc86e82306f2259b963c1166a5684da5d68c1790b39e94d594050d04ad7/diffchunk-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-19 13:02:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "peteretelej",
"github_project": "diffchunk#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "diffchunk"
}