diffchunk

Name	diffchunk
Version	0.1.7
home_page	None
Summary	MCP server for navigating large diff files with intelligent chunking
upload_time	2025-07-19 13:02:36
maintainer	None
docs_url	None
author	None
requires_python	>=3.10
license	MIT
keywords	ai-tools diff llm mcp parsing

# diffchunk

[![CI](https://github.com/peteretelej/diffchunk/actions/workflows/ci.yml/badge.svg)](https://github.com/peteretelej/diffchunk/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/peteretelej/diffchunk/branch/main/graph/badge.svg)](https://codecov.io/gh/peteretelej/diffchunk)
[![PyPI version](https://img.shields.io/pypi/v/diffchunk.svg)](https://pypi.org/project/diffchunk/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)

MCP server that enables LLMs to navigate large diff files efficiently. Instead of reading entire diffs sequentially, LLMs can jump directly to relevant changes using pattern-based navigation.

## Problem

Large diffs exceed LLM context limits and waste tokens on irrelevant changes. A 50k+ line diff can't be processed directly, and splitting it manually loses the relationships between files.

## Solution

MCP server with 4 navigation tools:

- `load_diff` - Parse diff file with custom settings (optional)
- `list_chunks` - Show chunk overview with file mappings (auto-loads)
- `get_chunk` - Retrieve specific chunk content (auto-loads)
- `find_chunks_for_files` - Locate chunks by file patterns (auto-loads)

## Setup

**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/) (an extremely fast Python package manager), which provides the `uvx` command.

Add to your MCP client configuration:

```json
{
  "mcpServers": {
    "diffchunk": {
      "command": "uvx",
      "args": ["--from", "diffchunk", "diffchunk-mcp"]
    }
  }
}
```

## Usage

Your AI assistant can now handle massive changesets that previously caused failures in Cline, Roocode, Cursor, and other tools.

### Using with AI Assistant

Once configured, your AI assistant can analyze large commits, branches, or diffs using diffchunk.

Here are some example use cases:

**Branch comparisons:**

- _"Review all changes in develop not in the main branch for any bugs"_
- _"Tell me about all the changes I have yet to merge"_
- _"What new features were added to the staging branch?"_
- _"Summarize all changes to this repo in the last 2 weeks"_

**Code review:**

- _"Use diffchunk to check my feature branch for security vulnerabilities"_
- _"Use diffchunk to find any breaking changes before I merge to production"_
- _"Use diffchunk to review this large refactor for potential issues"_

**Change analysis:**

- _"Use diffchunk to show me all database migrations that need to be run"_
- _"Use diffchunk to find what API changes might affect our mobile app"_
- _"Use diffchunk to analyze all new dependencies added recently"_

**Direct file analysis:**

- _"Use diffchunk to analyze the diff at /tmp/changes.diff and find any bugs"_
- _"Create a diff of my uncommitted changes and review it"_
- _"Compare my local branch with origin and highlight conflicts"_

### Tip: AI Assistant Rules

Add to your AI assistant's custom instructions for automatic usage:

```
When reviewing large changesets or git commits, use diffchunk to handle large diff files.
Create temporary diff files and tracking files as needed and clean up after analysis.
```

## How It Works

When you ask your AI assistant to analyze changes, it uses diffchunk's tools strategically:

1. **Creates the diff file** (e.g., `git diff main..develop > /tmp/changes.diff`) based on your question
2. **Uses `list_chunks`** to get an overview of the diff structure and total scope
3. **Uses `find_chunks_for_files`** to locate relevant sections when you ask about specific file types
4. **Uses `get_chunk`** to examine specific sections without loading the entire diff into context
5. **Tracks progress systematically** through large changesets, analyzing chunk by chunk
6. **Cleans up temporary files** after completing the analysis

This lets your AI assistant work through massive diffs that would otherwise exceed its context limits, while still providing thorough analysis without losing context.
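
Step 1 above can be sketched in Python. This is only an illustration of how an assistant (or a helper script) might produce the diff file; `build_diff_command` and `write_diff` are hypothetical names, not part of diffchunk.

```python
import subprocess
from pathlib import Path


def build_diff_command(base: str, head: str) -> list[str]:
    """Return the git command that diffs `head` against `base`."""
    return ["git", "diff", f"{base}..{head}"]


def write_diff(repo: Path, base: str, head: str, out_file: Path) -> Path:
    """Run git diff inside `repo` and save the raw output to `out_file`."""
    result = subprocess.run(
        build_diff_command(base, head),
        cwd=repo,
        capture_output=True,
        check=True,  # raise CalledProcessError if git exits non-zero
    )
    out_file.write_bytes(result.stdout)
    # Return an absolute path, since the tools expect absolute paths.
    return out_file.resolve()
```

For example, `write_diff(Path("."), "main", "develop", Path("/tmp/changes.diff"))` would write the same file as the shell command in step 1.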

### Tool Usage Patterns

**Overview first:**

```python
list_chunks("/tmp/changes.diff")
# → 5 chunks across 12 files, 3,847 total lines
```

**Target specific files:**

```python
find_chunks_for_files("/tmp/changes.diff", "*.py")
# → [1, 3, 5] - Python file chunks

get_chunk("/tmp/changes.diff", 1)
# → Content of first Python chunk
```
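
To make the pattern lookup concrete, here is a minimal sketch of glob matching over a chunk-to-files index. It assumes the patterns behave like shell globs; `CHUNK_FILES` and `chunks_matching` are hypothetical, and diffchunk's own matching rules may differ.

```python
from fnmatch import fnmatch

# Hypothetical index: chunk number -> files whose changes live in that chunk.
CHUNK_FILES = {
    1: ["src/app.py", "src/utils.py"],
    2: ["README.md"],
    3: ["tests/test_app.py", "package.json"],
}


def chunks_matching(chunk_files: dict[int, list[str]], pattern: str) -> list[int]:
    """Return sorted chunk numbers that contain at least one file matching the glob."""
    return sorted(
        number
        for number, files in chunk_files.items()
        if any(fnmatch(path, pattern) for path in files)
    )
```

With this index, `chunks_matching(CHUNK_FILES, "*.py")` returns `[1, 3]`.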

**Systematic analysis:**

```python
# Process each chunk in sequence
get_chunk("/tmp/changes.diff", 1)
get_chunk("/tmp/changes.diff", 2)
# ... continue through all chunks
```
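
The sequential pass can also be written as a loop that keeps one note per chunk, which is one way to track progress. This sketch assumes chunks are numbered from 1, as in the examples above; `review_all_chunks` and both callables are hypothetical stand-ins for the tool call and the assistant's analysis.

```python
from typing import Callable


def review_all_chunks(
    total_chunks: int,
    get_chunk: Callable[[int], str],
    review: Callable[[str], str],
) -> dict[int, str]:
    """Visit chunks 1..total_chunks in order and record one note per chunk."""
    notes: dict[int, str] = {}
    for number in range(1, total_chunks + 1):
        notes[number] = review(get_chunk(number))
    return notes
```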

## Configuration

### Path Requirements

- **Absolute paths only**: `/home/user/project/changes.diff`
- **Cross-platform**: Windows (`C:\path`) and Unix (`/path`)
- **Home expansion**: `~/project/changes.diff` (a leading `~` is expanded to your home directory)
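
A client that wants to check a path before passing it to a tool could apply the same rules itself. The helper below is a sketch under that assumption; `resolve_diff_path` is a hypothetical name and does not reflect how diffchunk validates paths internally.

```python
import os


def resolve_diff_path(raw: str) -> str:
    """Expand a leading ~ and reject paths that are still relative."""
    expanded = os.path.expanduser(raw)
    if not os.path.isabs(expanded):
        raise ValueError(f"expected an absolute path, got: {raw!r}")
    return os.path.normpath(expanded)
```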

### Auto-Loading Defaults

Tools auto-load with optimized settings:

- `max_chunk_lines`: 1000
- `skip_trivial`: true (skips whitespace-only changes)
- `skip_generated`: true (skips lock files and build artifacts)
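
To illustrate what a `max_chunk_lines` limit means, the sketch below packs whole per-file sections into chunks no longer than the limit. It is a simplified illustration, not diffchunk's algorithm: `split_into_files` and `pack_chunks` are hypothetical, only `diff --git` headers are recognized, and an oversized file is left as a single chunk.

```python
def split_into_files(diff_text: str) -> list[list[str]]:
    """Group diff lines into per-file sections, splitting on 'diff --git' headers."""
    sections: list[list[str]] = []
    for line in diff_text.splitlines():
        # Start a new section at each file header (or at the very first line).
        if line.startswith("diff --git ") or not sections:
            sections.append([])
        sections[-1].append(line)
    return sections


def pack_chunks(sections: list[list[str]], max_chunk_lines: int = 1000) -> list[list[str]]:
    """Pack whole file sections into chunks of at most max_chunk_lines lines.

    A single section longer than the limit becomes its own oversized chunk.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    for section in sections:
        # Close the current chunk if adding this file would exceed the limit.
        if current and len(current) + len(section) > max_chunk_lines:
            chunks.append(current)
            current = []
        current.extend(section)
    if current:
        chunks.append(current)
    return chunks
```

Keeping each file's section intact is one way to avoid the lost file relationships that manual splitting causes.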

### Custom Settings

Use `load_diff` for non-default behavior:

```python
load_diff(
    "/tmp/large.diff",
    max_chunk_lines=2000,
    include_patterns="*.py,*.js",
    exclude_patterns="*test*"
)
```
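
The include and exclude strings above are comma-separated globs. A minimal sketch of how such filters could be applied to a file path follows; it assumes shell-style matching and that excludes win over includes, which may not match diffchunk's exact semantics. `parse_patterns` and `keep_file` are hypothetical helpers.

```python
from fnmatch import fnmatch
from typing import Optional


def parse_patterns(spec: Optional[str]) -> list[str]:
    """Split a comma-separated pattern string such as '*.py,*.js'."""
    if not spec:
        return []
    return [part.strip() for part in spec.split(",") if part.strip()]


def keep_file(
    path: str,
    include_patterns: Optional[str] = None,
    exclude_patterns: Optional[str] = None,
) -> bool:
    """Keep a file if it matches an include pattern (when any are given) and no exclude pattern."""
    includes = parse_patterns(include_patterns)
    excludes = parse_patterns(exclude_patterns)
    if includes and not any(fnmatch(path, p) for p in includes):
        return False
    return not any(fnmatch(path, p) for p in excludes)
```

Under these assumptions, `src/app.py` is kept by the settings above, while `tests/test_app.py` is dropped by `*test*`.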

## Supported Formats

- Git diff output (`git diff`, `git show`)
- Unified diff format (`diff -u`)
- Multiple files in single diff
- Binary file change indicators
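
The two text formats mark file boundaries differently: git diffs use `diff --git a/... b/...` headers, while plain unified diffs only have `---` and `+++` lines. The sketch below extracts changed file paths from either. It is a simplification (paths containing ` b/` and added lines that happen to start with `++ ` are not handled), and `changed_files` is a hypothetical helper, not diffchunk's parser.

```python
import re

# Git-style header: the path after " b/" is the new file name.
GIT_HEADER = re.compile(r"^diff --git a/(.+) b/(.+)$")
# Unified-diff new-file line: optional "b/" prefix, path ends at a tab or line end.
UNIFIED_NEW = re.compile(r"^\+\+\+ (?:b/)?([^\t\n]+)")


def changed_files(diff_text: str) -> list[str]:
    """Return changed file paths in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for line in diff_text.splitlines():
        match = GIT_HEADER.match(line) or UNIFIED_NEW.match(line)
        if match:
            path = match.group(match.lastindex)
            if path != "/dev/null":  # deleted files have no new path
                seen.setdefault(path, None)
    return list(seen)
```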

## Performance

- Efficiently handles 100k+ line diffs
- Memory-efficient streaming
- Auto-reload on file changes
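
One common way to combine streaming with reload-on-change is to cache parsed results keyed by the file's size and modification time, and to parse from an open file handle line by line. The class below is a sketch of that general technique; `DiffCache` is hypothetical, and diffchunk may detect changes differently (for example, by hashing content).

```python
import os


class DiffCache:
    """Cache parsed results per path and re-parse when the file's size or mtime changes."""

    def __init__(self, parse):
        # `parse` receives an open text file and may iterate it line by line.
        self._parse = parse
        self._entries = {}

    def get(self, path):
        stat = os.stat(path)
        signature = (stat.st_mtime_ns, stat.st_size)
        cached = self._entries.get(path)
        if cached is None or cached[0] != signature:
            # File is new or has changed: stream it through the parser again.
            with open(path, encoding="utf-8", errors="replace") as handle:
                cached = (signature, self._parse(handle))
            self._entries[path] = cached
        return cached[1]
```

A parser such as `lambda handle: sum(1 for _ in handle)` counts lines without reading the whole file into memory.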

## Documentation

- [Design](docs/design.md) - Architecture and implementation details
- [Contributing](docs/CONTRIBUTING.md) - Development setup and workflows

## License

[MIT](./LICENSE)

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "diffchunk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Peter Etelej <peter@etelej.com>",
    "keywords": "ai-tools, diff, llm, mcp, parsing",
    "author": null,
    "author_email": "Peter Etelej <peter@etelej.com>",
    "download_url": "https://files.pythonhosted.org/packages/6c/7f/5dc86e82306f2259b963c1166a5684da5d68c1790b39e94d594050d04ad7/diffchunk-0.1.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "MCP server for navigating large diff files with intelligent chunking",
    "version": "0.1.7",
    "project_urls": {
        "Documentation": "https://github.com/peteretelej/diffchunk#readme",
        "Homepage": "https://github.com/peteretelej/diffchunk",
        "Issues": "https://github.com/peteretelej/diffchunk/issues",
        "Repository": "https://github.com/peteretelej/diffchunk.git"
    },
    "split_keywords": [
        "ai-tools",
        " diff",
        " llm",
        " mcp",
        " parsing"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fa8fbca3d525a7b44d1350627ab0c2edcd993e96cf5207bd2cfd596b01642809",
                "md5": "72dcf53fa4cceacb9b06b74bc49d6791",
                "sha256": "f2f53f53f0fc302233a5d3bc4841d9eba4b4c1946b6871ed19bbcd388143d63a"
            },
            "downloads": -1,
            "filename": "diffchunk-0.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "72dcf53fa4cceacb9b06b74bc49d6791",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 15607,
            "upload_time": "2025-07-19T13:02:35",
            "upload_time_iso_8601": "2025-07-19T13:02:35.010685Z",
            "url": "https://files.pythonhosted.org/packages/fa/8f/bca3d525a7b44d1350627ab0c2edcd993e96cf5207bd2cfd596b01642809/diffchunk-0.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6c7f5dc86e82306f2259b963c1166a5684da5d68c1790b39e94d594050d04ad7",
                "md5": "baecc41d4da9b054e151fb8a293542ee",
                "sha256": "c8705e6ba31c5c97465092d7e27b8bec321594e7246448e2ca254afb6c3f3728"
            },
            "downloads": -1,
            "filename": "diffchunk-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "baecc41d4da9b054e151fb8a293542ee",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 5705568,
            "upload_time": "2025-07-19T13:02:36",
            "upload_time_iso_8601": "2025-07-19T13:02:36.003100Z",
            "url": "https://files.pythonhosted.org/packages/6c/7f/5dc86e82306f2259b963c1166a5684da5d68c1790b39e94d594050d04ad7/diffchunk-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-19 13:02:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "peteretelej",
    "github_project": "diffchunk#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "diffchunk"
}
