| Name | largefile |
| Version | 0.1.1 |
| Summary | MCP server that helps AI assistants work with large files that exceed context limits |
| Upload time | 2025-07-20 21:02:03 |
| Requires Python | >=3.10 |
| License | MIT |
| Keywords | ai-tools, large-files, llm, mcp, text-editing |
# Largefile MCP Server
MCP server that helps your AI assistant work with large files that exceed context limits.
[CI](https://github.com/peteretelej/largefile/actions/workflows/ci.yml) · [Coverage](https://codecov.io/gh/peteretelej/largefile) · [PyPI](https://pypi.org/project/largefile/) · [Python](https://www.python.org/downloads/) · [MIT License](https://opensource.org/licenses/MIT) · [Ruff](https://github.com/astral-sh/ruff) · [uv](https://github.com/astral-sh/uv)
This local MCP server enables AI assistants to navigate, search, and edit files of any size without loading entire content into memory. It provides targeted access to specific lines, patterns, and sections while maintaining file integrity using research-backed search/replace editing instead of error-prone line-based operations. Perfect for working with large codebases, generated files, logs, and datasets that would otherwise be inaccessible due to context window limitations.
## MCP Tools
The server provides 4 core tools that work together for progressive file exploration:
- **`get_overview`** - Get file structure with Tree-sitter semantic analysis, line counts, and suggested search patterns
- **`search_content`** - Find patterns with fuzzy matching, context lines, and semantic information
- **`read_content`** - Read targeted content by line number or pattern with semantic chunking (complete functions/classes)
- **`edit_content`** - Primary editing via search/replace blocks with automatic backups and preview mode
## Quick Start
**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/) (an extremely fast Python package manager) which provides the `uvx` command.
Add to your MCP configuration:
```json
{
  "mcpServers": {
    "largefile": {
      "command": "uvx",
      "args": ["--from", "largefile", "largefile-mcp"]
    }
  }
}
```
## Usage
Your AI Assistant / LLM can now work with very large files that exceed its context limits. Here are some common workflows:
### Analyzing Large Code Files
**AI Question:** _"Can you analyze this large Django models file and tell me about the class structure and any potential issues? It's a large file so use largefile."_
**AI Assistant workflow:**
1. Gets file overview to understand structure
2. Searches for classes and their methods
3. Looks for code issues like TODOs or long functions
```python
# AI gets file structure
overview = get_overview("/path/to/django-models.py")
# Returns: 2,847 lines, 15 classes, semantic outline with Tree-sitter

# AI searches for all class definitions
classes = search_content("/path/to/django-models.py", "class ", max_results=20)
# Returns: Model classes with line numbers and context

# AI examines specific class implementation
model_code = read_content("/path/to/django-models.py", "class User", mode="semantic")
# Returns: Complete class definition with all methods
```
### Working with Documentation
**AI Question:** _"Find all the installation methods mentioned in this README file and update the pip install to use uv instead."_
**AI Assistant workflow:**
1. Search for installation patterns
2. Read the installation section
3. Replace pip commands with uv equivalents
```python
# AI finds installation instructions
install_sections = search_content("/path/to/readme.md", "install", fuzzy=True, context_lines=3)

# AI reads the installation section
install_content = read_content("/path/to/readme.md", "## Installation", mode="semantic")

# AI replaces pip with uv
edit_result = edit_content(
    "/path/to/readme.md",
    search_text="pip install anthropic",
    replace_text="uv add anthropic",
    preview=True
)
```
### Debugging Large Log Files
**AI Question:** _"Check this production log file for any critical errors in the last few thousand lines and show me the context around them. Use largefile mcp."_
**AI Assistant workflow:**
1. Get log file overview
2. Search for error patterns
3. Read context around critical issues
```python
# AI gets log file overview
overview = get_overview("/path/to/production.log")
# Returns: 150,000 lines, 2.1GB file size

# AI searches for critical errors
errors = search_content("/path/to/production.log", "CRITICAL|ERROR", fuzzy=True, max_results=10)

# AI examines context around each error
for error in errors:
    context = read_content("/path/to/production.log", error.line_number, mode="lines")
    # Shows surrounding log entries for debugging
```
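The streaming access behind this workflow can be illustrated in plain Python: reading a log line by line keeps memory usage flat no matter how large the file is. This is a sketch of the general technique, not the server's internal implementation; `scan_log` is an illustrative helper name.

```python
import re
import tempfile

def scan_log(path, pattern, max_results=10):
    """Stream a log file line by line, collecting matches without
    ever loading the whole file into memory."""
    matcher = re.compile(pattern)
    hits = []
    with open(path, "r", errors="replace") as fh:
        for line_number, line in enumerate(fh, start=1):
            if matcher.search(line):
                hits.append((line_number, line.rstrip("\n")))
                if len(hits) >= max_results:
                    break
    return hits

# Demo on a small temporary log file
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("INFO start\nERROR disk full\nINFO ok\nCRITICAL oom\n")
    log_path = f.name

print(scan_log(log_path, r"CRITICAL|ERROR"))
# → [(2, 'ERROR disk full'), (4, 'CRITICAL oom')]
```

Because the loop stops at `max_results`, a match near the top of a multi-gigabyte log returns almost immediately.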
### Refactoring Code
**AI Question:** _"I need to rename the function `process_data` to `transform_data` throughout this large codebase file. Can you help me do this safely?"_
**AI Assistant workflow:**
1. Find all occurrences of the function
2. Preview changes to ensure accuracy
3. Apply changes with automatic backup
```python
# AI finds all usages
usages = search_content("/path/to/codebase.py", "process_data", fuzzy=False, max_results=50)

# AI previews the changes
preview = edit_content(
    "/path/to/codebase.py",
    search_text="process_data",
    replace_text="transform_data",
    preview=True
)

# AI applies changes after confirmation
result = edit_content(
    "/path/to/codebase.py",
    search_text="process_data",
    replace_text="transform_data",
    preview=False
)
# Creates automatic backup before changes
```
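The backup-then-atomic-swap pattern referenced above can be sketched with the standard library. The backup directory name mirrors the `LARGEFILE_BACKUP_DIR` default, but `safe_replace` is an illustrative function, not the package's actual code.

```python
import os
import shutil
import tempfile

def safe_replace(path, search_text, replace_text, backup_dir=".largefile_backups"):
    """Back up the file, write the edited content to a temp file, then
    atomically swap it into place with os.replace()."""
    with open(path, "r") as fh:
        content = fh.read()
    if search_text not in content:
        raise ValueError("search_text not found")

    # Copy the original aside before touching it
    os.makedirs(backup_dir, exist_ok=True)
    shutil.copy2(path, os.path.join(backup_dir, os.path.basename(path)))

    # Write the new content next to the target, then rename over it;
    # os.replace is atomic on POSIX, so readers never see a partial file
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as fh:
        fh.write(content.replace(search_text, replace_text))
    os.replace(tmp, path)

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "code.py")
    with open(p, "w") as fh:
        fh.write("result = process_data(raw)\n")
    safe_replace(p, "process_data", "transform_data", backup_dir=os.path.join(d, "backups"))
    with open(p) as fh:
        print(fh.read().strip())
# → result = transform_data(raw)
```

The key property is that the original file is either fully intact or fully replaced; an interrupted edit can always be recovered from the backup copy.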
### Exploring API Documentation
**AI Question:** _"What are all the available methods in this large API documentation file and can you show me examples of authentication?"_
**AI Assistant workflow:**
1. Get document structure overview
2. Search for method definitions and auth patterns
3. Extract relevant code examples
```python
# AI analyzes document structure
overview = get_overview("/path/to/api-docs.md")
# Returns: Section outline, headings, suggested search patterns

# AI finds API methods
methods = search_content("/path/to/api-docs.md", "###", max_results=30)
# Returns: All method headings with context

# AI searches for authentication examples
auth_examples = search_content("/path/to/api-docs.md", "auth", fuzzy=True, context_lines=5)

# AI reads complete authentication section
auth_section = read_content("/path/to/api-docs.md", "## Authentication", mode="semantic")
```
## File Size Handling
- **Small files (<50MB)**: Memory loading with Tree-sitter AST caching
- **Medium files (50-500MB)**: Memory-mapped access
- **Large files (>500MB)**: Streaming processing
- **Long lines (>1000 chars)**: Automatic truncation for display
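The tiering above amounts to a simple dispatch on file size. A minimal sketch, assuming the 50 MB and 500 MB defaults (the function name is illustrative):

```python
import os

MEMORY_THRESHOLD_MB = 50   # LARGEFILE_MEMORY_THRESHOLD_MB default
MMAP_THRESHOLD_MB = 500    # LARGEFILE_MMAP_THRESHOLD_MB default

def choose_strategy(path):
    """Pick a file-access strategy from the size on disk."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb < MEMORY_THRESHOLD_MB:
        return "memory"     # load fully, cache Tree-sitter AST
    if size_mb < MMAP_THRESHOLD_MB:
        return "mmap"       # memory-mapped access
    return "streaming"      # process incrementally, never load fully
```

Keeping the thresholds as module-level constants makes them easy to override from the environment variables described under Configuration.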
## Supported Languages
Tree-sitter semantic analysis for:
- Python (.py)
- JavaScript/JSX (.js, .jsx)
- TypeScript/TSX (.ts, .tsx)
- Rust (.rs)
- Go (.go)
Files without Tree-sitter support use text-based analysis with graceful degradation.
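The language table above reduces to an extension lookup with a plain-text fallback; a sketch (the helper name is illustrative):

```python
import os

TREE_SITTER_LANGUAGES = {
    ".py": "python",
    ".js": "javascript", ".jsx": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".rs": "rust",
    ".go": "go",
}

def analysis_mode(path):
    """Return the Tree-sitter language for a file, falling back to
    plain text analysis for unsupported extensions."""
    ext = os.path.splitext(path)[1].lower()
    return TREE_SITTER_LANGUAGES.get(ext, "text")

print(analysis_mode("models.py"))  # → python
print(analysis_mode("app.log"))    # → text
```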
## Configuration
Configure via environment variables:
```bash
# File processing thresholds
LARGEFILE_MEMORY_THRESHOLD_MB=50          # Memory loading limit
LARGEFILE_MMAP_THRESHOLD_MB=500           # Memory mapping limit

# Search settings
LARGEFILE_FUZZY_THRESHOLD=0.8             # Fuzzy match sensitivity
LARGEFILE_MAX_SEARCH_RESULTS=20           # Result limit
LARGEFILE_CONTEXT_LINES=2                 # Context window

# Performance
LARGEFILE_ENABLE_TREE_SITTER=true         # Semantic features
LARGEFILE_BACKUP_DIR=".largefile_backups" # Backup location
```
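Consuming these variables follows the usual env-with-default pattern. The parsing helpers below are a hedged sketch, not the package's actual configuration loader:

```python
import os

def env_float(name, default):
    """Read a float setting from the environment, or use the default."""
    return float(os.environ.get(name, default))

def env_int(name, default):
    """Read an integer setting from the environment, or use the default."""
    return int(os.environ.get(name, default))

def env_bool(name, default):
    """Treat '1', 'true', and 'yes' (any case) as truthy."""
    return os.environ.get(name, str(default)).lower() in ("1", "true", "yes")

# Defaults mirror the table above
FUZZY_THRESHOLD = env_float("LARGEFILE_FUZZY_THRESHOLD", 0.8)
MAX_SEARCH_RESULTS = env_int("LARGEFILE_MAX_SEARCH_RESULTS", 20)
ENABLE_TREE_SITTER = env_bool("LARGEFILE_ENABLE_TREE_SITTER", True)
```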
## Key Features
- **Search/replace primary**: Eliminates LLM line number errors
- **Fuzzy matching**: Handles whitespace and formatting variations
- **Atomic operations**: File integrity with automatic backups
- **Semantic awareness**: Tree-sitter integration for code structure
- **Memory efficient**: Handles files of any size without context limits
- **Error recovery**: Graceful degradation with clear error messages
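The fuzzy-matching behavior can be illustrated with `difflib`'s similarity ratio, one common way to score near-matches against a threshold like the 0.8 default above (an illustrative scoring approach, not necessarily the algorithm largefile uses internally):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()

target = "def process_data(items):"
candidate = "def process_data( items ):"  # extra whitespace inside parens

score = similarity(target, candidate)
print(round(score, 2))
assert score >= 0.8  # still clears a 0.8 fuzzy threshold despite formatting drift
```

This is why a search string copied from slightly reformatted code can still locate its target, while unrelated text falls well below the threshold.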
## Documentation
- [API Reference](docs/API.md) - Detailed tool documentation
- [Configuration Guide](docs/configuration.md) - Environment variables and tuning
- [Examples](docs/examples.md) - Real-world usage examples and workflows
- [Design Document](docs/design.md) - Architecture and implementation details
- [Contributing](docs/CONTRIBUTING.md) - Development setup and guidelines