| Field | Value |
| --- | --- |
| Name | chunkhound |
| Version | 2.8.1 |
| Summary | Local-first semantic code search with vector and regex capabilities for AI assistants via MCP |
| Author | Ofri Wolfus |
| Requires Python | >=3.10 |
| Upload time | 2025-07-20 14:47:22 |
| Requirements | None recorded |
| Project URLs | https://github.com/chunkhound/chunkhound |
# ChunkHound
**Modern RAG for your codebase - semantic and regex search via MCP.**
Transform your codebase into a searchable knowledge base. ChunkHound provides AI assistants with intelligent code search capabilities using natural language and regex patterns.
## Installation
### 🤖 AI Agent Installation (For Humans - Recommended)
The easiest way to install and configure ChunkHound is to use your AI assistant. Simply copy and paste this prompt:
```
- Check the current OS type and version.
- Detect which IDE and agent you are running in, including their versions.
- Read https://raw.githubusercontent.com/ofriw/chunkhound/refs/heads/main/README.md
- Install ChunkHound using uv and configure it for the current IDE and agent
- Upgrade chunkhound if already installed
- Edit the current IDE / agent's config files to configure chunkhound as an MCP server. See instructions in the README
Start by asking the user:
1. What embeddings provider would they like to use? Suggest from the list of supported providers. Clearly articulate defaults.
2. Based on the user's response, ask for the relevant configuration parameters. Use defaults if the user hasn't decided.
3. Ask clarifying questions if unsure.
Proceed with the configuration based on the user's input.
```
Your AI assistant will automatically detect your environment, install ChunkHound, and configure it for your specific IDE and embedding provider preferences.
**Install ChunkHound**:
```bash
uv tool install chunkhound
```
> **Need uv?** ChunkHound requires [uv](https://github.com/astral-sh/uv): `curl -LsSf https://astral.sh/uv/install.sh | sh`
## Quick Start
```bash
# 1. Index your codebase (creates .chunkhound.db)
uv run chunkhound index
# ChunkHound automatically detects and uses .chunkhound.json in the current directory
# 2. Start MCP server for AI assistants (auto-watches for changes)
uv run chunkhound mcp
# Work with any project directory - complete project scope control
uv run chunkhound mcp /path/to/your/project
# Sets the database to /path/to/your/project/.chunkhound/db
# Searches for config at /path/to/your/project/.chunkhound.json
# Watches /path/to/your/project for changes
# 3. Optional: Use HTTP transport for enhanced VS Code compatibility
uv run chunkhound mcp --http # HTTP on 127.0.0.1:8000
uv run chunkhound mcp --http --port 8080 # Custom port
# Optional: Set OpenAI API key for semantic search
export CHUNKHOUND_EMBEDDING__API_KEY="sk-your-key-here"
# Optional: Override with a different config file
uv run chunkhound index --config /path/to/different-config.json
```
### Automatic Configuration Detection
ChunkHound automatically detects and uses project configurations:
**For `index` command:**
```bash
cd /my/project # Has .chunkhound.json
uv run chunkhound index # Automatically uses /my/project/.chunkhound.json
```
**For `mcp` command with positional path:**
```bash
uv run chunkhound mcp /my/project
# Automatically uses /my/project/.chunkhound.json
# Sets database to /my/project/.chunkhound/db
# Watches /my/project for changes
```
**For `mcp` command without path:**
```bash
cd /my/project # Has .chunkhound.json
uv run chunkhound mcp # Automatically uses /my/project/.chunkhound.json
```
## Features
**Always Available:**
- **Regex search** - Find exact patterns like `class.*Error` (no API key needed)
- **Code context** - AI assistants understand your codebase structure
- **Multi-language** - 20+ languages supported
- **Real-time updates** - Automatically watches for file changes
- **Dual transport** - Both stdio and HTTP transport support
**With API Key:**
- **Semantic search** - Natural language queries like "find database connection code"
## Transport Options
ChunkHound supports two transport methods for MCP communication:
### Stdio Transport (Default)
- **Best for**: Most AI assistants and development tools
- **Pros**: Simple setup, automatic process management
- **Usage**: `uv run chunkhound mcp`
### HTTP Transport
- **Best for**: VS Code, large databases, standalone server deployment
- **Pros**: Better compatibility with VS Code, supports large responses, separate process isolation
- **Usage**: `uv run chunkhound mcp --http --port 8000`
**When to use HTTP transport:**
- VS Code MCP extension (recommended)
- Large codebases that may hit stdio buffer limits
- When you need the server to run independently of the IDE
- Multiple concurrent AI assistant connections
**HTTP transport options:**
```bash
uv run chunkhound mcp --http # Default: 127.0.0.1:8000
uv run chunkhound mcp --http --port 8080 # Custom port
uv run chunkhound mcp --http --host 127.0.0.1 # Custom host (security: localhost only)
```
## AI Assistant Setup
ChunkHound integrates with all major AI development tools. Add these minimal configurations:
<details>
<summary><strong>Configuration for Each IDE/Tool</strong></summary>
<details>
<summary><strong>Claude Desktop</strong></summary>
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp"]
    }
  }
}
```
**For specific project directories:**
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp", "/path/to/your/project"]
    }
  }
}
```
</details>
<details>
<summary><strong>Claude Code</strong></summary>
Add to `~/.claude.json`:
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp"]
    }
  }
}
```
**For specific project directories:**
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp", "/path/to/your/project"]
    }
  }
}
```
</details>
<details>
<summary><strong>VS Code</strong></summary>
**Standard (stdio transport):**
Add to `.vscode/mcp.json` in your project:
```json
{
  "servers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp"]
    }
  }
}
```
**HTTP transport (recommended for VS Code):**
```json
{
  "servers": {
    "chunkhound": {
      "url": "http://127.0.0.1:8000/mcp/",
      "transport": "http"
    }
  }
}
```
Then start the HTTP server separately:
```bash
uv run chunkhound mcp --http --port 8000
```
**For specific project directories:**
```json
{
  "servers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp", "/path/to/your/project"]
    }
  }
}
```
</details>
<details>
<summary><strong>Cursor</strong></summary>
Add to `.cursor/mcp.json` in your project:
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp"]
    }
  }
}
```
**For specific project directories:**
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp", "/path/to/your/project"]
    }
  }
}
```
</details>
<details>
<summary><strong>Windsurf</strong></summary>
Add to `~/.codeium/windsurf/mcp_config.json`:
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp"]
    }
  }
}
```
**For specific project directories:**
```json
{
  "mcpServers": {
    "chunkhound": {
      "command": "uv",
      "args": ["run", "chunkhound", "mcp", "/path/to/your/project"]
    }
  }
}
```
</details>
<details>
<summary><strong>Zed</strong></summary>
Add to settings.json (Preferences > Open Settings):
```json
{
  "context_servers": {
    "chunkhound": {
      "source": "custom",
      "command": {
        "path": "uv",
        "args": ["run", "chunkhound", "mcp"]
      }
    }
  }
}
```
**For specific project directories:**
```json
{
  "context_servers": {
    "chunkhound": {
      "source": "custom",
      "command": {
        "path": "uv",
        "args": ["run", "chunkhound", "mcp", "/path/to/your/project"]
      }
    }
  }
}
```
</details>
<details>
<summary><strong>IntelliJ IDEA / PyCharm / WebStorm</strong> (2025.1+)</summary>
Go to Settings > Tools > AI Assistant > Model Context Protocol (MCP) and add:
- **Name**: chunkhound
- **Command**: uv
- **Arguments**: run chunkhound mcp
- **Working Directory**: (leave empty or set to project root)
**For specific project directories:**
- **Name**: chunkhound
- **Command**: uv
- **Arguments**: run chunkhound mcp /path/to/your/project
- **Working Directory**: (leave empty)
</details>
</details>
## Supported Languages
Python, Java, C#, TypeScript, JavaScript, Groovy, Kotlin, Go, Rust, C, C++, MATLAB, Bash, Makefile, Markdown, JSON, YAML, TOML, and more.
## Configuration
### Automatic Configuration Detection
**ChunkHound automatically looks for `.chunkhound.json` in the directory being indexed - no flags needed!**
```bash
# If /my/project/.chunkhound.json exists:
cd /my/project
uv run chunkhound index # Automatically uses .chunkhound.json
```
### Configuration Priority
ChunkHound loads configuration in this order (highest priority first):
1. **Command-line arguments** - Override everything
2. **`.chunkhound.json` in indexed directory** - Automatically detected
3. **`--config` file** - When explicitly specified
4. **Environment variables** - System-wide defaults
5. **Built-in defaults** - Fallback values
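To see this order in practice, here is a short, hypothetical walkthrough using only the settings documented on this page (the paths are illustrative):

```bash
# Lowest of the three: environment variable
export CHUNKHOUND_DATABASE__PATH="/tmp/env.db"

# Higher: .chunkhound.json in the indexed directory
echo '{"database": {"path": "project.db"}}' > .chunkhound.json

# Highest: the command-line flag wins over both
uv run chunkhound index --database-path cli.db
# The index is written to cli.db; drop the flag and project.db
# (from .chunkhound.json) is used instead of /tmp/env.db
```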
### Database Location
ChunkHound automatically determines the database location based on project scope:
**With positional path argument:**
```bash
uv run chunkhound mcp /my/project
# Database automatically created at: /my/project/.chunkhound/db
```
**Without positional path (current directory):**
```bash
cd /my/project
uv run chunkhound mcp
# Database created at: /my/project/.chunkhound/db
```
**Manual overrides** (in priority order):
- **Command line**: `--database-path /path/to/my-chunks`
- **Config file**: Add to `.chunkhound.json`: `{"database": {"path": "/path/to/.chunkhound.db"}}`
- **Environment variable**: `CHUNKHOUND_DATABASE__PATH="/path/to/.chunkhound.db"`
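For instance, a minimal sketch of the environment-variable override that keeps the index outside the repository entirely (the cache path is just an illustration):

```bash
# Hypothetical: store the database under the user cache instead of ./.chunkhound/db
export CHUNKHOUND_DATABASE__PATH="$HOME/.cache/chunkhound/myproject.db"
uv run chunkhound index   # reads and writes the cache-directory database
uv run chunkhound mcp     # same database; a --database-path flag would still win
```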
### Configuration File Format
Create `.chunkhound.json` in your project root for automatic loading:
```json
{
  "embedding": {
    "provider": "openai",
    "api_key": "sk-your-openai-key-here",
    "model": "text-embedding-3-small",
    "batch_size": 50,
    "timeout": 30,
    "max_retries": 3,
    "max_concurrent_batches": 3
  },
  "database": {
    "path": ".chunkhound.db",
    "provider": "duckdb"
  },
  "indexing": {
    "watch": true,
    "debounce_ms": 500,
    "batch_size": 100,
    "db_batch_size": 500,
    "max_concurrent": 4,
    "include": [
      "**/*.py",
      "**/*.ts",
      "**/*.jsx"
    ],
    "exclude": [
      "**/node_modules/**",
      "**/__pycache__/**",
      "**/dist/**"
    ]
  },
  "mcp": {
    "transport": "stdio",
    "host": "127.0.0.1",
    "port": 8000
  },
  "debug": false
}
```
### Working with `.chunkhound.json`
**Example 1: Project with OpenAI embeddings**
```bash
# Create .chunkhound.json in your project
echo '{
  "embedding": {
    "provider": "openai",
    "api_key": "sk-your-key-here"
  }
}' > .chunkhound.json
# Index - automatically uses .chunkhound.json
uv run chunkhound index
```
**Example 2: Project with local Ollama**
```bash
# Create .chunkhound.json for Ollama
echo '{
  "embedding": {
    "provider": "openai-compatible",
    "base_url": "http://localhost:11434",
    "model": "nomic-embed-text"
  }
}' > .chunkhound.json
# Index - automatically uses .chunkhound.json
uv run chunkhound index
```
**Provider-Specific Examples**:
OpenAI-compatible (Ollama, LocalAI):
```json
{
  "embedding": {
    "provider": "openai-compatible",
    "base_url": "http://localhost:11434",
    "model": "nomic-embed-text",
    "api_key": "optional-api-key"
  }
}
```
Text Embeddings Inference (TEI):
```json
{
  "embedding": {
    "provider": "tei",
    "base_url": "http://localhost:8080"
  }
}
```
BGE-IN-ICL:
```json
{
  "embedding": {
    "provider": "bge-in-icl",
    "base_url": "http://localhost:8080",
    "language": "python",
    "enable_icl": true
  }
}
```
**Security Note**:
- API keys in config files are convenient for local development
- Add `.chunkhound.json` to `.gitignore` to prevent committing API keys:
```gitignore
# ChunkHound config files
.chunkhound.json
*.chunkhound.json
chunkhound.json
```
**Configuration Options**:
- **`embedding`**: Embedding provider settings
- `provider`: Choose from `openai`, `openai-compatible`, `tei`, `bge-in-icl`
- `model`: Model name (uses provider default if not specified)
- `api_key`: API key for authentication
- `base_url`: Base URL for API (for local/custom providers)
- `batch_size`: Number of texts to embed at once (1-1000)
- `timeout`: Request timeout in seconds
- `max_retries`: Retry attempts for failed requests
- `max_concurrent_batches`: Concurrent embedding batches
- **`database`**: Database settings
- `path`: Database file location (relative or absolute)
- `provider`: Database type (`duckdb` or `lancedb`)
- **`indexing`**: File indexing behavior
- `watch`: Enable file watching in standalone mode
- `debounce_ms`: Delay before processing file changes
- `batch_size`: Files to process per batch
- `db_batch_size`: Database records per transaction
- `max_concurrent`: Parallel file processing limit
- `include`: Glob patterns for files to index
- `exclude`: Glob patterns to ignore
- **`debug`**: Enable debug logging
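Putting a few of these options together, here is a hypothetical `.chunkhound.json` tuned for a large repository (the values are illustrative, not recommendations):

```bash
# Wait longer after saves, raise parallelism, and skip generated directories
echo '{
  "indexing": {
    "debounce_ms": 1000,
    "max_concurrent": 8,
    "exclude": [
      "**/node_modules/**",
      "**/vendor/**",
      "**/build/**"
    ]
  }
}' > .chunkhound.json
uv run chunkhound index
```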
**Security Note**: Never commit API keys to version control. Use environment variables or `.chunkhound.json` (added to .gitignore):
```bash
# Option 1: Environment variable
export CHUNKHOUND_EMBEDDING__API_KEY="sk-your-key-here"
# Option 2: .chunkhound.json (automatically detected)
echo '{"embedding": {"api_key": "sk-your-key-here"}}' > .chunkhound.json
echo ".chunkhound.json" >> .gitignore
```
### Embedding Providers
ChunkHound supports multiple embedding providers for semantic search:
**OpenAI (requires API key)**:
```bash
# Option 1: Use .chunkhound.json (automatically detected)
echo '{
  "embedding": {
    "provider": "openai",
    "api_key": "sk-your-key-here",
    "model": "text-embedding-3-small"
  }
}' > .chunkhound.json
uv run chunkhound index
# Option 2: Use environment variable
export CHUNKHOUND_EMBEDDING__API_KEY="sk-your-key-here"
uv run chunkhound index --provider openai --model text-embedding-3-small
```
**Local embedding servers (no API key required)**:
**Ollama**:
```bash
# First, pull an embedding model (assumes the Ollama server is running)
ollama pull nomic-embed-text
# Option 1: Use .chunkhound.json (automatically detected)
echo '{
  "embedding": {
    "provider": "openai-compatible",
    "base_url": "http://localhost:11434",
    "model": "nomic-embed-text"
  }
}' > .chunkhound.json
uv run chunkhound index
# Option 2: Use command line
uv run chunkhound index --provider openai-compatible --base-url http://localhost:11434 --model nomic-embed-text
```
**LocalAI, LM Studio, or other OpenAI-compatible servers**:
```bash
# Create .chunkhound.json for automatic detection
echo '{
  "embedding": {
    "provider": "openai-compatible",
    "base_url": "http://localhost:1234",
    "model": "your-embedding-model"
  }
}' > .chunkhound.json
uv run chunkhound index
# Or use command line
uv run chunkhound index --provider openai-compatible --base-url http://localhost:1234 --model your-embedding-model
```
**Text Embeddings Inference (TEI)**:
```bash
# Create .chunkhound.json for automatic detection
echo '{
  "embedding": {
    "provider": "tei",
    "base_url": "http://localhost:8080"
  }
}' > .chunkhound.json
uv run chunkhound index
# Or use command line
uv run chunkhound index --provider tei --base-url http://localhost:8080
```
**Regex-only mode (no embeddings)**:
```bash
# Skip embedding setup entirely - only regex search will be available
uv run chunkhound index --no-embeddings
```
### Environment Variables
Environment variables are useful for system-wide defaults, but `.chunkhound.json` in your project directory will take precedence:
```bash
# For OpenAI semantic search only
export CHUNKHOUND_EMBEDDING__API_KEY="sk-your-key-here"
# For local embedding servers (Ollama, LocalAI, etc.)
export CHUNKHOUND_EMBEDDING__PROVIDER="openai-compatible"
export CHUNKHOUND_EMBEDDING__BASE_URL="http://localhost:11434" # Ollama default
export CHUNKHOUND_EMBEDDING__MODEL="nomic-embed-text"
# Optional: Database location
export CHUNKHOUND_DATABASE__PATH="/path/to/.chunkhound.db"
# Note: .chunkhound.json in your project will override these settings
```
## Security
ChunkHound prioritizes data security through a local-first architecture:
- **Local database**: All code chunks are stored in a local DuckDB file - no data is sent to external servers
- **Local embeddings**: Supports self-hosted embedding servers (Ollama, LocalAI, TEI) for complete data isolation
- **MCP over stdio**: Uses standard input/output for AI assistant communication - no network exposure
- **No authentication complexity**: Zero auth required since everything runs locally on your machine
Your code never leaves your environment unless you explicitly configure external embedding providers.
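For a fully local setup, the pieces above combine as follows - a sketch that reuses the documented Ollama configuration (the model name mirrors the earlier examples):

```bash
# Everything stays on this machine: local DuckDB file + local embeddings
ollama pull nomic-embed-text
echo '{
  "embedding": {
    "provider": "openai-compatible",
    "base_url": "http://localhost:11434",
    "model": "nomic-embed-text"
  }
}' > .chunkhound.json
uv run chunkhound index   # chunks and embeddings land in the local database
uv run chunkhound mcp     # served over stdio - no network exposure
```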
## Requirements
- **Python**: 3.10+
- **API Key**: Only required for semantic search - **regex search works without any API key**
- **OpenAI API key**: For OpenAI semantic search
- **No API key needed**: For local embedding servers (Ollama, LocalAI, TEI) or regex-only usage
## How It Works
ChunkHound indexes your codebase in three layers:
1. **Pre-index** - Run `chunkhound index` to sync database with current code (automatically uses `.chunkhound.json` if present)
2. **Background scan** - MCP server checks for changes every 5 minutes
3. **Real-time updates** - File system events trigger immediate updates
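In command terms, the three layers reduce to the following (a recap of the behavior documented above - the comments summarize, they do not add new flags):

```bash
# Layer 1: one-off sync before starting the server
uv run chunkhound index

# Layers 2 and 3 run inside the MCP server process:
# periodic background rescans plus file-watch updates
uv run chunkhound mcp
```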
**Project Scope Control:**
- `chunkhound mcp` - Uses current directory as project root
- `chunkhound mcp /path/to/project` - Uses specified path as project root
- Database, config, and watch paths all scoped to project root
**Processing pipeline:**
Scan → Parse → Index → Embed → Search
## Performance
ChunkHound uses smart caching and prioritization:
- **Priority queue** - User queries > file changes > background scans
- **Change detection** - Only processes modified files
- **Efficient re-indexing** - Reuses existing embeddings for unchanged code
## Origin Story
**100% of ChunkHound's code was written by an AI agent - zero lines written by hand.**
A human envisioned the project and provided strategic direction, but every single line of code, the project name, documentation, and technical decisions were generated by language models. The human acted as product manager and architect, writing prompts and validating each step, while the AI agent served as compiler - transforming requirements into working code.
The entire codebase emerged through an iterative human-AI collaboration: design → code → test → review → commit. Remarkably, the agent performed its own QA and testing by using ChunkHound to search its own code, creating a self-improving feedback loop where the tool helped build itself.
## License
MIT