mcp-server-excel-sql


Namemcp-server-excel-sql JSON
Version 0.3.0 PyPI version JSON
download
home_pageNone
SummaryMCP server that exposes Excel files as SQL-queryable tables via DuckDB
upload_time2025-10-18 11:18:58
maintainerNone
docs_urlNone
authorNone
requires_python>=3.11
licenseNone
keywords mcp excel sql duckdb server
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # mcp-server-excel-sql

MCP server exposing Excel files as SQL-queryable DuckDB views with thread-safe concurrent access.

## Features

- SQL queries on Excel via DuckDB views
- RAW (all_varchar) or ASSISTED (sheet_overrides) modes
- Multi-row headers, type hints, unpivot, column renames
- Auto-refresh on file changes (--watch)
- Thread-safe concurrent access (HTTP/SSE transports)
- Isolated query timeouts (per-connection interrupt)
- API key authentication for HTTP transports (optional)

## Installation

```bash
pipx install mcp-server-excel-sql
```

## Usage

### CLI

```bash
# STDIO mode (single-threaded)
mcp-excel --path /data/excel --watch --overrides config.yaml

# HTTP mode (concurrent, ~10 users)
mcp-excel --path /data/excel --transport streamable-http --port 8000

# HTTP mode with authentication
export MCP_EXCEL_API_KEY=your-secret-key
mcp-excel --path /data/excel --transport sse --port 8000 --require-auth
```

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "excel": {
      "command": "mcp-excel",
      "args": ["--path", "/Users/your-username/data/excel"]
    }
  }
}
```

With overrides:
```json
{
  "mcpServers": {
    "excel": {
      "command": "mcp-excel",
      "args": [
        "--path", "/path/to/excel/files",
        "--overrides", "/path/to/config.yaml"
      ]
    }
  }
}
```

## MCP Tools

**tool_load_dir** - Load Excel directory into views
**tool_query** - Execute SELECT (read-only, timeout/limit enforced)
**tool_list_tables** - List views with metadata
**tool_get_schema** - DESCRIBE table
**tool_refresh** - Rescan filesystem (incremental or full)

## Table Naming

**Format**: `<alias>.<filepath>.<sheet>` (dot-separated, lowercase, sanitized)

**Sanitization**:
- Spaces → `_`
- Special chars → removed
- Allowed: `[a-z0-9_$]`

**Alias**: Auto-generated from directory name

**Examples**:
```
/data/sales/Q1-2024.xlsx → "sales.q12024.summary"
/reports/P&L (Final).xlsx → "reports.plfinal.sheet1"
```

**IMPORTANT**: Dots require quoted identifiers in SQL:
```sql
SELECT * FROM "sales.q12024.summary"  -- correct
SELECT * FROM sales.q12024.summary    -- fails (Catalog Error)
```

## System Views

`<alias>.__files` - File metadata (path, sheet_count, total_rows, mtime)
`<alias>.__tables` - Table metadata (table_name, file, sheet, mode, est_rows)

Query: `SELECT * FROM "sales.__files"`

## Modes

**RAW**: `read_xlsx(..., all_varchar=true, header=false)`
**ASSISTED**: Apply per-sheet overrides

```yaml
sales.xlsx:
  sheet_overrides:
    Summary:
      skip_rows: 3
      skip_footer: 2
      header_rows: 2
      drop_regex: "^Total:"
      column_renames:
        "col_0": "region"
      type_hints:
        amount: "DECIMAL(10,2)"
        date: "DATE"
      unpivot:
        id_vars: ["Region"]
        value_vars: ["Jan", "Feb"]
        var_name: "Month"
        value_name: "Sales"
```

## Concurrency & Thread Safety

### Design

**STDIO mode**: Single shared connection (single-threaded)
**HTTP/SSE mode**: Isolated connections per request (concurrent)

Supports ~10-20 concurrent users with <1ms overhead per request.

### Thread Safety Features

1. **Isolated Connections** - Each HTTP request gets its own DuckDB connection
2. **Protected Catalog** - RLock guards all metadata dictionary access
3. **Per-Connection Timeouts** - Query interrupts don't affect other users
4. **Deadlock-Free** - Sequential lock acquisition (never nested)

### Critical Fixes

**Problem 1: Timeout Interference**
- Before: One user's timeout killed all concurrent queries
- After: Isolated connections ensure timeouts only affect their own query

**Problem 2: Catalog Race**
- Before: Concurrent refresh could corrupt table metadata
- After: RLock protects all catalog reads/writes

**Problem 3: Dictionary Mutation**
- Before: Python dict operations not atomic (GIL can be released)
- After: All dict mutations locked to prevent corruption

### Performance

- STDIO mode: 0% overhead (no changes to single-threaded path)
- HTTP mode: ~0.6ms per request (connection + lock overhead)
- Tested: 20 concurrent workers, 100+ operations

### Testing

**75/75 tests pass** including:
- 6 concurrency tests (parallel queries, timeout isolation, refresh safety)
- 4 stress tests (100+ concurrent operations, memory leak detection)
- 1 authentication test (API key middleware validation)

## Security

- Read-only: BEGIN TRANSACTION READ ONLY (DuckDB-enforced, blocks all write operations)
- Path-confined: Root path validation, no traversal
- Timeout: threading.Timer → conn.interrupt() (isolated per connection)
- Row limit: fetchmany(max_rows + 1)
- Authentication: Optional API key for HTTP/SSE transports

### Authentication

API key authentication via Bearer token (HTTP/SSE transports only, STDIO bypassed).

**Configuration**:
```bash
export MCP_EXCEL_API_KEY=your-secret-key
mcp-excel --path /data/excel --transport sse --require-auth
```

**Client requests**:
```bash
curl -H "Authorization: Bearer your-secret-key" http://localhost:8000/sse
```

**Implementation**:
- Starlette middleware intercepts all HTTP requests
- Returns 401 for missing/invalid Authorization header
- STDIO transport ignores --require-auth (local subprocess already OS-authenticated)
- Single shared API key (environment variable)

## Examples

Finance examples for **Kopitiam Kita Sdn Bhd**, a Malaysian coffeehouse chain:

```bash
# Generate example files
python examples/create_finance_examples.py

# Load and query
mcp-excel --path examples --alias finance --overrides examples/finance_overrides.yaml

# Query examples
SELECT SUM(COALESCE(debit, 0)) as total_debits
FROM "finance.general_ledger.entries";

SELECT region, SUM(revenue) as total_revenue
FROM "finance.revenue_by_segment.revenue"
GROUP BY region ORDER BY total_revenue DESC;
```

**Includes**:
- 10 Excel files with 3,100+ financial records (MYR currency)
- General Ledger, Financial Statements, AR Aging, Revenue Analysis
- Budget Variance, Invoices, Trial Balance, Cash Flow Forecast
- Complete prompt chain sequences in `examples/README.md`

See `examples/README.md` for detailed usage and SQL query patterns.

## Development

```bash
pip install -e ".[dev]"
pytest --cov=mcp_excel tests/
python -m build
```

**Tests**: 75 passing (12 unit + 47 integration + 10 regression + 6 concurrency + 4 stress)
**Coverage**: Comprehensive test coverage including race condition scenarios

## Architecture Notes

### Transport Modes

**STDIO** (default):
- Single shared in-memory DuckDB connection
- Single-threaded (no concurrency concerns)
- Ideal for: Claude Desktop, local CLI usage

**HTTP/SSE** (--transport streamable-http):
- Persistent DuckDB file (temp, auto-cleanup)
- Isolated connection per request
- RLock-protected shared state (catalog, load_configs)
- Ideal for: Multiple concurrent users, web APIs

### State Management

**Global State** (thread-safe via locks):
- `catalog` - Table metadata dict (protected by `_catalog_lock`)
- `load_configs` - Directory configurations (protected by `_load_configs_lock`)
- `registry` - Name collision tracking (internal lock)

**Per-Request State** (isolated):
- DuckDB connection (HTTP mode only)
- Query timeout timer
- Transaction state

### Scale Limits

- Current design: Optimized for ~10 concurrent users
- Tested with: 20 concurrent workers
- Upgrade path: If >20 users needed, consider connection pooling

## License

MIT

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "mcp-server-excel-sql",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "mcp, excel, sql, duckdb, server",
    "author": null,
    "author_email": "Ivan <ivan@eriad.com>",
    "download_url": "https://files.pythonhosted.org/packages/f5/6f/15f078d28ad35538ed28186cacffe4e2d0091280b44c51661f1493ff426d/mcp_server_excel_sql-0.3.0.tar.gz",
    "platform": null,
    "description": "# mcp-server-excel-sql\n\nMCP server exposing Excel files as SQL-queryable DuckDB views with thread-safe concurrent access.\n\n## Features\n\n- SQL queries on Excel via DuckDB views\n- RAW (all_varchar) or ASSISTED (sheet_overrides) modes\n- Multi-row headers, type hints, unpivot, column renames\n- Auto-refresh on file changes (--watch)\n- Thread-safe concurrent access (HTTP/SSE transports)\n- Isolated query timeouts (per-connection interrupt)\n- API key authentication for HTTP transports (optional)\n\n## Installation\n\n```bash\npipx install mcp-server-excel-sql\n```\n\n## Usage\n\n### CLI\n\n```bash\n# STDIO mode (single-threaded)\nmcp-excel --path /data/excel --watch --overrides config.yaml\n\n# HTTP mode (concurrent, ~10 users)\nmcp-excel --path /data/excel --transport streamable-http --port 8000\n\n# HTTP mode with authentication\nexport MCP_EXCEL_API_KEY=your-secret-key\nmcp-excel --path /data/excel --transport sse --port 8000 --require-auth\n```\n\n### Claude Desktop\n\nAdd to `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"excel\": {\n      \"command\": \"mcp-excel\",\n      \"args\": [\"--path\", \"/Users/your-username/data/excel\"]\n    }\n  }\n}\n```\n\nWith overrides:\n```json\n{\n  \"mcpServers\": {\n    \"excel\": {\n      \"command\": \"mcp-excel\",\n      \"args\": [\n        \"--path\", \"/path/to/excel/files\",\n        \"--overrides\", \"/path/to/config.yaml\"\n      ]\n    }\n  }\n}\n```\n\n## MCP Tools\n\n**tool_load_dir** - Load Excel directory into views\n**tool_query** - Execute SELECT (read-only, timeout/limit enforced)\n**tool_list_tables** - List views with metadata\n**tool_get_schema** - DESCRIBE table\n**tool_refresh** - Rescan filesystem (incremental or full)\n\n## Table Naming\n\n**Format**: `<alias>.<filepath>.<sheet>` (dot-separated, lowercase, sanitized)\n\n**Sanitization**:\n- Spaces \u2192 `_`\n- Special chars \u2192 removed\n- Allowed: `[a-z0-9_$]`\n\n**Alias**: Auto-generated from directory name\n\n**Examples**:\n```\n/data/sales/Q1-2024.xlsx \u2192 \"sales.q12024.summary\"\n/reports/P&L (Final).xlsx \u2192 \"reports.plfinal.sheet1\"\n```\n\n**IMPORTANT**: Dots require quoted identifiers in SQL:\n```sql\nSELECT * FROM \"sales.q12024.summary\"  -- correct\nSELECT * FROM sales.q12024.summary    -- fails (Catalog Error)\n```\n\n## System Views\n\n`<alias>.__files` - File metadata (path, sheet_count, total_rows, mtime)\n`<alias>.__tables` - Table metadata (table_name, file, sheet, mode, est_rows)\n\nQuery: `SELECT * FROM \"sales.__files\"`\n\n## Modes\n\n**RAW**: `read_xlsx(..., all_varchar=true, header=false)`\n**ASSISTED**: Apply per-sheet overrides\n\n```yaml\nsales.xlsx:\n  sheet_overrides:\n    Summary:\n      skip_rows: 3\n      skip_footer: 2\n      header_rows: 2\n      drop_regex: \"^Total:\"\n      column_renames:\n        \"col_0\": \"region\"\n      type_hints:\n        amount: \"DECIMAL(10,2)\"\n        date: \"DATE\"\n      unpivot:\n        id_vars: [\"Region\"]\n        value_vars: [\"Jan\", \"Feb\"]\n        var_name: \"Month\"\n        value_name: \"Sales\"\n```\n\n## Concurrency & Thread Safety\n\n### Design\n\n**STDIO mode**: Single shared connection (single-threaded)\n**HTTP/SSE mode**: Isolated connections per request (concurrent)\n\nSupports ~10-20 concurrent users with <1ms overhead per request.\n\n### Thread Safety Features\n\n1. **Isolated Connections** - Each HTTP request gets its own DuckDB connection\n2. **Protected Catalog** - RLock guards all metadata dictionary access\n3. **Per-Connection Timeouts** - Query interrupts don't affect other users\n4. **Deadlock-Free** - Sequential lock acquisition (never nested)\n\n### Critical Fixes\n\n**Problem 1: Timeout Interference**\n- Before: One user's timeout killed all concurrent queries\n- After: Isolated connections ensure timeouts only affect their own query\n\n**Problem 2: Catalog Race**\n- Before: Concurrent refresh could corrupt table metadata\n- After: RLock protects all catalog reads/writes\n\n**Problem 3: Dictionary Mutation**\n- Before: Python dict operations not atomic (GIL can be released)\n- After: All dict mutations locked to prevent corruption\n\n### Performance\n\n- STDIO mode: 0% overhead (no changes to single-threaded path)\n- HTTP mode: ~0.6ms per request (connection + lock overhead)\n- Tested: 20 concurrent workers, 100+ operations\n\n### Testing\n\n**75/75 tests pass** including:\n- 6 concurrency tests (parallel queries, timeout isolation, refresh safety)\n- 4 stress tests (100+ concurrent operations, memory leak detection)\n- 1 authentication test (API key middleware validation)\n\n## Security\n\n- Read-only: BEGIN TRANSACTION READ ONLY (DuckDB-enforced, blocks all write operations)\n- Path-confined: Root path validation, no traversal\n- Timeout: threading.Timer \u2192 conn.interrupt() (isolated per connection)\n- Row limit: fetchmany(max_rows + 1)\n- Authentication: Optional API key for HTTP/SSE transports\n\n### Authentication\n\nAPI key authentication via Bearer token (HTTP/SSE transports only, STDIO bypassed).\n\n**Configuration**:\n```bash\nexport MCP_EXCEL_API_KEY=your-secret-key\nmcp-excel --path /data/excel --transport sse --require-auth\n```\n\n**Client requests**:\n```bash\ncurl -H \"Authorization: Bearer your-secret-key\" http://localhost:8000/sse\n```\n\n**Implementation**:\n- Starlette middleware intercepts all HTTP requests\n- Returns 401 for missing/invalid Authorization header\n- STDIO transport ignores --require-auth (local subprocess already OS-authenticated)\n- Single shared API key (environment variable)\n\n## Examples\n\nFinance examples for **Kopitiam Kita Sdn Bhd**, a Malaysian coffeehouse chain:\n\n```bash\n# Generate example files\npython examples/create_finance_examples.py\n\n# Load and query\nmcp-excel --path examples --alias finance --overrides examples/finance_overrides.yaml\n\n# Query examples\nSELECT SUM(COALESCE(debit, 0)) as total_debits\nFROM \"finance.general_ledger.entries\";\n\nSELECT region, SUM(revenue) as total_revenue\nFROM \"finance.revenue_by_segment.revenue\"\nGROUP BY region ORDER BY total_revenue DESC;\n```\n\n**Includes**:\n- 10 Excel files with 3,100+ financial records (MYR currency)\n- General Ledger, Financial Statements, AR Aging, Revenue Analysis\n- Budget Variance, Invoices, Trial Balance, Cash Flow Forecast\n- Complete prompt chain sequences in `examples/README.md`\n\nSee `examples/README.md` for detailed usage and SQL query patterns.\n\n## Development\n\n```bash\npip install -e \".[dev]\"\npytest --cov=mcp_excel tests/\npython -m build\n```\n\n**Tests**: 75 passing (12 unit + 47 integration + 10 regression + 6 concurrency + 4 stress)\n**Coverage**: Comprehensive test coverage including race condition scenarios\n\n## Architecture Notes\n\n### Transport Modes\n\n**STDIO** (default):\n- Single shared in-memory DuckDB connection\n- Single-threaded (no concurrency concerns)\n- Ideal for: Claude Desktop, local CLI usage\n\n**HTTP/SSE** (--transport streamable-http):\n- Persistent DuckDB file (temp, auto-cleanup)\n- Isolated connection per request\n- RLock-protected shared state (catalog, load_configs)\n- Ideal for: Multiple concurrent users, web APIs\n\n### State Management\n\n**Global State** (thread-safe via locks):\n- `catalog` - Table metadata dict (protected by `_catalog_lock`)\n- `load_configs` - Directory configurations (protected by `_load_configs_lock`)\n- `registry` - Name collision tracking (internal lock)\n\n**Per-Request State** (isolated):\n- DuckDB connection (HTTP mode only)\n- Query timeout timer\n- Transaction state\n\n### Scale Limits\n\n- Current design: Optimized for ~10 concurrent users\n- Tested with: 20 concurrent workers\n- Upgrade path: If >20 users needed, consider connection pooling\n\n## License\n\nMIT\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "MCP server that exposes Excel files as SQL-queryable tables via DuckDB",
    "version": "0.3.0",
    "project_urls": {
        "Homepage": "https://github.com/ivan-loh/mcp-excel",
        "Issues": "https://github.com/ivan-loh/mcp-excel/issues",
        "Repository": "https://github.com/ivan-loh/mcp-excel"
    },
    "split_keywords": [
        "mcp",
        " excel",
        " sql",
        " duckdb",
        " server"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "edc28acd70e4e27de97b04c96382e311d7d5255b5c48c0718ad1f1a61f119d63",
                "md5": "83157f2e99779e23008bd8150767cffd",
                "sha256": "c7a8676f7022a21f75a4e3acc6e646cfdf669bc804a32c478fc991d88fa7c612"
            },
            "downloads": -1,
            "filename": "mcp_server_excel_sql-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "83157f2e99779e23008bd8150767cffd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 17770,
            "upload_time": "2025-10-18T11:18:56",
            "upload_time_iso_8601": "2025-10-18T11:18:56.529452Z",
            "url": "https://files.pythonhosted.org/packages/ed/c2/8acd70e4e27de97b04c96382e311d7d5255b5c48c0718ad1f1a61f119d63/mcp_server_excel_sql-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f56f15f078d28ad35538ed28186cacffe4e2d0091280b44c51661f1493ff426d",
                "md5": "a2247db7af2d70e305482c02c5790319",
                "sha256": "0ca8141f764bcc7373df43c63eeba33bf5529b2a9866a064f5d2d32fce627fa0"
            },
            "downloads": -1,
            "filename": "mcp_server_excel_sql-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a2247db7af2d70e305482c02c5790319",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 248632,
            "upload_time": "2025-10-18T11:18:58",
            "upload_time_iso_8601": "2025-10-18T11:18:58.396718Z",
            "url": "https://files.pythonhosted.org/packages/f5/6f/15f078d28ad35538ed28186cacffe4e2d0091280b44c51661f1493ff426d/mcp_server_excel_sql-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-18 11:18:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ivan-loh",
    "github_project": "mcp-excel",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "mcp-server-excel-sql"
}
        
Elapsed time: 2.32415s