Name | mcp-server-excel-sql JSON |
Version |
0.3.0
JSON |
| download |
home_page | None |
Summary | MCP server that exposes Excel files as SQL-queryable tables via DuckDB |
upload_time | 2025-10-18 11:18:58 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.11 |
license | None |
keywords |
mcp
excel
sql
duckdb
server
|
VCS |
 |
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# mcp-server-excel-sql
MCP server exposing Excel files as SQL-queryable DuckDB views with thread-safe concurrent access.
## Features
- SQL queries on Excel via DuckDB views
- RAW (all_varchar) or ASSISTED (sheet_overrides) modes
- Multi-row headers, type hints, unpivot, column renames
- Auto-refresh on file changes (--watch)
- Thread-safe concurrent access (HTTP/SSE transports)
- Isolated query timeouts (per-connection interrupt)
- API key authentication for HTTP transports (optional)
## Installation
```bash
pipx install mcp-server-excel-sql
```
## Usage
### CLI
```bash
# STDIO mode (single-threaded)
mcp-excel --path /data/excel --watch --overrides config.yaml
# HTTP mode (concurrent, ~10 users)
mcp-excel --path /data/excel --transport streamable-http --port 8000
# HTTP mode with authentication
export MCP_EXCEL_API_KEY=your-secret-key
mcp-excel --path /data/excel --transport sse --port 8000 --require-auth
```
### Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
"mcpServers": {
"excel": {
"command": "mcp-excel",
"args": ["--path", "/Users/your-username/data/excel"]
}
}
}
```
With overrides:
```json
{
"mcpServers": {
"excel": {
"command": "mcp-excel",
"args": [
"--path", "/path/to/excel/files",
"--overrides", "/path/to/config.yaml"
]
}
}
}
```
## MCP Tools
**tool_load_dir** - Load Excel directory into views
**tool_query** - Execute SELECT (read-only, timeout/limit enforced)
**tool_list_tables** - List views with metadata
**tool_get_schema** - DESCRIBE table
**tool_refresh** - Rescan filesystem (incremental or full)
## Table Naming
**Format**: `<alias>.<filepath>.<sheet>` (dot-separated, lowercase, sanitized)
**Sanitization**:
- Spaces → `_`
- Special chars → removed
- Allowed: `[a-z0-9_$]`
**Alias**: Auto-generated from directory name
**Examples**:
```
/data/sales/Q1-2024.xlsx → "sales.q12024.summary"
/reports/P&L (Final).xlsx → "reports.plfinal.sheet1"
```
**IMPORTANT**: Dots require quoted identifiers in SQL:
```sql
SELECT * FROM "sales.q12024.summary" -- correct
SELECT * FROM sales.q12024.summary -- fails (Catalog Error)
```
## System Views
`<alias>.__files` - File metadata (path, sheet_count, total_rows, mtime)
`<alias>.__tables` - Table metadata (table_name, file, sheet, mode, est_rows)
Query: `SELECT * FROM "sales.__files"`
## Modes
**RAW**: `read_xlsx(..., all_varchar=true, header=false)`
**ASSISTED**: Apply per-sheet overrides
```yaml
sales.xlsx:
sheet_overrides:
Summary:
skip_rows: 3
skip_footer: 2
header_rows: 2
drop_regex: "^Total:"
column_renames:
"col_0": "region"
type_hints:
amount: "DECIMAL(10,2)"
date: "DATE"
unpivot:
id_vars: ["Region"]
value_vars: ["Jan", "Feb"]
var_name: "Month"
value_name: "Sales"
```
## Concurrency & Thread Safety
### Design
**STDIO mode**: Single shared connection (single-threaded)
**HTTP/SSE mode**: Isolated connections per request (concurrent)
Supports ~10-20 concurrent users with <1ms overhead per request.
### Thread Safety Features
1. **Isolated Connections** - Each HTTP request gets its own DuckDB connection
2. **Protected Catalog** - RLock guards all metadata dictionary access
3. **Per-Connection Timeouts** - Query interrupts don't affect other users
4. **Deadlock-Free** - Sequential lock acquisition (never nested)
### Critical Fixes
**Problem 1: Timeout Interference**
- Before: One user's timeout killed all concurrent queries
- After: Isolated connections ensure timeouts only affect their own query
**Problem 2: Catalog Race**
- Before: Concurrent refresh could corrupt table metadata
- After: RLock protects all catalog reads/writes
**Problem 3: Dictionary Mutation**
- Before: Python dict operations not atomic (GIL can be released)
- After: All dict mutations locked to prevent corruption
### Performance
- STDIO mode: 0% overhead (no changes to single-threaded path)
- HTTP mode: ~0.6ms per request (connection + lock overhead)
- Tested: 20 concurrent workers, 100+ operations
### Testing
**75/75 tests pass** including:
- 6 concurrency tests (parallel queries, timeout isolation, refresh safety)
- 4 stress tests (100+ concurrent operations, memory leak detection)
- 1 authentication test (API key middleware validation)
## Security
- Read-only: BEGIN TRANSACTION READ ONLY (DuckDB-enforced, blocks all write operations)
- Path-confined: Root path validation, no traversal
- Timeout: threading.Timer → conn.interrupt() (isolated per connection)
- Row limit: fetchmany(max_rows + 1)
- Authentication: Optional API key for HTTP/SSE transports
### Authentication
API key authentication via Bearer token (HTTP/SSE transports only, STDIO bypassed).
**Configuration**:
```bash
export MCP_EXCEL_API_KEY=your-secret-key
mcp-excel --path /data/excel --transport sse --require-auth
```
**Client requests**:
```bash
curl -H "Authorization: Bearer your-secret-key" http://localhost:8000/sse
```
**Implementation**:
- Starlette middleware intercepts all HTTP requests
- Returns 401 for missing/invalid Authorization header
- STDIO transport ignores --require-auth (local subprocess already OS-authenticated)
- Single shared API key (environment variable)
## Examples
Finance examples for **Kopitiam Kita Sdn Bhd**, a Malaysian coffeehouse chain:
```bash
# Generate example files
python examples/create_finance_examples.py
# Load and query
mcp-excel --path examples --alias finance --overrides examples/finance_overrides.yaml
# Query examples
SELECT SUM(COALESCE(debit, 0)) as total_debits
FROM "finance.general_ledger.entries";
SELECT region, SUM(revenue) as total_revenue
FROM "finance.revenue_by_segment.revenue"
GROUP BY region ORDER BY total_revenue DESC;
```
**Includes**:
- 10 Excel files with 3,100+ financial records (MYR currency)
- General Ledger, Financial Statements, AR Aging, Revenue Analysis
- Budget Variance, Invoices, Trial Balance, Cash Flow Forecast
- Complete prompt chain sequences in `examples/README.md`
See `examples/README.md` for detailed usage and SQL query patterns.
## Development
```bash
pip install -e ".[dev]"
pytest --cov=mcp_excel tests/
python -m build
```
**Tests**: 75 passing (12 unit + 47 integration + 10 regression + 6 concurrency + 4 stress)
**Coverage**: Comprehensive test coverage including race condition scenarios
## Architecture Notes
### Transport Modes
**STDIO** (default):
- Single shared in-memory DuckDB connection
- Single-threaded (no concurrency concerns)
- Ideal for: Claude Desktop, local CLI usage
**HTTP/SSE** (--transport streamable-http):
- Persistent DuckDB file (temp, auto-cleanup)
- Isolated connection per request
- RLock-protected shared state (catalog, load_configs)
- Ideal for: Multiple concurrent users, web APIs
### State Management
**Global State** (thread-safe via locks):
- `catalog` - Table metadata dict (protected by `_catalog_lock`)
- `load_configs` - Directory configurations (protected by `_load_configs_lock`)
- `registry` - Name collision tracking (internal lock)
**Per-Request State** (isolated):
- DuckDB connection (HTTP mode only)
- Query timeout timer
- Transaction state
### Scale Limits
- Current design: Optimized for ~10 concurrent users
- Tested with: 20 concurrent workers
- Upgrade path: If >20 users needed, consider connection pooling
## License
MIT
Raw data
{
"_id": null,
"home_page": null,
"name": "mcp-server-excel-sql",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "mcp, excel, sql, duckdb, server",
"author": null,
"author_email": "Ivan <ivan@eriad.com>",
"download_url": "https://files.pythonhosted.org/packages/f5/6f/15f078d28ad35538ed28186cacffe4e2d0091280b44c51661f1493ff426d/mcp_server_excel_sql-0.3.0.tar.gz",
"platform": null,
"description": "# mcp-server-excel-sql\n\nMCP server exposing Excel files as SQL-queryable DuckDB views with thread-safe concurrent access.\n\n## Features\n\n- SQL queries on Excel via DuckDB views\n- RAW (all_varchar) or ASSISTED (sheet_overrides) modes\n- Multi-row headers, type hints, unpivot, column renames\n- Auto-refresh on file changes (--watch)\n- Thread-safe concurrent access (HTTP/SSE transports)\n- Isolated query timeouts (per-connection interrupt)\n- API key authentication for HTTP transports (optional)\n\n## Installation\n\n```bash\npipx install mcp-server-excel-sql\n```\n\n## Usage\n\n### CLI\n\n```bash\n# STDIO mode (single-threaded)\nmcp-excel --path /data/excel --watch --overrides config.yaml\n\n# HTTP mode (concurrent, ~10 users)\nmcp-excel --path /data/excel --transport streamable-http --port 8000\n\n# HTTP mode with authentication\nexport MCP_EXCEL_API_KEY=your-secret-key\nmcp-excel --path /data/excel --transport sse --port 8000 --require-auth\n```\n\n### Claude Desktop\n\nAdd to `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```json\n{\n \"mcpServers\": {\n \"excel\": {\n \"command\": \"mcp-excel\",\n \"args\": [\"--path\", \"/Users/your-username/data/excel\"]\n }\n }\n}\n```\n\nWith overrides:\n```json\n{\n \"mcpServers\": {\n \"excel\": {\n \"command\": \"mcp-excel\",\n \"args\": [\n \"--path\", \"/path/to/excel/files\",\n \"--overrides\", \"/path/to/config.yaml\"\n ]\n }\n }\n}\n```\n\n## MCP Tools\n\n**tool_load_dir** - Load Excel directory into views\n**tool_query** - Execute SELECT (read-only, timeout/limit enforced)\n**tool_list_tables** - List views with metadata\n**tool_get_schema** - DESCRIBE table\n**tool_refresh** - Rescan filesystem (incremental or full)\n\n## Table Naming\n\n**Format**: `<alias>.<filepath>.<sheet>` (dot-separated, lowercase, sanitized)\n\n**Sanitization**:\n- Spaces \u2192 `_`\n- Special chars \u2192 removed\n- Allowed: `[a-z0-9_$]`\n\n**Alias**: Auto-generated from directory name\n\n**Examples**:\n```\n/data/sales/Q1-2024.xlsx \u2192 \"sales.q12024.summary\"\n/reports/P&L (Final).xlsx \u2192 \"reports.plfinal.sheet1\"\n```\n\n**IMPORTANT**: Dots require quoted identifiers in SQL:\n```sql\nSELECT * FROM \"sales.q12024.summary\" -- correct\nSELECT * FROM sales.q12024.summary -- fails (Catalog Error)\n```\n\n## System Views\n\n`<alias>.__files` - File metadata (path, sheet_count, total_rows, mtime)\n`<alias>.__tables` - Table metadata (table_name, file, sheet, mode, est_rows)\n\nQuery: `SELECT * FROM \"sales.__files\"`\n\n## Modes\n\n**RAW**: `read_xlsx(..., all_varchar=true, header=false)`\n**ASSISTED**: Apply per-sheet overrides\n\n```yaml\nsales.xlsx:\n sheet_overrides:\n Summary:\n skip_rows: 3\n skip_footer: 2\n header_rows: 2\n drop_regex: \"^Total:\"\n column_renames:\n \"col_0\": \"region\"\n type_hints:\n amount: \"DECIMAL(10,2)\"\n date: \"DATE\"\n unpivot:\n id_vars: [\"Region\"]\n value_vars: [\"Jan\", \"Feb\"]\n var_name: \"Month\"\n value_name: \"Sales\"\n```\n\n## Concurrency & Thread Safety\n\n### Design\n\n**STDIO mode**: Single shared connection (single-threaded)\n**HTTP/SSE mode**: Isolated connections per request (concurrent)\n\nSupports ~10-20 concurrent users with <1ms overhead per request.\n\n### Thread Safety Features\n\n1. **Isolated Connections** - Each HTTP request gets its own DuckDB connection\n2. **Protected Catalog** - RLock guards all metadata dictionary access\n3. **Per-Connection Timeouts** - Query interrupts don't affect other users\n4. **Deadlock-Free** - Sequential lock acquisition (never nested)\n\n### Critical Fixes\n\n**Problem 1: Timeout Interference**\n- Before: One user's timeout killed all concurrent queries\n- After: Isolated connections ensure timeouts only affect their own query\n\n**Problem 2: Catalog Race**\n- Before: Concurrent refresh could corrupt table metadata\n- After: RLock protects all catalog reads/writes\n\n**Problem 3: Dictionary Mutation**\n- Before: Python dict operations not atomic (GIL can be released)\n- After: All dict mutations locked to prevent corruption\n\n### Performance\n\n- STDIO mode: 0% overhead (no changes to single-threaded path)\n- HTTP mode: ~0.6ms per request (connection + lock overhead)\n- Tested: 20 concurrent workers, 100+ operations\n\n### Testing\n\n**75/75 tests pass** including:\n- 6 concurrency tests (parallel queries, timeout isolation, refresh safety)\n- 4 stress tests (100+ concurrent operations, memory leak detection)\n- 1 authentication test (API key middleware validation)\n\n## Security\n\n- Read-only: BEGIN TRANSACTION READ ONLY (DuckDB-enforced, blocks all write operations)\n- Path-confined: Root path validation, no traversal\n- Timeout: threading.Timer \u2192 conn.interrupt() (isolated per connection)\n- Row limit: fetchmany(max_rows + 1)\n- Authentication: Optional API key for HTTP/SSE transports\n\n### Authentication\n\nAPI key authentication via Bearer token (HTTP/SSE transports only, STDIO bypassed).\n\n**Configuration**:\n```bash\nexport MCP_EXCEL_API_KEY=your-secret-key\nmcp-excel --path /data/excel --transport sse --require-auth\n```\n\n**Client requests**:\n```bash\ncurl -H \"Authorization: Bearer your-secret-key\" http://localhost:8000/sse\n```\n\n**Implementation**:\n- Starlette middleware intercepts all HTTP requests\n- Returns 401 for missing/invalid Authorization header\n- STDIO transport ignores --require-auth (local subprocess already OS-authenticated)\n- Single shared API key (environment variable)\n\n## Examples\n\nFinance examples for **Kopitiam Kita Sdn Bhd**, a Malaysian coffeehouse chain:\n\n```bash\n# Generate example files\npython examples/create_finance_examples.py\n\n# Load and query\nmcp-excel --path examples --alias finance --overrides examples/finance_overrides.yaml\n\n# Query examples\nSELECT SUM(COALESCE(debit, 0)) as total_debits\nFROM \"finance.general_ledger.entries\";\n\nSELECT region, SUM(revenue) as total_revenue\nFROM \"finance.revenue_by_segment.revenue\"\nGROUP BY region ORDER BY total_revenue DESC;\n```\n\n**Includes**:\n- 10 Excel files with 3,100+ financial records (MYR currency)\n- General Ledger, Financial Statements, AR Aging, Revenue Analysis\n- Budget Variance, Invoices, Trial Balance, Cash Flow Forecast\n- Complete prompt chain sequences in `examples/README.md`\n\nSee `examples/README.md` for detailed usage and SQL query patterns.\n\n## Development\n\n```bash\npip install -e \".[dev]\"\npytest --cov=mcp_excel tests/\npython -m build\n```\n\n**Tests**: 75 passing (12 unit + 47 integration + 10 regression + 6 concurrency + 4 stress)\n**Coverage**: Comprehensive test coverage including race condition scenarios\n\n## Architecture Notes\n\n### Transport Modes\n\n**STDIO** (default):\n- Single shared in-memory DuckDB connection\n- Single-threaded (no concurrency concerns)\n- Ideal for: Claude Desktop, local CLI usage\n\n**HTTP/SSE** (--transport streamable-http):\n- Persistent DuckDB file (temp, auto-cleanup)\n- Isolated connection per request\n- RLock-protected shared state (catalog, load_configs)\n- Ideal for: Multiple concurrent users, web APIs\n\n### State Management\n\n**Global State** (thread-safe via locks):\n- `catalog` - Table metadata dict (protected by `_catalog_lock`)\n- `load_configs` - Directory configurations (protected by `_load_configs_lock`)\n- `registry` - Name collision tracking (internal lock)\n\n**Per-Request State** (isolated):\n- DuckDB connection (HTTP mode only)\n- Query timeout timer\n- Transaction state\n\n### Scale Limits\n\n- Current design: Optimized for ~10 concurrent users\n- Tested with: 20 concurrent workers\n- Upgrade path: If >20 users needed, consider connection pooling\n\n## License\n\nMIT\n",
"bugtrack_url": null,
"license": null,
"summary": "MCP server that exposes Excel files as SQL-queryable tables via DuckDB",
"version": "0.3.0",
"project_urls": {
"Homepage": "https://github.com/ivan-loh/mcp-excel",
"Issues": "https://github.com/ivan-loh/mcp-excel/issues",
"Repository": "https://github.com/ivan-loh/mcp-excel"
},
"split_keywords": [
"mcp",
" excel",
" sql",
" duckdb",
" server"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "edc28acd70e4e27de97b04c96382e311d7d5255b5c48c0718ad1f1a61f119d63",
"md5": "83157f2e99779e23008bd8150767cffd",
"sha256": "c7a8676f7022a21f75a4e3acc6e646cfdf669bc804a32c478fc991d88fa7c612"
},
"downloads": -1,
"filename": "mcp_server_excel_sql-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "83157f2e99779e23008bd8150767cffd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 17770,
"upload_time": "2025-10-18T11:18:56",
"upload_time_iso_8601": "2025-10-18T11:18:56.529452Z",
"url": "https://files.pythonhosted.org/packages/ed/c2/8acd70e4e27de97b04c96382e311d7d5255b5c48c0718ad1f1a61f119d63/mcp_server_excel_sql-0.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "f56f15f078d28ad35538ed28186cacffe4e2d0091280b44c51661f1493ff426d",
"md5": "a2247db7af2d70e305482c02c5790319",
"sha256": "0ca8141f764bcc7373df43c63eeba33bf5529b2a9866a064f5d2d32fce627fa0"
},
"downloads": -1,
"filename": "mcp_server_excel_sql-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "a2247db7af2d70e305482c02c5790319",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 248632,
"upload_time": "2025-10-18T11:18:58",
"upload_time_iso_8601": "2025-10-18T11:18:58.396718Z",
"url": "https://files.pythonhosted.org/packages/f5/6f/15f078d28ad35538ed28186cacffe4e2d0091280b44c51661f1493ff426d/mcp_server_excel_sql-0.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-18 11:18:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ivan-loh",
"github_project": "mcp-excel",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "mcp-server-excel-sql"
}