localdata-mcp

- **Name**: localdata-mcp
- **Version**: 1.3.0
- **Summary**: A dynamic MCP server for local databases and text files.
- **Author email**: "Christian C. Berclaz" <christian.berclaz@mac.com>
- **Uploaded**: 2025-08-30 16:00:06
- **Requires Python**: >=3.10
- **Keywords**: mcp, model-context-protocol, database, ai, assistant
            [![MseeP.ai Security Assessment Badge](https://mseep.net/pr/chrisgve-localdata-mcp-badge.png)](https://mseep.ai/app/chrisgve-localdata-mcp)

# LocalData MCP Server

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![PyPI version](https://img.shields.io/pypi/v/localdata-mcp.svg)](https://pypi.org/project/localdata-mcp/)
[![FastMCP](https://img.shields.io/badge/FastMCP-Compatible-green.svg)](https://github.com/jlowin/fastmcp)
[![Verified on MseeP](https://mseep.ai/badge.svg)](https://mseep.ai/app/cd737717-02f3-4388-bab7-5ec7cbe40713)

**A comprehensive MCP server for databases, spreadsheets, and structured data files with security features, performance optimization, and extensive format support.**

## ✨ Features

### 🗄️ **Multi-Database Support**

- **SQL Databases**: PostgreSQL, MySQL, SQLite, DuckDB
- **Modern Databases**: MongoDB, Redis, Elasticsearch, InfluxDB, Neo4j, CouchDB  
- **Spreadsheets**: Excel (.xlsx/.xls), LibreOffice Calc (.ods), Apple Numbers (.numbers)
- **Structured Files**: CSV, TSV, JSON, YAML, TOML, XML, INI
- **Analytical Formats**: Parquet, Feather, Arrow, HDF5

### 🔒 **Advanced Security**

- **Path Security**: Restricts file access to current working directory only
- **SQL Injection Prevention**: Parameterized queries and safe table identifiers
- **Connection Limits**: Maximum 10 concurrent database connections
- **Input Validation**: Comprehensive validation and sanitization

### 📊 **Large Dataset Handling**

- **Query Buffering**: Automatic buffering for results with 100+ rows
- **Large File Support**: 100MB+ files automatically use temporary SQLite storage
- **Chunk Retrieval**: Paginated access to large result sets
- **Auto-Cleanup**: 10-minute expiry with file modification detection
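
The buffering scheme above can be sketched in a few lines. This is a toy model only, assuming a 1-based start row and the 600-second expiry described; the `QueryBuffer` name and its methods are illustrative, not the server's actual classes:

```python
import time

BUFFER_TTL_SECONDS = 600  # the 10-minute expiry described above

class QueryBuffer:
    """Toy model of the query-buffering scheme (illustrative sketch)."""

    def __init__(self, rows):
        self.rows = rows
        self.created = time.monotonic()

    def expired(self) -> bool:
        # Buffers older than the TTL are eligible for auto-cleanup.
        return time.monotonic() - self.created > BUFFER_TTL_SECONDS

    def chunk(self, start_row: int, count: int):
        # start_row is 1-based, matching the get_query_chunk examples below.
        return self.rows[start_row - 1 : start_row - 1 + count]

buf = QueryBuffer(list(range(1, 251)))  # pretend a 250-row result
first_page = buf.chunk(1, 100)          # rows 1..100
second_page = buf.chunk(101, 100)       # rows 101..200
```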

### ๐Ÿ› ๏ธ **Developer Experience**

- **Comprehensive Tools**: 12 database operation tools
- **Error Handling**: Detailed, actionable error messages
- **Thread Safety**: Concurrent operation support
- **Backward Compatible**: All existing APIs preserved

## 🚀 Quick Start

### Installation

```bash
# Using pip
pip install localdata-mcp

# Using uv (recommended)
uv tool install localdata-mcp

# Development installation
git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
pip install -e .
```

### Configuration

Add to your MCP client configuration:

```json
{
  "mcpServers": {
    "localdata": {
      "command": "localdata-mcp",
      "env": {}
    }
  }
}
```

### Usage Examples

#### Connect to Databases

```python
# PostgreSQL
connect_database("analytics", "postgresql", "postgresql://user:pass@localhost/db")

# SQLite
connect_database("local", "sqlite", "./data.sqlite")

# CSV Files
connect_database("csvdata", "csv", "./data.csv")

# JSON Files
connect_database("config", "json", "./config.json")

# Excel Spreadsheets (all sheets)
connect_database("sales", "xlsx", "./sales_data.xlsx")

# Excel with specific sheet
connect_database("q1data", "xlsx", "./quarterly.xlsx?sheet=Q1_Sales")

# LibreOffice Calc
connect_database("budget", "ods", "./budget_2024.ods")

# Tab-separated values
connect_database("exports", "tsv", "./export_data.tsv")

# XML structured data
connect_database("config_xml", "xml", "./config.xml")

# INI configuration files
connect_database("settings", "ini", "./app.ini")

# Analytical formats
connect_database("analytics", "parquet", "./data.parquet")
connect_database("features", "feather", "./features.feather")
connect_database("vectors", "arrow", "./vectors.arrow")
```

#### Query Data

```python
# Execute queries with automatic result formatting
execute_query("analytics", "SELECT * FROM users LIMIT 50")

# Large result sets use buffering automatically
execute_query_json("analytics", "SELECT * FROM large_table")
```

#### Handle Large Results

```python
# Get chunked results for large datasets
get_query_chunk("analytics_1640995200_a1b2", 101, "100")

# Check buffer status
get_buffered_query_info("analytics_1640995200_a1b2")

# Manual cleanup
clear_query_buffer("analytics_1640995200_a1b2")
```

## 🔧 Available Tools

| Tool                      | Description                | Use Case      |
| ------------------------- | -------------------------- | ------------- |
| `connect_database`        | Connect to databases/files | Initial setup |
| `disconnect_database`     | Close connections          | Cleanup       |
| `list_databases`          | Show active connections    | Status check  |
| `execute_query`           | Run SQL (markdown output)  | Small results |
| `execute_query_json`      | Run SQL (JSON output)      | Large results |
| `describe_database`       | Show schema/structure      | Exploration   |
| `describe_table`          | Show table details         | Analysis      |
| `get_table_sample`        | Preview table data         | Quick look    |
| `get_table_sample_json`   | Preview (JSON format)      | Development   |
| `find_table`              | Locate tables by name      | Navigation    |
| `read_text_file`          | Read structured files      | File access   |
| `get_query_chunk`         | Paginated result access    | Large data    |
| `get_buffered_query_info` | Buffer status info         | Monitoring    |
| `clear_query_buffer`      | Manual buffer cleanup      | Management    |

## 📋 Supported Data Sources

### SQL Databases

- **PostgreSQL**: Full support with connection pooling
- **MySQL**: Complete MySQL/MariaDB compatibility  
- **SQLite**: Local file and in-memory databases
- **DuckDB**: High-performance analytical SQL database

### Modern Databases

- **MongoDB**: Document store with collection queries and aggregation
- **Redis**: High-performance key-value store
- **Elasticsearch**: Full-text search and analytics engine
- **InfluxDB**: Time-series database for metrics and IoT data
- **Neo4j**: Graph database for relationship queries
- **CouchDB**: Document-oriented database with HTTP API

### Structured Files

#### Spreadsheet Formats
- **Excel (.xlsx, .xls)**: Full multi-sheet support with automatic table creation
- **LibreOffice Calc (.ods)**: Complete ODS support with sheet handling
- **Apple Numbers (.numbers)**: Native support for Numbers documents
- **Multi-sheet handling**: Each sheet becomes a separate queryable table

#### Text-Based Formats
- **CSV**: Large file automatic SQLite conversion
- **TSV**: Tab-separated values with same features as CSV
- **JSON**: Nested structure flattening
- **YAML**: Configuration file support
- **TOML**: Settings and config files
- **XML**: Structured XML document parsing
- **INI**: Configuration file format support
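
The "nested structure flattening" mentioned for JSON can be pictured as follows. This is an illustrative sketch only (the server's actual flattening rules may differ); it assumes nested dicts become dot-separated column names:

```python
def flatten_json(obj, prefix=""):
    """Flatten nested dicts into dot-separated keys (illustrative only)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column name.
            flat.update(flatten_json(value, name))
        else:
            flat[name] = value
    return flat

record = {"user": {"name": "Ada", "address": {"city": "London"}}, "active": True}
# flatten_json(record) yields keys like "user.name" and "user.address.city"
```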

#### Analytical Formats
- **Parquet**: High-performance columnar data format
- **Feather**: Fast binary format for data interchange
- **Arrow**: In-memory columnar format support
- **HDF5**: Hierarchical data format for scientific computing

## ๐Ÿ›ก๏ธ Security Features

### Path Security

```python
# ✅ Allowed - current directory and subdirectories
"./data/users.csv"
"data/config.json"
"subdir/file.yaml"

# โŒ Blocked - parent directory access
"../etc/passwd"
"../../sensitive.db"
"/etc/hosts"
```
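
A check like the one above is commonly done by resolving the path and requiring it to stay under the working directory. The sketch below shows one way to do that with `pathlib`; it is illustrative, not the server's actual code:

```python
from pathlib import Path

def is_allowed_path(user_path: str, base: str = ".") -> bool:
    """Resolve symlinks and '..' segments, then require the result to
    stay under the base directory (illustrative sketch)."""
    base_dir = Path(base).resolve()
    # An absolute user_path replaces base_dir entirely, so '/etc/hosts'
    # is caught by the same containment check.
    target = (base_dir / user_path).resolve()
    try:
        target.relative_to(base_dir)  # raises ValueError if outside base
        return True
    except ValueError:
        return False
```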

### SQL Injection Prevention

```python
# ✅ Safe - parameterized queries
describe_table("mydb", "users")  # Validates table name

# โŒ Blocked - malicious input
describe_table("mydb", "users; DROP TABLE users; --")
```
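
Table names cannot be bound as query parameters, so validation like the above is typically an identifier allowlist. A minimal sketch, in the spirit of the example (not the project's actual implementation):

```python
import re

_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def safe_table_identifier(table: str) -> str:
    """Accept only plain identifiers, so injected SQL such as
    'users; DROP TABLE users; --' is rejected before it can reach a
    query string (illustrative sketch)."""
    if not _IDENTIFIER_RE.match(table):
        raise ValueError(f"invalid table name: {table!r}")
    return f'"{table}"'  # quote the vetted identifier for interpolation
```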

### Resource Limits

- **Connection Limit**: Maximum 10 concurrent connections
- **File Size Threshold**: 100MB triggers temporary storage
- **Query Buffering**: Automatic for 100+ row results
- **Auto-Cleanup**: Buffers expire after 10 minutes
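
The connection cap can be modeled as a small registry that refuses an eleventh connection. This is a toy sketch of the behavior described above, not the server's actual bookkeeping:

```python
MAX_CONNECTIONS = 10  # the limit described above

class ConnectionRegistry:
    """Toy registry enforcing the connection cap (illustrative sketch)."""

    def __init__(self):
        self._conns = {}

    def connect(self, name, conn):
        if len(self._conns) >= MAX_CONNECTIONS:
            raise RuntimeError(
                "connection limit reached; disconnect a database first"
            )
        self._conns[name] = conn

    def disconnect(self, name):
        self._conns.pop(name, None)

registry = ConnectionRegistry()
for i in range(MAX_CONNECTIONS):
    registry.connect(f"db{i}", object())  # an 11th attempt would raise
```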

## 📊 Performance & Scalability

### Large File Handling

- Files over 100MB automatically use temporary SQLite storage
- Memory-efficient streaming for large datasets
- Automatic cleanup of temporary files
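
The spill-to-SQLite approach can be sketched with the standard library: stream rows in batches so the whole file never sits in memory. Assumptions are labeled in the code; the server's real loader differs:

```python
import csv
import io
import sqlite3

def load_csv_streaming(csv_file, conn, table="data", batch=50_000):
    """Stream CSV rows into a SQLite table in batches (illustrative
    sketch of the temporary-storage approach, not the real loader)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    pending = []
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
    for row in reader:
        pending.append(row)
        if len(pending) >= batch:  # flush a batch, keeping memory flat
            conn.executemany(insert_sql, pending)
            pending.clear()
    if pending:
        conn.executemany(insert_sql, pending)
    conn.commit()

# Tiny in-memory demo with a hypothetical two-row file:
conn = sqlite3.connect(":memory:")
load_csv_streaming(io.StringIO("a,b\n1,2\n3,4\n"), conn, table="demo")
row_count = conn.execute('SELECT COUNT(*) FROM "demo"').fetchone()[0]
```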

### Query Optimization

- Results with 100+ rows automatically use buffering system
- Chunk-based retrieval for large datasets
- File modification detection for cache invalidation

### Concurrency

- Thread-safe connection management
- Concurrent query execution support
- Resource pooling and limits

## 🧪 Testing & Quality

**✅ Comprehensive Test Coverage**

- 68% test coverage with 500+ test cases
- Import error handling and graceful degradation
- Security vulnerability testing  
- Performance benchmarking with large datasets
- Modern database connection testing

**🔒 Security Validated**

- Path traversal prevention
- SQL injection protection
- Resource exhaustion testing
- Malicious input handling

**⚡ Performance Tested**

- Large file processing
- Concurrent connection handling
- Memory usage optimization
- Query response times

## 🔄 API Compatibility

All existing MCP tool signatures remain **100% backward compatible**. New functionality is additive only:

- ✅ All original tools work unchanged
- ✅ Enhanced responses with additional metadata
- ✅ New buffering tools for large datasets
- ✅ Improved error messages and validation

## 📖 Examples

### Basic Database Operations

```python
# Connect to SQLite
connect_database("sales", "sqlite", "./sales.db")

# Explore structure
describe_database("sales")
describe_table("sales", "orders")

# Query data
execute_query("sales", "SELECT product, SUM(amount) FROM orders GROUP BY product")
```

### Large Dataset Processing

```python
# Connect to large CSV
connect_database("bigdata", "csv", "./million_records.csv")

# Query returns buffer info for large results
result = execute_query_json("bigdata", "SELECT * FROM data WHERE category = 'A'")

# Access results in chunks
chunk = get_query_chunk("bigdata_1640995200_a1b2", 1, "1000")
```

### Multi-Database Analysis

```python
# Connect multiple sources
connect_database("postgres", "postgresql", "postgresql://localhost/prod")
connect_database("config", "yaml", "./config.yaml")
connect_database("logs", "json", "./logs.json")

# Query across sources (in application logic)
user_data = execute_query("postgres", "SELECT * FROM users")
config = read_text_file("./config.yaml", "yaml")
```

### Multi-Sheet Spreadsheet Handling

LocalData MCP Server provides comprehensive support for multi-sheet spreadsheets (Excel and LibreOffice Calc):

#### Automatic Multi-Sheet Processing

```python
# Connect to Excel file - all sheets become separate tables
connect_database("workbook", "xlsx", "./financial_data.xlsx")

# Query specific sheet (table names are sanitized sheet names)
execute_query("workbook", "SELECT * FROM Q1_Sales")
execute_query("workbook", "SELECT * FROM Q2_Budget")
execute_query("workbook", "SELECT * FROM Annual_Summary")
```

#### Single Sheet Selection

```python
# Connect to specific sheet only using ?sheet=SheetName syntax
connect_database("q1only", "xlsx", "./financial_data.xlsx?sheet=Q1 Sales")

# The data is available as the default table
execute_query("q1only", "SELECT * FROM data")
```

#### Sheet Name Sanitization

Sheet names are automatically sanitized for SQL compatibility:

| Original Sheet Name | SQL Table Name |
| ------------------- | -------------- |
| "Q1 Sales"          | Q1_Sales       |
| "2024-Budget"       | _2024_Budget   |
| "Summary & Notes"   | Summary__Notes |

#### Discovering Available Sheets

```python
# Connect to multi-sheet workbook
connect_database("workbook", "xlsx", "./data.xlsx")

# List all available tables (sheets)
describe_database("workbook")

# Get sample data from specific sheet
get_table_sample("workbook", "Sheet1")
```

## 🚧 Roadmap

### Completed (v1.1.0)
- [x] **Spreadsheet Formats**: Excel (.xlsx/.xls), LibreOffice Calc (.ods) with full multi-sheet support
- [x] **Enhanced File Formats**: XML, INI, TSV support
- [x] **Analytical Formats**: Parquet, Feather, Arrow support

### Planned Features
- [ ] **Caching Layer**: Configurable query result caching
- [ ] **Connection Pooling**: Advanced connection management
- [ ] **Streaming APIs**: Real-time data processing
- [ ] **Monitoring Tools**: Connection and performance metrics
- [ ] **Export Capabilities**: Query results to various formats

## ๐Ÿ› ๏ธ Troubleshooting

### Spreadsheet Format Issues

#### Large Excel Files
```python
# For files over 100MB, temporary SQLite storage is used automatically
connect_database("largefile", "xlsx", "./large_workbook.xlsx")

# Monitor processing with describe_database
describe_database("largefile")  # Shows processing status
```

#### Sheet Name Conflicts
```python
# If sheet names conflict after sanitization, use specific sheet selection
connect_database("specific", "xlsx", "./workbook.xlsx?sheet=Sheet1")

# Check sanitized names
describe_database("workbook")  # Lists all table names
```

#### Format Detection
```python
# Ensure correct file extension for proper format detection
connect_database("data", "xlsx", "./file.xlsx")  # โœ… Correct
connect_database("data", "xlsx", "./file.xls")   # โš ๏ธ May cause issues

# Use explicit format specification
connect_database("data", "xls", "./old_format.xls")  # โœ… Better
```

#### Multi-Sheet Selection Issues
```python
# Sheet names with special characters need URL encoding
connect_database("data", "xlsx", "./file.xlsx?sheet=Q1%20Sales")  # For "Q1 Sales"

# Or use the sanitized table name after connecting all sheets
connect_database("workbook", "xlsx", "./file.xlsx")
execute_query("workbook", "SELECT * FROM Q1_Sales")  # Use sanitized name
```

#### Performance Optimization
```python
# For better performance with large spreadsheets:
# 1. Use specific sheet selection when possible
connect_database("q1", "xlsx", "./large.xlsx?sheet=Q1_Data")

# 2. Use LIMIT clauses for large datasets
execute_query("data", "SELECT * FROM large_sheet LIMIT 1000")

# 3. Consider converting to Parquet for repeated analysis
# (Manual conversion outside of LocalData MCP recommended for very large files)
```

### General File Issues

#### Path Security Errors
```python
# ✅ Allowed paths (current directory and subdirectories)
connect_database("data", "csv", "./data/file.csv")
connect_database("data", "csv", "subfolder/file.csv")

# โŒ Blocked paths (parent directories)
connect_database("data", "csv", "../data/file.csv")  # Security error
```

#### Connection Limits
```python
# Maximum 10 concurrent connections
# Use disconnect_database() to free up connections when done
disconnect_database("old_connection")
```

## ๐Ÿค Contributing

Contributions welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Setup

```bash
git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"
pytest
```

## 📄 License

MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **GitHub**: [localdata-mcp](https://github.com/ChrisGVE/localdata-mcp)
- **PyPI**: [localdata-mcp](https://pypi.org/project/localdata-mcp/)
- **MCP Protocol**: [Model Context Protocol](https://modelcontextprotocol.io/)
- **FastMCP**: [FastMCP Framework](https://github.com/jlowin/fastmcp)

## 📊 Stats

![GitHub stars](https://img.shields.io/github/stars/ChrisGVE/localdata-mcp?style=social)
![GitHub forks](https://img.shields.io/github/forks/ChrisGVE/localdata-mcp?style=social)
![PyPI downloads](https://img.shields.io/pypi/dm/localdata-mcp)

## 📚 Additional Resources

- **[FAQ](FAQ.md)**: Common questions and troubleshooting
- **[Troubleshooting Guide](TROUBLESHOOTING.md)**: Comprehensive problem resolution
- **[Advanced Examples](ADVANCED_EXAMPLES.md)**: Production-ready usage patterns
- **[Blog Post](BLOG_POST.md)**: Technical deep dive and use cases

## 🤔 Need Help?

- **Issues**: [GitHub Issues](https://github.com/ChrisGVE/localdata-mcp/issues)
- **Discussions**: [GitHub Discussions](https://github.com/ChrisGVE/localdata-mcp/discussions)
- **Email**: Available in GitHub profile
- **Community**: Join MCP community forums

## ๐Ÿท๏ธ Tags

`mcp` `model-context-protocol` `database` `postgresql` `mysql` `sqlite` `mongodb` `spreadsheet` `excel` `xlsx` `ods` `csv` `tsv` `json` `yaml` `toml` `xml` `ini` `parquet` `feather` `arrow` `ai` `machine-learning` `data-integration` `python` `security` `performance`

---

**Made with โค๏ธ for the MCP Community**

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "localdata-mcp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "mcp, model-context-protocol, database, ai, assistant",
    "author": null,
    "author_email": "\"Christian C. Berclaz\" <christian.berclaz@mac.com>",
    "download_url": "https://files.pythonhosted.org/packages/01/c4/d1c3037b304c61d63e50ace9788aad49006df9805f5b7e6432282cd26038/localdata_mcp-1.3.0.tar.gz",
    "platform": null,
    "description": "[![MseeP.ai Security Assessment Badge](https://mseep.net/pr/chrisgve-localdata-mcp-badge.png)](https://mseep.ai/app/chrisgve-localdata-mcp)\n\n# LocalData MCP Server\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![PyPI version](https://img.shields.io/pypi/v/localdata-mcp.svg)](https://pypi.org/project/localdata-mcp/)\n[![FastMCP](https://img.shields.io/badge/FastMCP-Compatible-green.svg)](https://github.com/jlowin/fastmcp)\n[![Verified on MseeP](https://mseep.ai/badge.svg)](https://mseep.ai/app/cd737717-02f3-4388-bab7-5ec7cbe40713)\n\n**A comprehensive MCP server for databases, spreadsheets, and structured data files with security features, performance optimization, and extensive format support.**\n\n## \u2728 Features\n\n### \ud83d\uddc4\ufe0f **Multi-Database Support**\n\n- **SQL Databases**: PostgreSQL, MySQL, SQLite, DuckDB\n- **Modern Databases**: MongoDB, Redis, Elasticsearch, InfluxDB, Neo4j, CouchDB  \n- **Spreadsheets**: Excel (.xlsx/.xls), LibreOffice Calc (.ods), Apple Numbers (.numbers)\n- **Structured Files**: CSV, TSV, JSON, YAML, TOML, XML, INI\n- **Analytical Formats**: Parquet, Feather, Arrow, HDF5\n\n### \ud83d\udd12 **Advanced Security**\n\n- **Path Security**: Restricts file access to current working directory only\n- **SQL Injection Prevention**: Parameterized queries and safe table identifiers\n- **Connection Limits**: Maximum 10 concurrent database connections\n- **Input Validation**: Comprehensive validation and sanitization\n\n### \ud83d\udcca **Large Dataset Handling**\n\n- **Query Buffering**: Automatic buffering for results with 100+ rows\n- **Large File Support**: 100MB+ files automatically use temporary SQLite storage\n- **Chunk Retrieval**: Paginated access to large result sets\n- **Auto-Cleanup**: 10-minute expiry with file modification 
detection\n\n### \ud83d\udee0\ufe0f **Developer Experience**\n\n- **Comprehensive Tools**: 12 database operation tools\n- **Error Handling**: Detailed, actionable error messages\n- **Thread Safety**: Concurrent operation support\n- **Backward Compatible**: All existing APIs preserved\n\n## \ud83d\ude80 Quick Start\n\n### Installation\n\n```bash\n# Using pip\npip install localdata-mcp\n\n# Using uv (recommended)\nuv tool install localdata-mcp\n\n# Development installation\ngit clone https://github.com/ChrisGVE/localdata-mcp.git\ncd localdata-mcp\npip install -e .\n```\n\n### Configuration\n\nAdd to your MCP client configuration:\n\n```json\n{\n  \"mcpServers\": {\n    \"localdata\": {\n      \"command\": \"localdata-mcp\",\n      \"env\": {}\n    }\n  }\n}\n```\n\n### Usage Examples\n\n#### Connect to Databases\n\n```python\n# PostgreSQL\nconnect_database(\"analytics\", \"postgresql\", \"postgresql://user:pass@localhost/db\")\n\n# SQLite\nconnect_database(\"local\", \"sqlite\", \"./data.sqlite\")\n\n# CSV Files\nconnect_database(\"csvdata\", \"csv\", \"./data.csv\")\n\n# JSON Files\nconnect_database(\"config\", \"json\", \"./config.json\")\n\n# Excel Spreadsheets (all sheets)\nconnect_database(\"sales\", \"xlsx\", \"./sales_data.xlsx\")\n\n# Excel with specific sheet\nconnect_database(\"q1data\", \"xlsx\", \"./quarterly.xlsx?sheet=Q1_Sales\")\n\n# LibreOffice Calc\nconnect_database(\"budget\", \"ods\", \"./budget_2024.ods\")\n\n# Tab-separated values\nconnect_database(\"exports\", \"tsv\", \"./export_data.tsv\")\n\n# XML structured data\nconnect_database(\"config_xml\", \"xml\", \"./config.xml\")\n\n# INI configuration files\nconnect_database(\"settings\", \"ini\", \"./app.ini\")\n\n# Analytical formats\nconnect_database(\"analytics\", \"parquet\", \"./data.parquet\")\nconnect_database(\"features\", \"feather\", \"./features.feather\")\nconnect_database(\"vectors\", \"arrow\", \"./vectors.arrow\")\n```\n\n#### Query Data\n\n```python\n# Execute queries with 
automatic result formatting\nexecute_query(\"analytics\", \"SELECT * FROM users LIMIT 50\")\n\n# Large result sets use buffering automatically\nexecute_query_json(\"analytics\", \"SELECT * FROM large_table\")\n```\n\n#### Handle Large Results\n\n```python\n# Get chunked results for large datasets\nget_query_chunk(\"analytics_1640995200_a1b2\", 101, \"100\")\n\n# Check buffer status\nget_buffered_query_info(\"analytics_1640995200_a1b2\")\n\n# Manual cleanup\nclear_query_buffer(\"analytics_1640995200_a1b2\")\n```\n\n## \ud83d\udd27 Available Tools\n\n| Tool                      | Description                | Use Case      |\n| ------------------------- | -------------------------- | ------------- |\n| `connect_database`        | Connect to databases/files | Initial setup |\n| `disconnect_database`     | Close connections          | Cleanup       |\n| `list_databases`          | Show active connections    | Status check  |\n| `execute_query`           | Run SQL (markdown output)  | Small results |\n| `execute_query_json`      | Run SQL (JSON output)      | Large results |\n| `describe_database`       | Show schema/structure      | Exploration   |\n| `describe_table`          | Show table details         | Analysis      |\n| `get_table_sample`        | Preview table data         | Quick look    |\n| `get_table_sample_json`   | Preview (JSON format)      | Development   |\n| `find_table`              | Locate tables by name      | Navigation    |\n| `read_text_file`          | Read structured files      | File access   |\n| `get_query_chunk`         | Paginated result access    | Large data    |\n| `get_buffered_query_info` | Buffer status info         | Monitoring    |\n| `clear_query_buffer`      | Manual buffer cleanup      | Management    |\n\n## \ud83d\udccb Supported Data Sources\n\n### SQL Databases\n\n- **PostgreSQL**: Full support with connection pooling\n- **MySQL**: Complete MySQL/MariaDB compatibility  \n- **SQLite**: Local file and in-memory databases\n- 
**DuckDB**: High-performance analytical SQL database\n\n### Modern Databases\n\n- **MongoDB**: Document store with collection queries and aggregation\n- **Redis**: High-performance key-value store\n- **Elasticsearch**: Full-text search and analytics engine\n- **InfluxDB**: Time-series database for metrics and IoT data\n- **Neo4j**: Graph database for relationship queries\n- **CouchDB**: Document-oriented database with HTTP API\n\n### Structured Files\n\n#### Spreadsheet Formats\n- **Excel (.xlsx, .xls)**: Full multi-sheet support with automatic table creation\n- **LibreOffice Calc (.ods)**: Complete ODS support with sheet handling\n- **Apple Numbers (.numbers)**: Native support for Numbers documents\n- **Multi-sheet handling**: Each sheet becomes a separate queryable table\n\n#### Text-Based Formats\n- **CSV**: Large file automatic SQLite conversion\n- **TSV**: Tab-separated values with same features as CSV\n- **JSON**: Nested structure flattening\n- **YAML**: Configuration file support\n- **TOML**: Settings and config files\n- **XML**: Structured XML document parsing\n- **INI**: Configuration file format support\n\n#### Analytical Formats\n- **Parquet**: High-performance columnar data format\n- **Feather**: Fast binary format for data interchange\n- **Arrow**: In-memory columnar format support\n- **HDF5**: Hierarchical data format for scientific computing\n\n## \ud83d\udee1\ufe0f Security Features\n\n### Path Security\n\n```python\n# \u2705 Allowed - current directory and subdirectories\n\"./data/users.csv\"\n\"data/config.json\"\n\"subdir/file.yaml\"\n\n# \u274c Blocked - parent directory access\n\"../etc/passwd\"\n\"../../sensitive.db\"\n\"/etc/hosts\"\n```\n\n### SQL Injection Prevention\n\n```python\n# \u2705 Safe - parameterized queries\ndescribe_table(\"mydb\", \"users\")  # Validates table name\n\n# \u274c Blocked - malicious input\ndescribe_table(\"mydb\", \"users; DROP TABLE users; --\")\n```\n\n### Resource Limits\n\n- **Connection Limit**: Maximum 10 
concurrent connections\n- **File Size Threshold**: 100MB triggers temporary storage\n- **Query Buffering**: Automatic for 100+ row results\n- **Auto-Cleanup**: Buffers expire after 10 minutes\n\n## \ud83d\udcca Performance & Scalability\n\n### Large File Handling\n\n- Files over 100MB automatically use temporary SQLite storage\n- Memory-efficient streaming for large datasets\n- Automatic cleanup of temporary files\n\n### Query Optimization\n\n- Results with 100+ rows automatically use buffering system\n- Chunk-based retrieval for large datasets\n- File modification detection for cache invalidation\n\n### Concurrency\n\n- Thread-safe connection management\n- Concurrent query execution support\n- Resource pooling and limits\n\n## \ud83e\uddea Testing & Quality\n\n**\u2705 Comprehensive Test Coverage**\n\n- 68% test coverage with 500+ test cases\n- Import error handling and graceful degradation\n- Security vulnerability testing  \n- Performance benchmarking with large datasets\n- Modern database connection testing\n\n**\ud83d\udd12 Security Validated**\n\n- Path traversal prevention\n- SQL injection protection\n- Resource exhaustion testing\n- Malicious input handling\n\n**\u26a1 Performance Tested**\n\n- Large file processing\n- Concurrent connection handling\n- Memory usage optimization\n- Query response times\n\n## \ud83d\udd04 API Compatibility\n\nAll existing MCP tool signatures remain **100% backward compatible**. 
New functionality is additive only:\n\n- \u2705 All original tools work unchanged\n- \u2705 Enhanced responses with additional metadata\n- \u2705 New buffering tools for large datasets\n- \u2705 Improved error messages and validation\n\n## \ud83d\udcd6 Examples\n\n### Basic Database Operations\n\n```python\n# Connect to SQLite\nconnect_database(\"sales\", \"sqlite\", \"./sales.db\")\n\n# Explore structure\ndescribe_database(\"sales\")\ndescribe_table(\"sales\", \"orders\")\n\n# Query data\nexecute_query(\"sales\", \"SELECT product, SUM(amount) FROM orders GROUP BY product\")\n```\n\n### Large Dataset Processing\n\n```python\n# Connect to large CSV\nconnect_database(\"bigdata\", \"csv\", \"./million_records.csv\")\n\n# Query returns buffer info for large results\nresult = execute_query_json(\"bigdata\", \"SELECT * FROM data WHERE category = 'A'\")\n\n# Access results in chunks\nchunk = get_query_chunk(\"bigdata_1640995200_a1b2\", 1, \"1000\")\n```\n\n### Multi-Database Analysis\n\n```python\n# Connect multiple sources\nconnect_database(\"postgres\", \"postgresql\", \"postgresql://localhost/prod\")\nconnect_database(\"config\", \"yaml\", \"./config.yaml\")\nconnect_database(\"logs\", \"json\", \"./logs.json\")\n\n# Query across sources (in application logic)\nuser_data = execute_query(\"postgres\", \"SELECT * FROM users\")\nconfig = read_text_file(\"./config.yaml\", \"yaml\")\n```\n\n### Multi-Sheet Spreadsheet Handling\n\nLocalData MCP Server provides comprehensive support for multi-sheet spreadsheets (Excel and LibreOffice Calc):\n\n#### Automatic Multi-Sheet Processing\n\n```python\n# Connect to Excel file - all sheets become separate tables\nconnect_database(\"workbook\", \"xlsx\", \"./financial_data.xlsx\")\n\n# Query specific sheet (table names are sanitized sheet names)\nexecute_query(\"workbook\", \"SELECT * FROM Q1_Sales\")\nexecute_query(\"workbook\", \"SELECT * FROM Q2_Budget\")\nexecute_query(\"workbook\", \"SELECT * FROM Annual_Summary\")\n```\n\n#### 
#### Single Sheet Selection

```python
# Connect to a specific sheet using the ?sheet=SheetName syntax
connect_database("q1only", "xlsx", "./financial_data.xlsx?sheet=Q1 Sales")

# The sheet's data is available as the default table
execute_query("q1only", "SELECT * FROM data")
```

#### Sheet Name Sanitization

Sheet names are automatically sanitized for SQL compatibility:

| Original Sheet Name | SQL Table Name |
| ------------------- | -------------- |
| "Q1 Sales"          | Q1_Sales       |
| "2024-Budget"       | _2024_Budget   |
| "Summary & Notes"   | Summary__Notes |

#### Discovering Available Sheets

```python
# Connect to a multi-sheet workbook
connect_database("workbook", "xlsx", "./data.xlsx")

# List all available tables (sheets)
describe_database("workbook")

# Get sample data from a specific sheet
get_table_sample("workbook", "Sheet1")
```

## 🚧 Roadmap

### Completed (v1.1.0)

- [x] **Spreadsheet Formats**: Excel (.xlsx/.xls), LibreOffice Calc (.ods) with full multi-sheet support
- [x] **Enhanced File Formats**: XML, INI, TSV support
- [x] **Analytical Formats**: Parquet, Feather, Arrow support

### Planned Features

- [ ] **Caching Layer**: Configurable query result caching
- [ ] **Connection Pooling**: Advanced connection management
- [ ] **Streaming APIs**: Real-time data processing
- [ ] **Monitoring Tools**: Connection and performance metrics
- [ ] **Export Capabilities**: Query results to various formats

## 🛠️ Troubleshooting

### Spreadsheet Format Issues

#### Large Excel Files

```python
# For files over 100MB, temporary SQLite storage is used automatically
connect_database("largefile", "xlsx", "./large_workbook.xlsx")

# Monitor processing with describe_database
describe_database("largefile")  # Shows processing status
```

#### Sheet Name Conflicts

```python
# If sheet names collide after sanitization, select a specific sheet
connect_database("specific", "xlsx", "./workbook.xlsx?sheet=Sheet1")

# Check the sanitized names on a full-workbook connection
describe_database("workbook")  # Lists all table names
```

#### Format Detection

```python
# Make sure the file extension matches the actual format
connect_database("data", "xlsx", "./file.xlsx")  # ✅ Correct
connect_database("data", "xlsx", "./file.xls")   # ⚠️ May cause issues

# Use an explicit format specification instead
connect_database("data", "xls", "./old_format.xls")  # ✅ Better
```

#### Multi-Sheet Selection Issues

```python
# Sheet names with special characters need URL encoding
connect_database("data", "xlsx", "./file.xlsx?sheet=Q1%20Sales")  # For "Q1 Sales"

# Or connect all sheets and query the sanitized table name
connect_database("workbook", "xlsx", "./file.xlsx")
execute_query("workbook", "SELECT * FROM Q1_Sales")  # Use the sanitized name
```

#### Performance Optimization

```python
# For better performance with large spreadsheets:
# 1. Use specific sheet selection when possible
connect_database("q1", "xlsx", "./large.xlsx?sheet=Q1_Data")

# 2. Use LIMIT clauses for large datasets
execute_query("data", "SELECT * FROM large_sheet LIMIT 1000")

# 3. Consider converting to Parquet for repeated analysis
# (manual conversion outside of LocalData MCP is recommended for very large files)
```

### General File Issues

#### Path Security Errors

```python
# ✅ Allowed paths (current directory and subdirectories)
connect_database("data", "csv", "./data/file.csv")
connect_database("data", "csv", "subfolder/file.csv")

# ❌ Blocked paths (parent directories)
connect_database("data", "csv", "../data/file.csv")  # Security error
```

#### Connection Limits

```python
# Maximum of 10 concurrent connections
# Use disconnect_database() to free up connections when done
disconnect_database("old_connection")
```

## 🤝 Contributing

Contributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Setup

```bash
git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"
pytest
```

## 📄 License

MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **GitHub**: [localdata-mcp](https://github.com/ChrisGVE/localdata-mcp)
- **PyPI**: [localdata-mcp](https://pypi.org/project/localdata-mcp/)
- **MCP Protocol**: [Model Context Protocol](https://modelcontextprotocol.io/)
- **FastMCP**: [FastMCP Framework](https://github.com/jlowin/fastmcp)

## 📊 Stats

![GitHub stars](https://img.shields.io/github/stars/ChrisGVE/localdata-mcp?style=social)
![GitHub forks](https://img.shields.io/github/forks/ChrisGVE/localdata-mcp?style=social)
![PyPI downloads](https://img.shields.io/pypi/dm/localdata-mcp)

## 📚 Additional Resources

- **[FAQ](FAQ.md)**: Common questions and troubleshooting
- **[Troubleshooting Guide](TROUBLESHOOTING.md)**: Comprehensive problem resolution
- **[Advanced Examples](ADVANCED_EXAMPLES.md)**: Production-ready usage patterns
- **[Blog Post](BLOG_POST.md)**: Technical deep dive and use cases

## 🤔 Need Help?

- **Issues**: [GitHub Issues](https://github.com/ChrisGVE/localdata-mcp/issues)
- **Discussions**: [GitHub Discussions](https://github.com/ChrisGVE/localdata-mcp/discussions)
- **Email**: Available in GitHub profile
- **Community**: Join the MCP community forums

## 🏷️ Tags

`mcp` `model-context-protocol` `database` `postgresql` `mysql` `sqlite` `mongodb` `spreadsheet` `excel` `xlsx` `ods` `csv` `tsv` `json` `yaml` `toml` `xml` `ini` `parquet` `feather` `arrow` `ai` `machine-learning` `data-integration` `python` `security` `performance`

---

**Made with ❤️ for the MCP Community**
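The sheet-name sanitization rules shown in the table above can be approximated in a few lines of Python. This is an illustrative sketch that merely reproduces the table's three examples (spaces and hyphens become underscores, other special characters are dropped, leading digits get an underscore prefix); it is not LocalData MCP's actual implementation, whose rules may differ for inputs not shown in the table.

```python
import re

def sanitize_sheet_name(name: str) -> str:
    """Approximate sheet-name sanitization (hypothetical helper, for illustration)."""
    # Spaces and hyphens become underscores
    table = re.sub(r"[ \-]", "_", name)
    # Remaining characters that are not alphanumeric or underscore are dropped
    table = re.sub(r"[^0-9A-Za-z_]", "", table)
    # SQL identifiers cannot start with a digit, so prefix an underscore
    if table and table[0].isdigit():
        table = "_" + table
    return table

print(sanitize_sheet_name("Q1 Sales"))         # Q1_Sales
print(sanitize_sheet_name("2024-Budget"))      # _2024_Budget
print(sanitize_sheet_name("Summary & Notes"))  # Summary__Notes
```

When in doubt, rely on `describe_database()` to see the table names the server actually produced rather than predicting them.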
    "bugtrack_url": null,
    "license": null,
    "summary": "A dynamic MCP server for local databases and text files.",
    "version": "1.3.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/chrisgve/localdata-mcp/issues",
        "Documentation": "https://github.com/chrisgve/localdata-mcp#readme",
        "Homepage": "https://github.com/chrisgve/localdata-mcp",
        "Repository": "https://github.com/chrisgve/localdata-mcp"
    },
    "split_keywords": [
        "mcp",
        " model-context-protocol",
        " database",
        " ai",
        " assistant"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "edb51c6ea3fbdabdf3f93762df285c92a51218f8ecb70848caa8aaf57efacf8c",
                "md5": "1866d8291bcc2b3a8f82d3462fac6676",
                "sha256": "3d31551573ab5630f92dff52570ddfa2e45a0be9928ea7360f0ffd99e3293a9c"
            },
            "downloads": -1,
            "filename": "localdata_mcp-1.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1866d8291bcc2b3a8f82d3462fac6676",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 23658,
            "upload_time": "2025-08-30T16:00:05",
            "upload_time_iso_8601": "2025-08-30T16:00:05.411332Z",
            "url": "https://files.pythonhosted.org/packages/ed/b5/1c6ea3fbdabdf3f93762df285c92a51218f8ecb70848caa8aaf57efacf8c/localdata_mcp-1.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "01c4d1c3037b304c61d63e50ace9788aad49006df9805f5b7e6432282cd26038",
                "md5": "ba8d15492888ae87688597175b28e712",
                "sha256": "b731f7063e48fe28ff822c8a7155d172fbe9ea8e6aabbabac4bffdb9f804ceaa"
            },
            "downloads": -1,
            "filename": "localdata_mcp-1.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ba8d15492888ae87688597175b28e712",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 30203,
            "upload_time": "2025-08-30T16:00:06",
            "upload_time_iso_8601": "2025-08-30T16:00:06.826495Z",
            "url": "https://files.pythonhosted.org/packages/01/c4/d1c3037b304c61d63e50ace9788aad49006df9805f5b7e6432282cd26038/localdata_mcp-1.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-30 16:00:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chrisgve",
    "github_project": "localdata-mcp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "localdata-mcp"
}
        