# MCP CSV Database Server
A Model Context Protocol (MCP) server that provides comprehensive tools for loading CSV files into a temporary SQLite database and performing advanced data analysis with AI assistance.
## Features
- **Smart CSV Loading**: Automatically detect CSV separators and load multiple files from a folder
- **Advanced SQL Queries**: Execute any SQL query with automatic result formatting and pagination
- **Schema Inspection**: View database schema, table structures, and relationships
- **Data Quality Analysis**: Comprehensive missing data analysis, duplicate detection, and data profiling
- **Statistical Analysis**: Column statistics, data summaries, and distribution analysis
- **Export Capabilities**: Export query results or tables back to CSV with custom formatting
- **Performance Tools**: Create indexes, analyze query execution plans, and optimize performance
- **AI-Ready**: Designed for seamless integration with AI assistants for data analysis workflows
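The separator auto-detection mentioned above can be approximated with the standard library's `csv.Sniffer`; the following is a minimal sketch of the idea (the package's actual implementation may differ, and `detect_separator` is an illustrative name, not part of the package's API):

```python
import csv

def detect_separator(path: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV file from a sample of its contents."""
    with open(path, newline="") as f:
        sample = f.read(4096)
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return ","  # fall back to a comma when detection fails
```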
## Installation
### From PyPI
```bash
pip install mcp-csv-database
```
### From source
```bash
git clone https://github.com/Lasitha-Jayawardana/mcp-csv-database.git
cd mcp-csv-database
pip install -e .
```
## Usage
### Command Line
Start the server with stdio transport:
```bash
mcp-csv-database
```
**Recommended**: auto-load CSV files from a folder using a positional argument:

```bash
mcp-csv-database /path/to/csv/files
```
Alternative syntax with explicit flag:
```bash
mcp-csv-database --csv-folder /path/to/csv/files
```
With custom table prefix:
```bash
mcp-csv-database /path/to/csv/files --table-prefix sales_
```
For remote access, use an HTTP-based transport such as SSE:
```bash
mcp-csv-database /path/to/csv/files --transport sse --port 8080
```
### Configuration
Add to your MCP client configuration:
```json
{
  "mcpServers": {
    "csv-database": {
      "command": "mcp-csv-database",
      "args": ["/path/to/your/csv/files"]
    }
  }
}
```
Alternative configuration with explicit options:
```json
{
  "mcpServers": {
    "csv-database": {
      "command": "mcp-csv-database",
      "args": ["--csv-folder", "/path/to/csv/files", "--table-prefix", "analytics_"]
    }
  }
}
```
## Available Tools
### Data Loading & Management
- `load_csv_folder(folder_path, table_prefix="")` - Load all CSV files from a folder with smart separator detection
- `list_loaded_tables()` - List currently loaded tables with source file information
- `clear_database()` - Clear all loaded data and temporary files
- `backup_database(backup_path)` - Create complete database backups
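Conceptually, `load_csv_folder` maps each CSV file to one SQLite table. A pandas-based sketch of that behavior (illustrative only; `load_folder` and its details are assumptions, not the package's actual code):

```python
import glob
import os
import re
import sqlite3

import pandas as pd

def load_folder(folder: str, conn: sqlite3.Connection, table_prefix: str = "") -> list:
    """Load every CSV in a folder into SQLite, one table per file."""
    loaded = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        name = os.path.splitext(os.path.basename(path))[0]
        # sanitize the file name into a valid SQL identifier
        table = table_prefix + re.sub(r"\W+", "_", name)
        # sep=None with the python engine lets pandas sniff the separator
        df = pd.read_csv(path, sep=None, engine="python")
        df.to_sql(table, conn, if_exists="replace", index=False)
        loaded.append(table)
    return loaded
```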
### Data Querying & Schema
- `execute_sql_query(query, limit=100)` - Execute any SQL query with automatic result formatting
- `get_database_schema()` - View complete database schema with column types and sample data
- `get_table_info(table_name)` - Get detailed information about specific tables
- `get_query_plan(query)` - Analyze query execution plans for performance optimization
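Query-plan analysis in SQLite is exposed through the `EXPLAIN QUERY PLAN` statement; a minimal sketch of what a tool like `get_query_plan` can build on (the function name and output format here are assumptions):

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, query: str) -> list:
    """Return SQLite's EXPLAIN QUERY PLAN rows as readable strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # each row is (id, parent, notused, detail); detail is the useful part
    return [row[3] for row in rows]
```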
### Data Quality & Analysis
- `get_data_summary(table_name)` - Comprehensive data overview with insights and data types
- `get_column_stats(table_name, column_name)` - Detailed statistical analysis for specific columns
- `analyze_missing_data(table_name)` - Complete missing data analysis across all columns
- `find_duplicates(table_name, columns="all")` - Advanced duplicate detection with configurable column sets
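Duplicate detection of the kind `find_duplicates` performs maps naturally onto a `GROUP BY ... HAVING` query. A hedged sketch (illustrative; the package's column handling and output may differ):

```python
import sqlite3

def find_dupes(conn: sqlite3.Connection, table: str, columns: list) -> list:
    """Return value combinations that occur more than once in `table`."""
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY n DESC"
    )
    return conn.execute(sql).fetchall()
```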
### Performance & Export
- `create_index(table_name, column_name, index_name="")` - Create indexes for query optimization
- `export_table_to_csv(table_name, output_path, include_header=True)` - Export tables with custom formatting
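Exporting a table back to CSV is essentially a round-trip through pandas; a sketch of the idea behind `export_table_to_csv` (the `export_table` helper is assumed, not the package's code):

```python
import sqlite3

import pandas as pd

def export_table(conn: sqlite3.Connection, table: str, out_path: str,
                 include_header: bool = True) -> int:
    """Dump a SQLite table to CSV; returns the number of rows written."""
    df = pd.read_sql_query(f"SELECT * FROM {table}", conn)
    df.to_csv(out_path, index=False, header=include_header)
    return len(df)
```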
## Examples
### Basic Usage
```python
# Load CSV files
result = load_csv_folder("/path/to/csv/files")
# View what's loaded
schema = get_database_schema()
# Query the data
result = execute_sql_query("SELECT * FROM my_table LIMIT 10")
# Export results
export_table_to_csv("my_table", "/path/to/output.csv")
```
### Advanced Data Analysis
```python
# Get comprehensive data overview
summary = get_data_summary("sales_data")
# Detailed statistical analysis for specific columns
price_stats = get_column_stats("sales_data", "price")
quantity_stats = get_column_stats("sales_data", "quantity")
# Data quality assessment
missing_analysis = analyze_missing_data("sales_data")
duplicates = find_duplicates("sales_data", "customer_id,product")
# Complex analytical queries
# Note: plain SQLite has no built-in STDDEV(); compute it separately if needed
result = execute_sql_query("""
    SELECT
        category,
        COUNT(*) AS count,
        AVG(price) AS avg_price,
        SUM(quantity) AS total_quantity,
        MIN(price) AS min_price,
        MAX(price) AS max_price
    FROM sales_data
    GROUP BY category
    ORDER BY total_quantity DESC
""")
# Performance optimization
create_index("sales_data", "category")
query_plan = get_query_plan("SELECT * FROM sales_data WHERE category = 'Electronics'")
```
### Data Quality Workflow
```python
# Step 1: Load and inspect data
load_csv_folder("/path/to/data")
schema = get_database_schema()
# Step 2: Data quality assessment
missing_data = analyze_missing_data("customers")
duplicates = find_duplicates("customers", "email")
summary = get_data_summary("customers")
# Step 3: Statistical analysis
age_stats = get_column_stats("customers", "age")
income_stats = get_column_stats("customers", "income")
# Step 4: Clean and analyze
clean_data = execute_sql_query("""
SELECT customer_id, name, email, city, age, income
FROM customers
WHERE email IS NOT NULL
AND age BETWEEN 18 AND 100
AND income > 0
""")
```
## Transport Options
The server supports multiple transport methods:
- `stdio` (default): Standard input/output
- `sse`: Server-sent events
- `streamable-http`: HTTP streaming
```bash
# SSE transport
mcp-csv-database --transport sse --port 8080
# HTTP transport
mcp-csv-database --transport streamable-http --port 8080
```
## Requirements
- Python 3.10+ (required for MCP framework compatibility)
- pandas >= 1.3.0
- sqlite3 (built-in)
- mcp >= 1.0.0
## CLI Reference
```bash
mcp-csv-database [folder_path] [OPTIONS]
# Positional Arguments:
# folder_path Path to folder containing CSV files (recommended)
# Options:
# --csv-folder PATH Alternative way to specify CSV folder path
# --table-prefix PREFIX Optional prefix for table names (e.g., 'sales_')
# --transport TYPE Transport type: stdio (default), sse, streamable-http
# --port PORT Port for HTTP transport (default: 3000)
# -h, --help Show help message and exit
# Examples:
mcp-csv-database /data/sales # Load CSV files from /data/sales
mcp-csv-database --csv-folder /data --table-prefix t_ # Load with table prefix
mcp-csv-database /data --transport sse --port 8080 # HTTP transport on port 8080
```
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Changelog
### v0.1.3
- Enhanced CLI interface with positional argument support for CSV folder paths
- Improved command-line help with comprehensive examples and tool descriptions
- Fixed mypy type checking and added pandas-stubs for better development experience
- Resolved GitHub Actions CI/CD pipeline configuration issues
- Updated Python requirement to 3.10+ for MCP framework compatibility
### v0.1.2
- Added comprehensive data analysis tools: `get_data_summary()`, `get_column_stats()`, `analyze_missing_data()`, `find_duplicates()`
- Enhanced statistical analysis capabilities with numeric data detection
- Improved data quality assessment and missing data visualization
- Added advanced duplicate detection with configurable column sets
- Enhanced table information display with better formatting
### v0.1.1
- Improved CSV separator auto-detection (semicolon, comma, tab)
- Enhanced error handling and user feedback
- Better table naming with special character handling
- Added comprehensive test coverage
- Improved documentation and examples
### v0.1.0
- Initial release
- Basic CSV loading and SQL querying
- Schema inspection tools
- Data export capabilities
- Multiple transport support