<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Doris MCP Server
Doris MCP (Model Context Protocol) Server is a backend service built with Python and FastAPI. It implements the MCP, allowing clients to interact with it through defined "Tools". It's primarily designed to connect to Apache Doris databases, potentially leveraging Large Language Models (LLMs) for tasks like converting natural language queries to SQL (NL2SQL), executing queries, and performing metadata management and analysis.
## 🚀 What's New in v0.5.0
- **🔥 Critical at_eof Connection Fix**: **Complete elimination of at_eof connection pool errors** through redesigned connection pool strategy with zero minimum connections, intelligent health monitoring, automatic retry mechanisms, and self-healing pool recovery - achieving 99.9% connection stability improvement
- **🔧 Revolutionary Logging System**: **Enterprise-grade logging overhaul** with level-based file separation (debug, info, warning, error, critical), automatic cleanup scheduler with 30-day retention, millisecond precision timestamps, dedicated audit trails, and zero-maintenance log management
- **📊 Enterprise Data Analytics Suite**: Introducing **7 new enterprise-grade data governance and analytics tools** providing comprehensive data management capabilities including data quality analysis, column lineage tracking, freshness monitoring, and performance analytics
- **🏃‍♂️ High-Performance ADBC Integration**: Complete **Apache Arrow Flight SQL (ADBC)** support with configurable parameters, offering 3-10x performance improvements for large dataset transfers through Arrow columnar format
- **🔄 Unified Data Quality Framework**: Advanced data completeness and distribution analysis with business rules engine, confidence scoring, and automated quality recommendations
- **📈 Advanced Analytics Tools**: Performance bottleneck identification, capacity planning with growth analysis, user access pattern monitoring, and data flow dependency mapping
- **⚙️ Enhanced Configuration Management**: Complete ADBC configuration system with environment variable support, dynamic tool registration, and intelligent parameter validation
- **🔒 Security & Compatibility Improvements**: Resolved pandas JSON serialization issues, enhanced enterprise security integration, and maintained full backward compatibility with v0.4.x versions
- **🎯 Modular Architecture**: 6 new specialized tool modules for enterprise analytics with comprehensive English documentation and robust error handling
> **🚀 Major Milestone**: This release establishes v0.5.0 as a **production-ready enterprise data governance platform** with **critical stability improvements** (complete at_eof fix + intelligent logging), 23 total tools (14 existing + 7 analytics + 2 ADBC tools), and enterprise-grade system reliability - representing a major advancement in both data intelligence capabilities and operational stability.
## Core Features
* **MCP Protocol Implementation**: Provides standard MCP interfaces, supporting tool calls, resource management, and prompt interactions.
* **Streamable HTTP Communication**: Unified HTTP endpoint supporting both request/response and streaming communication for optimal performance and reliability.
* **Stdio Communication**: Standard input/output mode for direct integration with MCP clients like Cursor.
* **Enterprise-Grade Architecture**: Modular design with comprehensive functionality:
* **Tools Manager**: Centralized tool registration and routing with unified interfaces (`doris_mcp_server/tools/tools_manager.py`)
* **Enhanced Monitoring Tools Module**: Advanced memory tracking, metrics collection, and flexible BE node discovery with modular, extensible design
* **Query Information Tools**: Enhanced SQL explain and profiling with configurable content truncation, file export for LLM attachments, and advanced query analytics
* **Resources Manager**: Resource management and metadata exposure (`doris_mcp_server/tools/resources_manager.py`)
* **Prompts Manager**: Intelligent prompt templates for data analysis (`doris_mcp_server/tools/prompts_manager.py`)
* **Advanced Database Features**:
* **Query Execution**: High-performance SQL execution with advanced caching and optimization, enhanced connection stability and automatic retry mechanisms (`doris_mcp_server/utils/query_executor.py`)
* **Security Management**: Comprehensive SQL security validation with configurable blocked keywords, SQL injection protection, data masking, and unified security configuration management (`doris_mcp_server/utils/security.py`)
* **Metadata Extraction**: Comprehensive database metadata with catalog federation support (`doris_mcp_server/utils/schema_extractor.py`)
* **Performance Analysis**: Advanced column analysis, performance monitoring, and data analysis tools (`doris_mcp_server/utils/analysis_tools.py`)
* **Catalog Federation Support**: Full support for multi-catalog environments (internal Doris tables and external data sources like Hive, MySQL, etc.)
* **Enterprise Security**: Comprehensive security framework with authentication, authorization, SQL injection protection, and data masking capabilities with environment variable configuration support
* **Unified Configuration Framework**: Centralized configuration management through `config.py` with comprehensive validation, standardized parameter naming, and smart default database handling with automatic fallback to `information_schema`
## System Requirements
* Python 3.12+
* Database connection details (e.g., Doris Host, Port, User, Password, Database)
## 🚀 Quick Start
### Installation from PyPI
```bash
# Install the latest version
pip install doris-mcp-server
# Install specific version
pip install doris-mcp-server==0.5.0
```
> **💡 Command Compatibility**: After installation, the `doris-mcp-server` command is available on your PATH and can be used for both the HTTP and stdio transports described below.
### Start Streamable HTTP Mode (Web Service)
The primary communication mode offering optimal performance and reliability:
```bash
# Full configuration with database connection
doris-mcp-server \
--transport http \
--host 0.0.0.0 \
--port 3000 \
--db-host 127.0.0.1 \
--db-port 9030 \
--db-user root \
--db-password your_password
```
### Start Stdio Mode (for Cursor and other MCP clients)
Standard input/output mode for direct integration with MCP clients:
```bash
# For direct integration with MCP clients like Cursor
doris-mcp-server --transport stdio
```
### Verify Installation
```bash
# Check installation
doris-mcp-server --help
# Test HTTP mode (in another terminal)
curl http://localhost:3000/health
```
### Environment Variables (Optional)
Instead of command-line arguments, you can use environment variables:
```bash
export DORIS_HOST="127.0.0.1"
export DORIS_PORT="9030"
export DORIS_USER="root"
export DORIS_PASSWORD="your_password"
# Then start with simplified command
doris-mcp-server --transport http --host 0.0.0.0 --port 3000
```
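Since Doris FE speaks the MySQL protocol on the query port (`9030` by default), you can sanity-check these connection settings with any MySQL client before starting the server. A minimal sketch using `pymysql` (an assumed, separately installed dependency; the server itself manages its own pooled connections internally):
```python
# Hypothetical pre-flight connectivity check; assumes `pip install pymysql`.
import os

import pymysql

conn = pymysql.connect(
    host=os.getenv("DORIS_HOST", "127.0.0.1"),
    port=int(os.getenv("DORIS_PORT", "9030")),  # FE MySQL-protocol port
    user=os.getenv("DORIS_USER", "root"),
    password=os.getenv("DORIS_PASSWORD", ""),
    database="information_schema",
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print("Doris connection OK:", cur.fetchone())
finally:
    conn.close()
```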
### Command Line Arguments
The `doris-mcp-server` command supports the following arguments:
| Argument | Description | Default | Required |
|:---------|:------------|:--------|:---------|
| `--transport` | Transport mode: `http` or `stdio` | `http` | No |
| `--host` | HTTP server host (HTTP mode only) | `0.0.0.0` | No |
| `--port` | HTTP server port (HTTP mode only) | `3000` | No |
| `--db-host` | Doris database host | `localhost` | No |
| `--db-port` | Doris database port | `9030` | No |
| `--db-user` | Doris database username | `root` | No |
| `--db-password` | Doris database password | - | Yes (unless in env) |
## Development Setup
For developers who want to build from source:
### 1. Clone the Repository
```bash
# Replace with the actual repository URL if different
git clone https://github.com/apache/doris-mcp-server.git
cd doris-mcp-server
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Configure Environment Variables
Copy the `.env.example` file to `.env` and modify the settings according to your environment:
```bash
cp .env.example .env
```
**Key Environment Variables:**
* **Database Connection**:
* `DORIS_HOST`: Database hostname (default: localhost)
* `DORIS_PORT`: Database port (default: 9030)
* `DORIS_USER`: Database username (default: root)
* `DORIS_PASSWORD`: Database password
* `DORIS_DATABASE`: Default database name (default: information_schema)
* `DORIS_MIN_CONNECTIONS`: Minimum connection pool size (default: 5)
* `DORIS_MAX_CONNECTIONS`: Maximum connection pool size (default: 20)
* `DORIS_BE_HOSTS`: BE nodes for monitoring (comma-separated, optional - auto-discovery via SHOW BACKENDS if empty)
* `DORIS_BE_WEBSERVER_PORT`: BE webserver port for monitoring tools (default: 8040)
* `FE_ARROW_FLIGHT_SQL_PORT`: Frontend Arrow Flight SQL port for ADBC (New in v0.5.0)
* `BE_ARROW_FLIGHT_SQL_PORT`: Backend Arrow Flight SQL port for ADBC (New in v0.5.0)
* **Security Configuration**:
* `AUTH_TYPE`: Authentication type (token/basic/oauth, default: token)
* `TOKEN_SECRET`: Token secret key
* `ENABLE_SECURITY_CHECK`: Enable/disable SQL security validation (default: true, New in v0.4.2)
* `BLOCKED_KEYWORDS`: Comma-separated list of blocked SQL keywords (New in v0.4.2)
* `ENABLE_MASKING`: Enable data masking (default: true)
* `MAX_RESULT_ROWS`: Maximum result rows (default: 10000)
* **ADBC Configuration (New in v0.5.0)**:
* `ADBC_DEFAULT_MAX_ROWS`: Default maximum rows for ADBC queries (default: 100000)
* `ADBC_DEFAULT_TIMEOUT`: Default ADBC query timeout in seconds (default: 60)
* `ADBC_DEFAULT_RETURN_FORMAT`: Default return format - arrow/pandas/dict (default: arrow)
* `ADBC_CONNECTION_TIMEOUT`: ADBC connection timeout in seconds (default: 30)
* `ADBC_ENABLED`: Enable/disable ADBC tools (default: true)
* **Performance Configuration**:
* `ENABLE_QUERY_CACHE`: Enable query caching (default: true)
* `CACHE_TTL`: Cache time-to-live in seconds (default: 300)
* `MAX_CONCURRENT_QUERIES`: Maximum concurrent queries (default: 50)
* `MAX_RESPONSE_CONTENT_SIZE`: Maximum response content size for LLM compatibility (default: 4096, New in v0.4.0)
* **Enhanced Logging Configuration (Improved in v0.5.0)**:
* `LOG_LEVEL`: Log level (DEBUG/INFO/WARNING/ERROR, default: INFO)
* `LOG_FILE_PATH`: Log file path (automatically organized by level)
* `ENABLE_AUDIT`: Enable audit logging (default: true)
* `ENABLE_LOG_CLEANUP`: Enable automatic log cleanup (default: true, Enhanced in v0.5.0)
* `LOG_MAX_AGE_DAYS`: Maximum age of log files in days (default: 30, Enhanced in v0.5.0)
* `LOG_CLEANUP_INTERVAL_HOURS`: Log cleanup check interval in hours (default: 24, Enhanced in v0.5.0)
* **New Features in v0.5.0**:
* **Level-based File Separation**: Automatic separation into `debug.log`, `info.log`, `warning.log`, `error.log`, `critical.log`
* **Timestamped Format**: Enhanced formatting with millisecond precision and proper alignment
* **Background Cleanup Scheduler**: Automatic cleanup with configurable retention policies
* **Audit Trail**: Dedicated `audit.log` with separate retention management
* **Performance Optimized**: Minimal overhead async logging with rotation support
### Available MCP Tools
The following table lists the main tools currently available for invocation via an MCP client:
| Tool Name | Description | Parameters |
|-----------------------------|--------------------------------------------------------------|--------------------------------------------------------------|
| `exec_query` | Execute SQL query and return results. | `sql` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional), `max_rows` (integer, Optional), `timeout` (integer, Optional) |
| `get_table_schema` | Get detailed table structure information. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_db_table_list` | Get list of all table names in specified database. | `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_db_list` | Get list of all database names. | `catalog_name` (string, Optional) |
| `get_table_comment` | Get table comment information. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_table_column_comments` | Get comment information for all columns in table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_table_indexes` | Get index information for specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_recent_audit_logs` | Get audit log records for recent period. | `days` (integer, Optional), `limit` (integer, Optional) |
| `get_catalog_list` | Get list of all catalog names. | `random_string` (string, Required) |
| `get_sql_explain` | Get SQL execution plan with configurable content truncation and file export for LLM analysis. | `sql` (string, Required), `verbose` (boolean, Optional), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_sql_profile` | Get SQL execution profile with content management and file export for LLM optimization workflows. | `sql` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional), `timeout` (integer, Optional) |
| `get_table_data_size` | Get table data size information via FE HTTP API. | `db_name` (string, Optional), `table_name` (string, Optional), `single_replica` (boolean, Optional) |
| `get_monitoring_metrics_info` | Get Doris monitoring metrics definitions and descriptions. | `role` (string, Optional), `monitor_type` (string, Optional), `priority` (string, Optional) |
| `get_monitoring_metrics_data` | Get actual Doris monitoring metrics data from nodes with flexible BE discovery. | `role` (string, Optional), `monitor_type` (string, Optional), `priority` (string, Optional) |
| `get_realtime_memory_stats` | Get real-time memory statistics via BE Memory Tracker with auto/manual BE discovery. | `tracker_type` (string, Optional), `include_details` (boolean, Optional) |
| `get_historical_memory_stats` | Get historical memory statistics via BE Bvar interface with flexible BE configuration. | `tracker_names` (array, Optional), `time_range` (string, Optional) |
| `analyze_data_quality` | Comprehensive data quality analysis combining completeness and distribution analysis. | `table_name` (string, Required), `analysis_scope` (string, Optional), `sample_size` (integer, Optional), `business_rules` (array, Optional) |
| `trace_column_lineage` | End-to-end column lineage tracking through SQL analysis and dependency mapping. | `target_columns` (array, Required), `analysis_depth` (integer, Optional), `include_transformations` (boolean, Optional) |
| `monitor_data_freshness` | Real-time data staleness monitoring with configurable freshness thresholds. | `table_names` (array, Optional), `freshness_threshold_hours` (integer, Optional), `include_update_patterns` (boolean, Optional) |
| `analyze_data_access_patterns` | User behavior analysis and security anomaly detection with access pattern monitoring. | `days` (integer, Optional), `include_system_users` (boolean, Optional), `min_query_threshold` (integer, Optional) |
| `analyze_data_flow_dependencies` | Data flow impact analysis and dependency mapping between tables and views. | `target_table` (string, Optional), `analysis_depth` (integer, Optional), `include_views` (boolean, Optional) |
| `analyze_slow_queries_topn` | Performance bottleneck identification with top-N slow query analysis and patterns. | `days` (integer, Optional), `top_n` (integer, Optional), `min_execution_time_ms` (integer, Optional), `include_patterns` (boolean, Optional) |
| `analyze_resource_growth_curves` | Capacity planning with resource growth analysis and trend forecasting. | `days` (integer, Optional), `resource_types` (array, Optional), `include_predictions` (boolean, Optional) |
| `exec_adbc_query` | High-performance SQL execution using ADBC (Arrow Flight SQL) protocol. | `sql` (string, Required), `max_rows` (integer, Optional), `timeout` (integer, Optional), `return_format` (string, Optional) |
| `get_adbc_connection_info` | ADBC connection diagnostics and status monitoring for Arrow Flight SQL. | No parameters required |
**Note:** All metadata tools support catalog federation for multi-catalog environments. Enhanced monitoring tools provide comprehensive memory tracking and metrics collection capabilities. **New in v0.5.0**: 7 advanced analytics tools for enterprise data governance and 2 ADBC tools for high-performance data transfer with 3-10x performance improvements for large datasets.
### 4. Run the Service
Execute the following command to start the server:
```bash
./start_server.sh
```
This command starts the FastAPI application with Streamable HTTP MCP service.
### 5. Deploying with Docker
If you want to run only the Doris MCP Server in Docker:
```bash
cd doris-mcp-server
docker build -t doris-mcp-server .
docker run -d -p <port>:<port> -v /*your-host*/doris-mcp-server/.env:/app/.env --name <your-mcp-server-name> -it doris-mcp-server:latest
```
**Service Endpoints:**
* **Streamable HTTP**: `http://<host>:<port>/mcp` (Primary MCP endpoint - supports GET, POST, DELETE, OPTIONS)
* **Health Check**: `http://<host>:<port>/health`
> **Note**: The server uses Streamable HTTP for web-based communication, providing unified request/response and streaming capabilities.
## Usage
Interaction with the Doris MCP Server requires an **MCP Client**. The client connects to the server's Streamable HTTP endpoint and sends requests according to the MCP specification to invoke the server's tools.
**Main Interaction Flow:**
1. **Client Initialization**: Send an `initialize` method call to `/mcp` (Streamable HTTP).
2. **(Optional) Discover Tools**: The client can call `tools/list` to get the list of supported tools, their descriptions, and parameter schemas.
3. **Call Tool**: The client sends a `tools/call` request, specifying the `name` and `arguments`.
* **Example: Get Table Schema**
* `name`: `get_table_schema`
* `arguments`: Include `table_name`, `db_name`, `catalog_name`.
4. **Handle Response**:
* **Non-streaming**: The client receives a response containing `content` or `isError`.
* **Streaming**: The client receives a series of progress notifications, followed by a final response.
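As a concrete sketch of this flow, the bundled Python client (`doris_mcp_client/client.py`, described later in this README) wraps these MCP requests; the connection defaults assumed below may need adjusting for your deployment:
```python
# Illustrative tool call via the bundled client; see doris_mcp_client/README.md.
import asyncio

from doris_mcp_client.client import DorisUnifiedMCPClient


async def main():
    client = DorisUnifiedMCPClient()  # assumes the client's default server settings
    # Step 3 of the flow above: tools/call with a name and arguments
    result = await client.call_tool("get_table_schema", {
        "table_name": "customer",
        "db_name": "ssb",
        "catalog_name": "internal",
    })
    print(result)  # Step 4: a response containing `content` or `isError`


asyncio.run(main())
```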
### Catalog Federation Support
The Doris MCP Server supports **catalog federation**, enabling interaction with multiple data catalogs (internal Doris tables and external data sources like Hive, MySQL, etc.) within a unified interface.
#### Key Features:
* **Multi-Catalog Metadata Access**: All metadata tools (`get_db_list`, `get_db_table_list`, `get_table_schema`, etc.) support an optional `catalog_name` parameter to query specific catalogs.
* **Cross-Catalog SQL Queries**: Execute SQL queries that span multiple catalogs using three-part table naming.
* **Catalog Discovery**: Use `get_catalog_list` to discover available catalogs and their types.
#### Three-Part Naming Requirement:
**All SQL queries MUST use three-part naming for table references:**
* **Internal Tables**: `internal.database_name.table_name`
* **External Tables**: `catalog_name.database_name.table_name`
#### Examples:
1. **Get Available Catalogs:**
```json
{
"tool_name": "get_catalog_list",
"arguments": {"random_string": "unique_id"}
}
```
2. **Get Databases in Specific Catalog:**
```json
{
"tool_name": "get_db_list",
"arguments": {"random_string": "unique_id", "catalog_name": "mysql"}
}
```
3. **Query Internal Catalog:**
```json
{
"tool_name": "exec_query",
"arguments": {
"random_string": "unique_id",
"sql": "SELECT COUNT(*) FROM internal.ssb.customer"
}
}
```
4. **Query External Catalog:**
```json
{
"tool_name": "exec_query",
"arguments": {
"random_string": "unique_id",
"sql": "SELECT COUNT(*) FROM mysql.ssb.customer"
}
}
```
5. **Cross-Catalog Query:**
```json
{
"tool_name": "exec_query",
"arguments": {
"random_string": "unique_id",
"sql": "SELECT i.c_name, m.external_data FROM internal.ssb.customer i JOIN mysql.test.user_info m ON i.c_custkey = m.customer_id"
}
}
```
## Security Configuration
The Doris MCP Server includes a comprehensive security framework that provides enterprise-level protection through authentication, authorization, SQL security validation, and data masking capabilities.
### Security Features
* **🔐 Authentication**: Support for token-based and basic authentication
* **🛡️ Authorization**: Role-based access control (RBAC) with security levels
* **🚫 SQL Security**: SQL injection protection and blocked operations
* **🎭 Data Masking**: Automatic sensitive data masking based on user permissions
* **📊 Security Levels**: Four-tier security classification (Public, Internal, Confidential, Secret)
### Authentication Configuration
Configure authentication in your environment variables:
```bash
# Authentication Type (token/basic/oauth)
AUTH_TYPE=token
# Token Secret for JWT validation
TOKEN_SECRET=your_secret_key_here
# Session timeout (in seconds)
SESSION_TIMEOUT=3600
```
#### Token Authentication Example
```python
# Client authentication with token
auth_info = {
"type": "token",
"token": "your_jwt_token",
"session_id": "unique_session_id"
}
```
#### Basic Authentication Example
```python
# Client authentication with username/password
auth_info = {
"type": "basic",
"username": "analyst",
"password": "secure_password",
"session_id": "unique_session_id"
}
```
### Authorization & Security Levels
The system supports four security levels with hierarchical access control:
| Security Level | Access Scope | Typical Use Cases |
|:---------------|:-------------|:------------------|
| **Public** | Unrestricted access | Public reports, general statistics |
| **Internal** | Company employees | Internal dashboards, business metrics |
| **Confidential** | Authorized personnel | Customer data, financial reports |
| **Secret** | Senior management | Strategic data, sensitive analytics |
#### Role Configuration
Configure user roles and permissions:
```python
# Example role configuration
role_permissions = {
"data_analyst": {
"security_level": "internal",
"permissions": ["read_data", "execute_query"],
"allowed_tables": ["sales", "products", "orders"]
},
"data_admin": {
"security_level": "confidential",
"permissions": ["read_data", "execute_query", "admin"],
"allowed_tables": ["*"]
},
"executive": {
"security_level": "secret",
"permissions": ["read_data", "execute_query", "admin"],
"allowed_tables": ["*"]
}
}
```
### SQL Security Validation
The system automatically validates SQL queries for security risks:
#### Blocked Operations
Configure blocked SQL operations using environment variables (New in v0.4.2):
```bash
# Enable/disable SQL security check (New in v0.4.2)
ENABLE_SECURITY_CHECK=true
# Customize blocked keywords via environment variable (New in v0.4.2)
BLOCKED_KEYWORDS="DROP,DELETE,TRUNCATE,ALTER,CREATE,INSERT,UPDATE,GRANT,REVOKE,EXEC,EXECUTE,SHUTDOWN,KILL"
# Maximum query complexity score
MAX_QUERY_COMPLEXITY=100
```
**Default Blocked Keywords (Unified in v0.4.2):**
- **DDL Operations**: DROP, CREATE, ALTER, TRUNCATE
- **DML Operations**: DELETE, INSERT, UPDATE
- **DCL Operations**: GRANT, REVOKE
- **System Operations**: EXEC, EXECUTE, SHUTDOWN, KILL
#### SQL Injection Protection
The system automatically detects and blocks:
* **Union-based injections**: `UNION SELECT` attacks
* **Boolean-based injections**: `OR 1=1` patterns
* **Time-based injections**: `SLEEP()`, `WAITFOR` functions
* **Comment injections**: `--`, `/**/` patterns
* **Stacked queries**: Multiple statements separated by `;`
#### Example Security Validation
```python
# This query would be blocked
dangerous_sql = "SELECT * FROM users WHERE id = 1; DROP TABLE users;"
# This query would be allowed
safe_sql = "SELECT name, email FROM users WHERE department = 'sales'"
```
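For intuition, a blocked-keyword check along the lines described above could look like the sketch below. This is illustrative only and not the server's actual validator, which additionally scores query complexity and detects the injection patterns listed earlier:
```python
import os
import re

# Illustrative keyword filter, not the implementation in utils/security.py.
BLOCKED = {kw.strip().upper() for kw in os.getenv(
    "BLOCKED_KEYWORDS",
    "DROP,DELETE,TRUNCATE,ALTER,CREATE,INSERT,UPDATE,GRANT,REVOKE,"
    "EXEC,EXECUTE,SHUTDOWN,KILL",
).split(",")}


def is_sql_allowed(sql: str) -> bool:
    # Match whole words so a column like `updated_at` does not trip `UPDATE`.
    tokens = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    return not (tokens & BLOCKED)


assert not is_sql_allowed("SELECT * FROM users WHERE id = 1; DROP TABLE users;")
assert is_sql_allowed("SELECT name, email FROM users WHERE department = 'sales'")
```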
### Data Masking Configuration
Configure automatic data masking for sensitive information:
#### Built-in Masking Rules
```python
# Default masking rules
masking_rules = [
{
"column_pattern": r".*phone.*|.*mobile.*",
"algorithm": "phone_mask",
"parameters": {
"mask_char": "*",
"keep_prefix": 3,
"keep_suffix": 4
},
"security_level": "internal"
},
{
"column_pattern": r".*email.*",
"algorithm": "email_mask",
"parameters": {"mask_char": "*"},
"security_level": "internal"
},
{
"column_pattern": r".*id_card.*|.*identity.*",
"algorithm": "id_mask",
"parameters": {
"mask_char": "*",
"keep_prefix": 6,
"keep_suffix": 4
},
"security_level": "confidential"
}
]
```
#### Masking Algorithms
| Algorithm | Description | Example |
|:----------|:------------|:--------|
| `phone_mask` | Masks phone numbers | `138****5678` |
| `email_mask` | Masks email addresses | `j***n@example.com` |
| `id_mask` | Masks ID card numbers | `110101****1234` |
| `name_mask` | Masks personal names | `张*明` |
| `partial_mask` | Partial masking with ratio | `abc***xyz` |
#### Custom Masking Rules
Add custom masking rules in your configuration:
```python
# Custom masking rule
custom_rule = {
"column_pattern": r".*salary.*|.*income.*",
"algorithm": "partial_mask",
"parameters": {
"mask_char": "*",
"mask_ratio": 0.6
},
"security_level": "confidential"
}
```
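As a rough illustration of how a `partial_mask`-style algorithm with a `mask_ratio` parameter might behave (a sketch under assumed semantics, not the server's implementation):
```python
# Illustrative partial masking; the actual algorithm's semantics may differ.
def partial_mask(value: str, mask_char: str = "*", mask_ratio: float = 0.6) -> str:
    masked = int(len(value) * mask_ratio)   # characters to hide in the middle
    keep = len(value) - masked
    head = keep // 2                        # keep roughly half at each end
    tail = keep - head
    return value[:head] + mask_char * masked + value[len(value) - tail:]


print(partial_mask("98500.00"))  # -> '98****00'
```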
### Security Configuration Examples
#### Environment Variables
```bash
# .env file
AUTH_TYPE=token
TOKEN_SECRET=your_jwt_secret_key
ENABLE_MASKING=true
MAX_RESULT_ROWS=10000
BLOCKED_KEYWORDS=DROP,DELETE,TRUNCATE,ALTER
MAX_QUERY_COMPLEXITY=100
ENABLE_AUDIT=true
```
#### Sensitive Tables Configuration
```python
# Configure sensitive tables with security levels
sensitive_tables = {
"user_profiles": "confidential",
"payment_records": "secret",
"employee_salaries": "secret",
"customer_data": "confidential",
"public_reports": "public"
}
```
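Under the four-tier model above, the access decision reduces to a rank comparison in the Public < Internal < Confidential < Secret hierarchy. A minimal sketch (illustrative; the real check in `doris_mcp_server/utils/security.py` also consults roles, permissions, and allowed tables):
```python
# Illustrative security-level comparison, not the server's implementation.
LEVEL_RANK = {"public": 0, "internal": 1, "confidential": 2, "secret": 3}


def can_access(user_level: str, table_level: str) -> bool:
    # A user may read a table classified at or below their own level.
    return LEVEL_RANK[user_level] >= LEVEL_RANK[table_level]


print(can_access("confidential", "confidential"))  # True, e.g. customer_data
print(can_access("confidential", "secret"))        # False, e.g. payment_records
```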
### Security Best Practices
1. **🔐 Strong Authentication**: Use JWT tokens with proper expiration
2. **🎯 Principle of Least Privilege**: Grant minimum required permissions
3. **📊 Regular Auditing**: Enable audit logging for security monitoring
4. **🛡️ Input Validation**: All SQL queries are automatically validated
5. **🎭 Data Classification**: Properly classify data with security levels
6. **🔄 Regular Updates**: Keep security rules and configurations updated
### Security Monitoring
The system provides comprehensive security monitoring:
```python
# Security audit log example
{
"timestamp": "2024-01-15T10:30:00Z",
"user_id": "analyst_user",
"action": "query_execution",
"resource": "customer_data",
"result": "blocked",
"reason": "insufficient_permissions",
"risk_level": "medium"
}
```
> **⚠️ Important**: Always test security configurations in a development environment before deploying to production. Regularly review and update security policies based on your organization's requirements.
## Connecting with Cursor
You can connect Cursor to this MCP server using Stdio mode (recommended) or Streamable HTTP mode.
### Stdio Mode
Stdio mode allows Cursor to manage the server process directly. Configuration is done within Cursor's MCP Server settings file (typically `~/.cursor/mcp.json` or similar).
### Method 1: Using PyPI Installation (Recommended)
Install the package from PyPI and configure Cursor to use it:
```bash
pip install doris-mcp-server
```
**Configure Cursor:** Add an entry like the following to your Cursor MCP configuration:
```json
{
"mcpServers": {
"doris-stdio": {
"command": "doris-mcp-server",
"args": ["--transport", "stdio"],
"env": {
"DORIS_HOST": "127.0.0.1",
"DORIS_PORT": "9030",
"DORIS_USER": "root",
"DORIS_PASSWORD": "your_db_password"
}
}
}
}
```
### Method 2: Using uv (Development)
If you have `uv` installed and want to run from source:
```bash
uv run --project /path/to/doris-mcp-server doris-mcp-server
```
**Note:** Replace `/path/to/doris-mcp-server` with the actual absolute path to your project directory.
**Configure Cursor:** Add an entry like the following to your Cursor MCP configuration:
```json
{
"mcpServers": {
"doris-stdio": {
"command": "uv",
"args": ["run", "--project", "/path/to/your/doris-mcp-server", "doris-mcp-server"],
"env": {
"DORIS_HOST": "127.0.0.1",
"DORIS_PORT": "9030",
"DORIS_USER": "root",
"DORIS_PASSWORD": "your_db_password"
}
}
}
}
```
### Streamable HTTP Mode
Streamable HTTP mode requires you to run the MCP server independently first, and then configure Cursor to connect to it.
1. **Configure `.env`:** Ensure your database credentials and any other necessary settings are correctly configured in the `.env` file within the project directory.
2. **Start the Server:** Run the server from your terminal in the project's root directory:
```bash
./start_server.sh
```
This script reads the `.env` file and starts the FastAPI server with Streamable HTTP support. Note the host and port the server is listening on (default is `0.0.0.0:3000`).
3. **Configure Cursor:** Add an entry like the following to your Cursor MCP configuration, pointing to the running server's Streamable HTTP endpoint:
```json
{
"mcpServers": {
"doris-http": {
"url": "http://127.0.0.1:3000/mcp"
}
}
}
```
> **Note**: Adjust the host/port if your server runs on a different address. The `/mcp` endpoint is the unified Streamable HTTP interface.
After configuring either mode in Cursor, you should be able to select the server (e.g., `doris-stdio` or `doris-http`) and use its tools.
## Directory Structure
```
doris-mcp-server/
├── doris_mcp_server/                      # Main server package
│   ├── main.py                            # Main entry point and FastAPI app
│   ├── tools/                             # MCP tools implementation
│   │   ├── tools_manager.py               # Centralized tools management and registration
│   │   ├── resources_manager.py           # Resource management and metadata exposure
│   │   ├── prompts_manager.py             # Intelligent prompt templates for data analysis
│   │   └── __init__.py
│   ├── utils/                             # Core utility modules
│   │   ├── config.py                      # Configuration management with validation
│   │   ├── db.py                          # Database connection management with pooling
│   │   ├── query_executor.py              # High-performance SQL execution with caching
│   │   ├── security.py                    # Security management and data masking
│   │   ├── schema_extractor.py            # Metadata extraction with catalog federation
│   │   ├── analysis_tools.py              # Data analysis and performance monitoring
│   │   ├── data_governance_tools.py       # Data lineage and freshness monitoring (New in v0.5.0)
│   │   ├── data_quality_tools.py          # Comprehensive data quality analysis (New in v0.5.0)
│   │   ├── data_exploration_tools.py      # Advanced statistical analysis (New in v0.5.0)
│   │   ├── security_analytics_tools.py    # Access pattern analysis (New in v0.5.0)
│   │   ├── dependency_analysis_tools.py   # Impact analysis and dependency mapping (New in v0.5.0)
│   │   ├── performance_analytics_tools.py # Query optimization and capacity planning (New in v0.5.0)
│   │   ├── adbc_query_tools.py            # High-performance Arrow Flight SQL operations (New in v0.5.0)
│   │   ├── logger.py                      # Logging configuration
│   │   └── __init__.py
│   └── __init__.py
├── doris_mcp_client/                      # MCP client implementation
│   ├── client.py                          # Unified MCP client for testing and integration
│   ├── README.md                          # Client documentation
│   └── __init__.py
├── logs/                                  # Log files directory
├── README.md                              # This documentation
├── .env.example                           # Environment variables template
├── requirements.txt                       # Python dependencies
├── pyproject.toml                         # Project configuration and entry points
├── uv.lock                                # UV package manager lock file
├── generate_requirements.py               # Requirements generation script
├── start_server.sh                        # Server startup script
└── restart_server.sh                      # Server restart script
```
## Developing New Tools
This section outlines the process for adding new MCP tools to the Doris MCP Server, based on the unified modular architecture with centralized tool management.
### 1. Leverage Existing Utility Modules
The server provides comprehensive utility modules for common database operations:
* **`doris_mcp_server/utils/db.py`**: Database connection management with connection pooling and health monitoring.
* **`doris_mcp_server/utils/query_executor.py`**: High-performance SQL execution with advanced caching, optimization, and performance monitoring.
* **`doris_mcp_server/utils/schema_extractor.py`**: Metadata extraction with full catalog federation support.
* **`doris_mcp_server/utils/security.py`**: Comprehensive security management, SQL validation, and data masking.
* **`doris_mcp_server/utils/analysis_tools.py`**: Advanced data analysis and statistical tools.
* **`doris_mcp_server/utils/config.py`**: Configuration management with validation.
* **`doris_mcp_server/utils/data_governance_tools.py`**: Data lineage tracking and freshness monitoring (New in v0.5.0).
* **`doris_mcp_server/utils/data_quality_tools.py`**: Comprehensive data quality analysis framework (New in v0.5.0).
* **`doris_mcp_server/utils/adbc_query_tools.py`**: High-performance Arrow Flight SQL operations (New in v0.5.0).
### 2. Implement Tool Logic
Add your new tool to the `DorisToolsManager` class in `doris_mcp_server/tools/tools_manager.py`. The tools manager provides a centralized approach to tool registration and execution with unified interfaces.
**Example:** Adding a new analysis tool:
```python
# In doris_mcp_server/tools/tools_manager.py
async def your_new_analysis_tool(self, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
"""
Your new analysis tool implementation
Args:
arguments: Tool arguments from MCP client
Returns:
List of MCP response messages
"""
try:
# Use existing utilities
result = await self.query_executor.execute_sql_for_mcp(
sql="SELECT COUNT(*) FROM your_table",
max_rows=arguments.get("max_rows", 100)
)
return [{
"type": "text",
"text": json.dumps(result, ensure_ascii=False, indent=2)
}]
except Exception as e:
logger.error(f"Tool execution failed: {str(e)}", exc_info=True)
return [{
"type": "text",
"text": f"Error: {str(e)}"
}]
```
### 3. Register the Tool
Add your tool to the `_register_tools` method in the same class:
```python
# In the _register_tools method of DorisToolsManager
@self.mcp.tool(
name="your_new_analysis_tool",
description="Description of your new analysis tool",
inputSchema={
"type": "object",
"properties": {
"parameter1": {
"type": "string",
"description": "Description of parameter1"
},
"parameter2": {
"type": "integer",
"description": "Description of parameter2",
"default": 100
}
},
"required": ["parameter1"]
}
)
async def your_new_analysis_tool_wrapper(arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
return await self.your_new_analysis_tool(arguments)
```
### 4. Advanced Features
For more complex tools, you can leverage the comprehensive framework:
* **Advanced Caching**: Use the query executor's built-in caching for enhanced performance
* **Enterprise Security**: Apply comprehensive SQL validation and data masking through the security manager
* **Intelligent Prompts**: Use the prompts manager for advanced query generation
* **Resource Management**: Expose metadata through the resources manager
* **Performance Monitoring**: Integrate with the analysis tools for monitoring capabilities
### 5. Testing
Test your new tool using the included MCP client:
```python
# Using doris_mcp_client/client.py
import asyncio

from doris_mcp_client.client import DorisUnifiedMCPClient


async def test_new_tool():
    client = DorisUnifiedMCPClient()
    result = await client.call_tool("your_new_analysis_tool", {
        "parameter1": "test_value",
        "parameter2": 50
    })
    print(result)


asyncio.run(test_new_tool())
```
## MCP Client
The project includes a unified MCP client (`doris_mcp_client/`) for testing and integration purposes. The client supports multiple connection modes and provides a convenient interface for interacting with the MCP server.
For detailed client documentation, see [`doris_mcp_client/README.md`](doris_mcp_client/README.md).
## Contributing
Contributions are welcome via Issues or Pull Requests.
## License
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
## FAQ
### Q: Why do Qwen3-32b and other small parameter models always fail when calling tools?
**A:** This is a common issue. The main reason is that these models need more explicit guidance to correctly use MCP tools. It's recommended to add the following instruction prompt for the model:
```xml
<instruction>
Use MCP tools to complete tasks as much as possible. Carefully read the annotations, method names, and parameter descriptions of each tool. Please follow these steps:
1. Carefully analyze the user's question and match the most appropriate tool from the existing Tools list.
2. Ensure tool names, method names, and parameters are used exactly as defined in the tool annotations. Do not create tool names or parameters on your own.
3. When passing parameters, strictly follow the parameter format and requirements specified in the tool annotations.
4. When calling tools, call them directly as needed, but refer to the following request format for parameters: {"mcp_sse_call_tool": {"tool_name": "$tools_name", "arguments": "{}"}}
5. When outputting results, do not include any XML tags, return plain text content only.
<input>
User question: user_query
</input>
<output>
Return tool call results or final answer, along with analysis of the results.
</output>
</instruction>
```
If you have further requirements for the returned results, you can describe the specific requirements in the `<output>` tag.
### Q: How to configure different database connections?
**A:** You can configure database connections in several ways:
1. **Environment Variables** (Recommended):
```bash
export DORIS_HOST="your_doris_host"
export DORIS_PORT="9030"
export DORIS_USER="root"
export DORIS_PASSWORD="your_password"
```
2. **Command Line Arguments**:
```bash
doris-mcp-server --db-host your_host --db-port 9030 --db-user root --db-password your_password
```
3. **Configuration File**:
Modify the corresponding configuration items in the `.env` file.
### Q: How to configure BE nodes for monitoring tools?
**A:** Choose the appropriate configuration based on your deployment scenario:
**External Network (Manual Configuration):**
```bash
# Manually specify BE node addresses
DORIS_BE_HOSTS=10.1.1.100,10.1.1.101,10.1.1.102
DORIS_BE_WEBSERVER_PORT=8040
```
**Internal Network (Automatic Discovery):**
```bash
# Leave BE_HOSTS empty for auto-discovery
# DORIS_BE_HOSTS= # Not set or empty
# System will use 'SHOW BACKENDS' command to get internal IPs
```
### Q: How to use SQL Explain/Profile files with LLM for optimization?
**A:** The tools provide both truncated content and complete files for LLM analysis:
1. **Get Analysis Results:**
```json
{
"content": "Truncated plan for immediate review",
"file_path": "/tmp/explain_12345.txt",
"is_content_truncated": true
}
```
2. **LLM Analysis Workflow:**
- Review truncated content for quick insights
- Upload the complete file to your LLM as an attachment
- Request optimization suggestions or performance analysis
- Implement recommended improvements
3. **Configure Content Size:**
```bash
MAX_RESPONSE_CONTENT_SIZE=4096 # Adjust as needed
```
### Q: How to enable data security and masking features?
**A:** Set the following configurations in your `.env` file:
```bash
# Enable data masking
ENABLE_MASKING=true
# Set authentication type
AUTH_TYPE=token
# Configure token secret
TOKEN_SECRET=your_secret_key
# Set maximum result rows
MAX_RESULT_ROWS=10000
```
### Q: What's the difference between Stdio mode and HTTP mode?
**A:**
- **Stdio Mode**: Suitable for direct integration with MCP clients (like Cursor), where the client manages the server process
- **HTTP Mode**: Independent web service that supports multiple client connections, suitable for production environments
Recommendations:
- Development and personal use: Stdio mode
- Production and multi-user environments: HTTP mode
### Q: How to resolve connection timeout issues?
**A:** Try the following solutions:
1. **Increase timeout settings**:
```bash
# Set in .env file
QUERY_TIMEOUT=60
CONNECTION_TIMEOUT=30
```
2. **Check network connectivity**:
```bash
# Test database connection
curl http://localhost:3000/health
```
3. **Optimize connection pool configuration**:
```bash
DORIS_MAX_CONNECTIONS=20
```
### Q: How to resolve `at_eof` connection errors? (Completely Fixed in v0.5.0)
**A:** Version 0.5.0 has **completely resolved** the critical `at_eof` connection errors through comprehensive connection pool redesign:
#### The Problem:
- `at_eof` errors occurred due to connection pool pre-creation and improper connection state management
- MySQL aiomysql reader state becoming inconsistent during connection lifecycle
- Connection pool instability under concurrent load
#### The Solution (v0.5.0):
1. **Connection Pool Strategy Overhaul**:
- **Zero Minimum Connections**: Changed `min_connections` from default to 0 to prevent pre-creation issues
- **On-Demand Connection Creation**: Connections created only when needed, eliminating stale connection problems
- **Fresh Connection Strategy**: Always acquire fresh connections from pool, no session-level caching
2. **Enhanced Health Monitoring**:
- **Timeout-Based Health Checks**: 3-second timeout for connection validation queries
- **Background Health Monitor**: Continuous pool health monitoring every 30 seconds
- **Proactive Stale Detection**: Automatic detection and cleanup of problematic connections
3. **Intelligent Recovery System**:
- **Automatic Pool Recovery**: Self-healing pool with comprehensive error handling
- **Exponential Backoff Retry**: Smart retry mechanism with up to 3 attempts (see the sketch after this list)
- **Connection-Specific Error Detection**: Precise identification of connection-related errors
4. **Performance Optimizations**:
- **Pool Warmup**: Intelligent connection pool warming for optimal performance
- **Background Cleanup**: Periodic cleanup of stale connections without affecting active operations
- **Connection Diagnostics**: Real-time connection health monitoring and reporting
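The retry behavior in particular can be pictured as a thin wrapper around query execution. The sketch below illustrates the described mechanism (up to 3 attempts with exponential backoff); it is not the actual code in `query_executor.py`:
```python
import asyncio

# Illustrative retry wrapper; the real logic lives in utils/query_executor.py.
async def execute_with_retry(run_query, max_attempts: int = 3, base_delay: float = 0.5):
    for attempt in range(1, max_attempts + 1):
        try:
            return await run_query()  # e.g. acquire a fresh pooled connection and run SQL
        except ConnectionError:       # stand-in for connection-specific error detection
            if attempt == max_attempts:
                raise
            # Exponential backoff: 0.5s, then 1.0s, between the 3 attempts.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```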
#### Monitoring Connection Health:
```bash
# Monitor connection pool health in real-time
tail -f logs/doris_mcp_server_info.log | grep -E "(pool|connection|at_eof)"
# Check detailed connection diagnostics
tail -f logs/doris_mcp_server_debug.log | grep "connection health"
# View connection pool metrics
curl http://localhost:3000/health  # If running in HTTP mode (default port 3000)
```
#### Configuration for Optimal Connection Performance:
```bash
# Recommended connection pool settings in .env
DORIS_MAX_CONNECTIONS=20 # Adjust based on workload
CONNECTION_TIMEOUT=30 # Connection establishment timeout
QUERY_TIMEOUT=60 # Query execution timeout
# Health monitoring settings
HEALTH_CHECK_INTERVAL=60 # Pool health check frequency
```
**Result**: 99.9% elimination of `at_eof` errors with significantly improved connection stability and performance.
### Q: How to resolve MCP library version compatibility issues? (Fixed in v0.4.2)
**A:** Version 0.4.2 introduced an intelligent MCP compatibility layer that supports both MCP 1.8.x and 1.9.x versions:
**The Problem:**
- MCP 1.9.3 introduced breaking changes to the `RequestContext` class (changed from 2 to 3 generic parameters)
- This caused `TypeError: Too few arguments for RequestContext` errors
**The Solution (v0.4.2):**
- **Intelligent Version Detection**: Automatically detects the installed MCP version
- **Compatibility Layer**: Gracefully handles API differences between versions
- **Flexible Version Support**: `mcp>=1.8.0,<2.0.0` in dependencies
**Supported MCP Versions:**
```bash
# Both versions now work seamlessly
pip install mcp==1.8.0 # Stable version (recommended)
pip install mcp==1.9.3 # Latest version with new features
```
**Version Information:**
```bash
# Check which MCP version is being used
doris-mcp-server --transport stdio
# The server will log: "Using MCP version: x.x.x"
```
If you encounter MCP-related startup errors:
```bash
# Recommended: Use stable version
pip uninstall mcp
pip install mcp==1.8.0
# Or upgrade to latest compatible version
pip install --upgrade doris-mcp-server==0.5.0
```
### Q: How to enable ADBC high-performance features? (New in v0.5.0)
**A:** ADBC (Arrow Flight SQL) provides 3-10x performance improvements for large datasets:
1. **ADBC Dependencies** (automatically included in v0.5.0+):
```bash
# ADBC dependencies are now included by default in doris-mcp-server>=0.5.0
# No separate installation required
```
2. **Configure Arrow Flight SQL Ports**:
```bash
# Add to your .env file
FE_ARROW_FLIGHT_SQL_PORT=8096
BE_ARROW_FLIGHT_SQL_PORT=8097
```
3. **Optional ADBC Customization**:
```bash
# Customize ADBC behavior (optional)
ADBC_DEFAULT_MAX_ROWS=200000
ADBC_DEFAULT_TIMEOUT=120
ADBC_DEFAULT_RETURN_FORMAT=pandas # arrow/pandas/dict
```
4. **Test ADBC Connection**:
```bash
# Use get_adbc_connection_info tool to verify setup
# Should show "status": "ready" and port connectivity
```
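For example, through the bundled Python client (connection defaults are an assumption):
```python
import asyncio

from doris_mcp_client.client import DorisUnifiedMCPClient


async def check_adbc():
    client = DorisUnifiedMCPClient()  # assumes the client's default server settings
    info = await client.call_tool("get_adbc_connection_info", {})
    print(info)  # expect "status": "ready" once the ports are configured


asyncio.run(check_adbc())
```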
### Q: How to use the new data analytics tools? (New in v0.5.0)
**A:** The 7 new analytics tools provide comprehensive data governance capabilities:
**Data Quality Analysis:**
```json
{
"tool_name": "analyze_data_quality",
"arguments": {
"table_name": "customer_data",
"analysis_scope": "comprehensive",
"sample_size": 100000
}
}
```
**Column Lineage Tracking:**
```json
{
"tool_name": "trace_column_lineage",
"arguments": {
"target_columns": ["users.email", "orders.customer_id"],
"analysis_depth": 3
}
}
```
**Data Freshness Monitoring:**
```json
{
"tool_name": "monitor_data_freshness",
"arguments": {
"freshness_threshold_hours": 24,
"include_update_patterns": true
}
}
```
**Performance Analytics:**
```json
{
"tool_name": "analyze_slow_queries_topn",
"arguments": {
"days": 7,
"top_n": 20,
"include_patterns": true
}
}
```
### Q: How to use the enhanced logging system? (Improved in v0.5.0)
**A:** Version 0.5.0 introduces a comprehensive logging system with automatic management and level-based organization:
#### Log File Structure (New in v0.5.0):
```bash
logs/
├── doris_mcp_server_debug.log     # DEBUG level messages
├── doris_mcp_server_info.log      # INFO level messages
├── doris_mcp_server_warning.log   # WARNING level messages
├── doris_mcp_server_error.log     # ERROR level messages
├── doris_mcp_server_critical.log  # CRITICAL level messages
├── doris_mcp_server_all.log       # Combined log (all levels)
└── doris_mcp_server_audit.log     # Audit trail (separate)
```
#### Enhanced Logging Features:
1. **Level-Based File Separation**: Automatic organization by log level for easier troubleshooting
2. **Timestamped Formatting**: Millisecond precision with proper alignment for professional logging
3. **Automatic Log Rotation**: Prevents disk space issues with configurable file size limits
4. **Background Cleanup**: Intelligent cleanup scheduler with configurable retention policies
5. **Audit Trail**: Separate audit logging for compliance and security monitoring
#### Viewing Logs:
```bash
# View real-time logs by level
tail -f logs/doris_mcp_server_info.log # General operational info
tail -f logs/doris_mcp_server_error.log # Error tracking
tail -f logs/doris_mcp_server_debug.log # Detailed debugging
# View all activity in combined log
tail -f logs/doris_mcp_server_all.log
# Monitor specific operations
tail -f logs/doris_mcp_server_info.log | grep -E "(query|connection|tool)"
# View audit trail
tail -f logs/doris_mcp_server_audit.log
```
#### Configuration:
```bash
# Enhanced logging configuration in .env
LOG_LEVEL=INFO # Base log level
ENABLE_AUDIT=true # Enable audit logging
ENABLE_LOG_CLEANUP=true # Enable automatic cleanup
LOG_MAX_AGE_DAYS=30 # Keep logs for 30 days
LOG_CLEANUP_INTERVAL_HOURS=24 # Check for cleanup daily
# Advanced settings
LOG_FILE_PATH=logs # Log directory (auto-organized)
```
#### Troubleshooting with Enhanced Logs:
```bash
# Debug connection issues
grep -E "(connection|pool|at_eof)" logs/doris_mcp_server_error.log
# Monitor tool performance
grep "execution_time" logs/doris_mcp_server_info.log
# Check system health
tail -20 logs/doris_mcp_server_warning.log
# View recent critical issues
cat logs/doris_mcp_server_critical.log
```
#### Log Cleanup Management:
- **Automatic**: Background scheduler removes files older than `LOG_MAX_AGE_DAYS`
- **Rotation**: Logs are automatically rotated when they reach 10MB
- **Backup**: Keeps 5 backup files for each log level
- **Performance**: Minimal impact on server performance
For other issues, please check GitHub Issues or submit a new issue.
Raw data
{
"_id": null,
"home_page": null,
"name": "doris-mcp-server",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "analytics, database, doris, mcp, model-context-protocol",
"author": null,
"author_email": "Yijia Su <freeoneplus@apache.org>",
"download_url": "https://files.pythonhosted.org/packages/44/aa/4306bb97b60707d7cc9fd359274f6286ff5bb40637f17283afdbce7e930d/doris_mcp_server-0.5.0.tar.gz",
"platform": null,
"description": "<!--\nLicensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE file\ndistributed with this work for additional information\nregarding copyright ownership. The ASF licenses this file\nto you under the Apache License, Version 2.0 (the\n\"License\"); you may not use this file except in compliance\nwith the License. You may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing,\nsoftware distributed under the License is distributed on an\n\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\nKIND, either express or implied. See the License for the\nspecific language governing permissions and limitations\nunder the License.\n-->\n\n# Doris MCP Server\n\nDoris MCP (Model Context Protocol) Server is a backend service built with Python and FastAPI. It implements the MCP, allowing clients to interact with it through defined \"Tools\". It's primarily designed to connect to Apache Doris databases, potentially leveraging Large Language Models (LLMs) for tasks like converting natural language queries to SQL (NL2SQL), executing queries, and performing metadata management and analysis.\n\n## \ud83d\ude80 What's New in v0.5.0\n\n- **\ud83d\udd25 Critical at_eof Connection Fix**: **Complete elimination of at_eof connection pool errors** through redesigned connection pool strategy with zero minimum connections, intelligent health monitoring, automatic retry mechanisms, and self-healing pool recovery - achieving 99.9% connection stability improvement\n- **\ud83d\udd27 Revolutionary Logging System**: **Enterprise-grade logging overhaul** with level-based file separation (debug, info, warning, error, critical), automatic cleanup scheduler with 30-day retention, millisecond precision timestamps, dedicated audit trails, and zero-maintenance log management\n- **\ud83d\udcca Enterprise Data Analytics Suite**: Introducing **7 new enterprise-grade data governance and analytics tools** providing comprehensive data management capabilities including data quality analysis, column lineage tracking, freshness monitoring, and performance analytics\n- **\ud83c\udfc3\u200d\u2642\ufe0f High-Performance ADBC Integration**: Complete **Apache Arrow Flight SQL (ADBC)** support with configurable parameters, offering 3-10x performance improvements for large dataset transfers through Arrow columnar format\n- **\ud83d\udd04 Unified Data Quality Framework**: Advanced data completeness and distribution analysis with business rules engine, confidence scoring, and automated quality recommendations\n- **\ud83d\udcc8 Advanced Analytics Tools**: Performance bottleneck identification, capacity planning with growth analysis, user access pattern monitoring, and data flow dependency mapping\n- **\u2699\ufe0f Enhanced Configuration Management**: Complete ADBC configuration system with environment variable support, dynamic tool registration, and intelligent parameter validation\n- **\ud83d\udd12 Security & Compatibility Improvements**: Resolved pandas JSON serialization issues, enhanced enterprise security integration, and maintained full backward compatibility with v0.4.x versions\n- **\ud83c\udfaf Modular Architecture**: 6 new specialized tool modules for enterprise analytics with comprehensive English documentation and robust error handling\n\n> **\ud83d\ude80 Major Milestone**: This release establishes v0.5.0 as a **production-ready enterprise data governance platform** with 
**critical stability improvements** (complete at_eof fix + intelligent logging), 23 total tools (14 existing + 7 analytics + 2 ADBC tools), and enterprise-grade system reliability - representing a major advancement in both data intelligence capabilities and operational stability.\n\n## Core Features\n\n* **MCP Protocol Implementation**: Provides standard MCP interfaces, supporting tool calls, resource management, and prompt interactions.\n* **Streamable HTTP Communication**: Unified HTTP endpoint supporting both request/response and streaming communication for optimal performance and reliability.\n* **Stdio Communication**: Standard input/output mode for direct integration with MCP clients like Cursor.\n* **Enterprise-Grade Architecture**: Modular design with comprehensive functionality:\n * **Tools Manager**: Centralized tool registration and routing with unified interfaces (`doris_mcp_server/tools/tools_manager.py`)\n * **Enhanced Monitoring Tools Module**: Advanced memory tracking, metrics collection, and flexible BE node discovery with modular, extensible design\n * **Query Information Tools**: Enhanced SQL explain and profiling with configurable content truncation, file export for LLM attachments, and advanced query analytics\n * **Resources Manager**: Resource management and metadata exposure (`doris_mcp_server/tools/resources_manager.py`)\n * **Prompts Manager**: Intelligent prompt templates for data analysis (`doris_mcp_server/tools/prompts_manager.py`)\n* **Advanced Database Features**:\n * **Query Execution**: High-performance SQL execution with advanced caching and optimization, enhanced connection stability and automatic retry mechanisms (`doris_mcp_server/utils/query_executor.py`)\n * **Security Management**: Comprehensive SQL security validation with configurable blocked keywords, SQL injection protection, data masking, and unified security configuration management (`doris_mcp_server/utils/security.py`)\n * **Metadata Extraction**: Comprehensive database metadata with catalog federation support (`doris_mcp_server/utils/schema_extractor.py`)\n * **Performance Analysis**: Advanced column analysis, performance monitoring, and data analysis tools (`doris_mcp_server/utils/analysis_tools.py`)\n* **Catalog Federation Support**: Full support for multi-catalog environments (internal Doris tables and external data sources like Hive, MySQL, etc.)\n* **Enterprise Security**: Comprehensive security framework with authentication, authorization, SQL injection protection, and data masking capabilities with environment variable configuration support\n* **Unified Configuration Framework**: Centralized configuration management through `config.py` with comprehensive validation, standardized parameter naming, and smart default database handling with automatic fallback to `information_schema`\n\n## System Requirements\n\n* Python 3.12+\n* Database connection details (e.g., Doris Host, Port, User, Password, Database)\n\n## \ud83d\ude80 Quick Start\n\n### Installation from PyPI\n\n```bash\n# Install the latest version\npip install doris-mcp-server\n\n# Install specific version\npip install doris-mcp-server==0.5.0\n```\n\n> **\ud83d\udca1 Command Compatibility**: After installation, both `doris-mcp-server` commands are available for backward compatibility. 
You can use either command interchangeably.\n\n### Start Streamable HTTP Mode (Web Service)\n\nThe primary communication mode offering optimal performance and reliability:\n\n```bash\n# Full configuration with database connection\ndoris-mcp-server \\\n --transport http \\\n --host 0.0.0.0 \\\n --port 3000 \\\n --db-host 127.0.0.1 \\\n --db-port 9030 \\\n --db-user root \\\n --db-password your_password \n```\n\n### Start Stdio Mode (for Cursor and other MCP clients)\n\nStandard input/output mode for direct integration with MCP clients:\n\n```bash\n# For direct integration with MCP clients like Cursor\ndoris-mcp-server --transport stdio\n```\n\n### Verify Installation\n\n```bash\n# Check installation\ndoris-mcp-server --help\n\n# Test HTTP mode (in another terminal)\ncurl http://localhost:3000/health\n```\n\n### Environment Variables (Optional)\n\nInstead of command-line arguments, you can use environment variables:\n\n```bash\nexport DORIS_HOST=\"127.0.0.1\"\nexport DORIS_PORT=\"9030\"\nexport DORIS_USER=\"root\"\nexport DORIS_PASSWORD=\"your_password\"\n\n# Then start with simplified command\ndoris-mcp-server --transport http --host 0.0.0.0 --port 3000\n```\n\n### Command Line Arguments\n\nThe `doris-mcp-server` command supports the following arguments:\n\n| Argument | Description | Default | Required |\n|:---------|:------------|:--------|:---------|\n| `--transport` | Transport mode: `http` or `stdio` | `http` | No |\n| `--host` | HTTP server host (HTTP mode only) | `0.0.0.0` | No |\n| `--port` | HTTP server port (HTTP mode only) | `3000` | No |\n| `--db-host` | Doris database host | `localhost` | No |\n| `--db-port` | Doris database port | `9030` | No |\n| `--db-user` | Doris database username | `root` | No |\n| `--db-password` | Doris database password | - | Yes (unless in env) |\n\n## Development Setup\n\nFor developers who want to build from source:\n\n### 1. Clone the Repository\n\n```bash\n# Replace with the actual repository URL if different\ngit clone https://github.com/apache/doris-mcp-server.git\ncd doris-mcp-server\n```\n\n### 2. Install Dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n### 3. 
### 3. Configure Environment Variables

Copy the `.env.example` file to `.env` and modify the settings according to your environment:

```bash
cp .env.example .env
```

**Key Environment Variables:**

* **Database Connection**:
  * `DORIS_HOST`: Database hostname (default: localhost)
  * `DORIS_PORT`: Database port (default: 9030)
  * `DORIS_USER`: Database username (default: root)
  * `DORIS_PASSWORD`: Database password
  * `DORIS_DATABASE`: Default database name (default: information_schema)
  * `DORIS_MIN_CONNECTIONS`: Minimum connection pool size (default: 5)
  * `DORIS_MAX_CONNECTIONS`: Maximum connection pool size (default: 20)
  * `DORIS_BE_HOSTS`: BE nodes for monitoring (comma-separated; optional - auto-discovery via `SHOW BACKENDS` if empty)
  * `DORIS_BE_WEBSERVER_PORT`: BE webserver port for monitoring tools (default: 8040)
  * `FE_ARROW_FLIGHT_SQL_PORT`: Frontend Arrow Flight SQL port for ADBC (New in v0.5.0)
  * `BE_ARROW_FLIGHT_SQL_PORT`: Backend Arrow Flight SQL port for ADBC (New in v0.5.0)
* **Security Configuration**:
  * `AUTH_TYPE`: Authentication type (token/basic/oauth, default: token)
  * `TOKEN_SECRET`: Token secret key
  * `ENABLE_SECURITY_CHECK`: Enable/disable SQL security validation (default: true, New in v0.4.2)
  * `BLOCKED_KEYWORDS`: Comma-separated list of blocked SQL keywords (New in v0.4.2)
  * `ENABLE_MASKING`: Enable data masking (default: true)
  * `MAX_RESULT_ROWS`: Maximum result rows (default: 10000)
* **ADBC Configuration (New in v0.5.0)**:
  * `ADBC_DEFAULT_MAX_ROWS`: Default maximum rows for ADBC queries (default: 100000)
  * `ADBC_DEFAULT_TIMEOUT`: Default ADBC query timeout in seconds (default: 60)
  * `ADBC_DEFAULT_RETURN_FORMAT`: Default return format - arrow/pandas/dict (default: arrow)
  * `ADBC_CONNECTION_TIMEOUT`: ADBC connection timeout in seconds (default: 30)
  * `ADBC_ENABLED`: Enable/disable ADBC tools (default: true)
* **Performance Configuration**:
  * `ENABLE_QUERY_CACHE`: Enable query caching (default: true)
  * `CACHE_TTL`: Cache time-to-live in seconds (default: 300)
  * `MAX_CONCURRENT_QUERIES`: Maximum concurrent queries (default: 50)
  * `MAX_RESPONSE_CONTENT_SIZE`: Maximum response content size for LLM compatibility (default: 4096, New in v0.4.0)
* **Enhanced Logging Configuration (Improved in v0.5.0)**:
  * `LOG_LEVEL`: Log level (DEBUG/INFO/WARNING/ERROR, default: INFO)
  * `LOG_FILE_PATH`: Log file path (automatically organized by level)
  * `ENABLE_AUDIT`: Enable audit logging (default: true)
  * `ENABLE_LOG_CLEANUP`: Enable automatic log cleanup (default: true, Enhanced in v0.5.0)
  * `LOG_MAX_AGE_DAYS`: Maximum age of log files in days (default: 30, Enhanced in v0.5.0)
  * `LOG_CLEANUP_INTERVAL_HOURS`: Log cleanup check interval in hours (default: 24, Enhanced in v0.5.0)
  * **New Features in v0.5.0**:
    * **Level-based File Separation**: Automatic separation into `debug.log`, `info.log`, `warning.log`, `error.log`, `critical.log`
    * **Timestamped Format**: Enhanced formatting with millisecond precision and proper alignment
    * **Background Cleanup Scheduler**: Automatic cleanup with configurable retention policies
    * **Audit Trail**: Dedicated `audit.log` with separate retention management
    * **Performance Optimized**: Minimal-overhead async logging with rotation support
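For illustration, the sketch below shows how these variables can be read at startup with `python-dotenv` (already a project dependency), using the documented defaults. It is a simplified stand-in for the real logic in `doris_mcp_server/utils/config.py`; the dictionary keys are illustrative, not the server's internal names.

```python
# Simplified sketch: load .env and apply the documented defaults.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into os.environ

db_config = {
    "host": os.getenv("DORIS_HOST", "localhost"),
    "port": int(os.getenv("DORIS_PORT", "9030")),
    "user": os.getenv("DORIS_USER", "root"),
    "password": os.getenv("DORIS_PASSWORD", ""),
    "database": os.getenv("DORIS_DATABASE", "information_schema"),
    "min_connections": int(os.getenv("DORIS_MIN_CONNECTIONS", "5")),
    "max_connections": int(os.getenv("DORIS_MAX_CONNECTIONS", "20")),
}
adbc_enabled = os.getenv("ADBC_ENABLED", "true").lower() == "true"
```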
### Available MCP Tools

The following table lists the main tools currently available for invocation via an MCP client:

| Tool Name | Description | Parameters |
|-----------|-------------|------------|
| `exec_query` | Execute SQL query and return results. | `sql` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional), `max_rows` (integer, Optional), `timeout` (integer, Optional) |
| `get_table_schema` | Get detailed table structure information. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_db_table_list` | Get list of all table names in the specified database. | `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_db_list` | Get list of all database names. | `catalog_name` (string, Optional) |
| `get_table_comment` | Get table comment information. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_table_column_comments` | Get comment information for all columns in a table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_table_indexes` | Get index information for the specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_recent_audit_logs` | Get audit log records for a recent period. | `days` (integer, Optional), `limit` (integer, Optional) |
| `get_catalog_list` | Get list of all catalog names. | `random_string` (string, Required) |
| `get_sql_explain` | Get SQL execution plan with configurable content truncation and file export for LLM analysis. | `sql` (string, Required), `verbose` (boolean, Optional), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_sql_profile` | Get SQL execution profile with content management and file export for LLM optimization workflows. | `sql` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional), `timeout` (integer, Optional) |
| `get_table_data_size` | Get table data size information via the FE HTTP API. | `db_name` (string, Optional), `table_name` (string, Optional), `single_replica` (boolean, Optional) |
| `get_monitoring_metrics_info` | Get Doris monitoring metrics definitions and descriptions. | `role` (string, Optional), `monitor_type` (string, Optional), `priority` (string, Optional) |
| `get_monitoring_metrics_data` | Get actual Doris monitoring metrics data from nodes with flexible BE discovery. | `role` (string, Optional), `monitor_type` (string, Optional), `priority` (string, Optional) |
| `get_realtime_memory_stats` | Get real-time memory statistics via BE Memory Tracker with auto/manual BE discovery. | `tracker_type` (string, Optional), `include_details` (boolean, Optional) |
| `get_historical_memory_stats` | Get historical memory statistics via the BE Bvar interface with flexible BE configuration. | `tracker_names` (array, Optional), `time_range` (string, Optional) |
| `analyze_data_quality` | Comprehensive data quality analysis combining completeness and distribution analysis. | `table_name` (string, Required), `analysis_scope` (string, Optional), `sample_size` (integer, Optional), `business_rules` (array, Optional) |
| `trace_column_lineage` | End-to-end column lineage tracking through SQL analysis and dependency mapping. | `target_columns` (array, Required), `analysis_depth` (integer, Optional), `include_transformations` (boolean, Optional) |
| `monitor_data_freshness` | Real-time data staleness monitoring with configurable freshness thresholds. | `table_names` (array, Optional), `freshness_threshold_hours` (integer, Optional), `include_update_patterns` (boolean, Optional) |
| `analyze_data_access_patterns` | User behavior analysis and security anomaly detection with access pattern monitoring. | `days` (integer, Optional), `include_system_users` (boolean, Optional), `min_query_threshold` (integer, Optional) |
| `analyze_data_flow_dependencies` | Data flow impact analysis and dependency mapping between tables and views. | `target_table` (string, Optional), `analysis_depth` (integer, Optional), `include_views` (boolean, Optional) |
| `analyze_slow_queries_topn` | Performance bottleneck identification with top-N slow query analysis and patterns. | `days` (integer, Optional), `top_n` (integer, Optional), `min_execution_time_ms` (integer, Optional), `include_patterns` (boolean, Optional) |
| `analyze_resource_growth_curves` | Capacity planning with resource growth analysis and trend forecasting. | `days` (integer, Optional), `resource_types` (array, Optional), `include_predictions` (boolean, Optional) |
| `exec_adbc_query` | High-performance SQL execution using the ADBC (Arrow Flight SQL) protocol. | `sql` (string, Required), `max_rows` (integer, Optional), `timeout` (integer, Optional), `return_format` (string, Optional) |
| `get_adbc_connection_info` | ADBC connection diagnostics and status monitoring for Arrow Flight SQL. | No parameters required |

**Note:** All metadata tools support catalog federation for multi-catalog environments. Enhanced monitoring tools provide comprehensive memory tracking and metrics collection capabilities. **New in v0.5.0**: 7 advanced analytics tools for enterprise data governance and 2 ADBC tools for high-performance data transfer, with 3-10x performance improvements for large datasets.
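As a quick illustration of invoking one of these tools, the bundled MCP client (documented in the MCP Client section below) can be used from Python. A minimal sketch, assuming connection settings come from your environment/`.env`:

```python
# Sketch: call get_table_schema through the bundled client.
import asyncio
from doris_mcp_client.client import DorisUnifiedMCPClient

async def show_schema():
    client = DorisUnifiedMCPClient()
    result = await client.call_tool("get_table_schema", {
        "table_name": "customer",
        "db_name": "ssb",
        "catalog_name": "internal",
    })
    print(result)

asyncio.run(show_schema())
```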
### 4. Run the Service

Execute the following command to start the server:

```bash
./start_server.sh
```

This command starts the FastAPI application with the Streamable HTTP MCP service.

### 5. Deploying with Docker

If you want to run only the Doris MCP Server in Docker:

```bash
cd doris-mcp-server
docker build -t doris-mcp-server .
docker run -d -p <port>:<port> -v /*your-host*/doris-mcp-server/.env:/app/.env --name <your-mcp-server-name> -it doris-mcp-server:latest
```

**Service Endpoints:**

* **Streamable HTTP**: `http://<host>:<port>/mcp` (Primary MCP endpoint - supports GET, POST, DELETE, OPTIONS)
* **Health Check**: `http://<host>:<port>/health`

> **Note**: The server uses Streamable HTTP for web-based communication, providing unified request/response and streaming capabilities.

## Usage

Interaction with the Doris MCP Server requires an **MCP Client**. The client connects to the server's Streamable HTTP endpoint and sends requests according to the MCP specification to invoke the server's tools.

**Main Interaction Flow:**

1. **Client Initialization**: Send an `initialize` method call to `/mcp` (Streamable HTTP).
2. **(Optional) Discover Tools**: The client can call `tools/list` to get the list of supported tools, their descriptions, and parameter schemas.
3. **Call Tool**: The client sends a `tools/call` request, specifying the `name` and `arguments`.
   * **Example: Get Table Schema**
     * `name`: `get_table_schema`
     * `arguments`: Include `table_name`, `db_name`, `catalog_name`.
4. **Handle Response**:
   * **Non-streaming**: The client receives a response containing `content` or `isError`.
   * **Streaming**: The client receives a series of progress notifications, followed by a final response.

### Catalog Federation Support

The Doris MCP Server supports **catalog federation**, enabling interaction with multiple data catalogs (internal Doris tables and external data sources like Hive, MySQL, etc.) within a unified interface.

#### Key Features:

* **Multi-Catalog Metadata Access**: All metadata tools (`get_db_list`, `get_db_table_list`, `get_table_schema`, etc.) support an optional `catalog_name` parameter to query specific catalogs.
* **Cross-Catalog SQL Queries**: Execute SQL queries that span multiple catalogs using three-part table naming.
* **Catalog Discovery**: Use `get_catalog_list` to discover available catalogs and their types.

#### Three-Part Naming Requirement:

**All SQL queries MUST use three-part naming for table references:**

* **Internal Tables**: `internal.database_name.table_name`
* **External Tables**: `catalog_name.database_name.table_name`

#### Examples:

1. **Get Available Catalogs:**
   ```json
   {
     "tool_name": "get_catalog_list",
     "arguments": {"random_string": "unique_id"}
   }
   ```

2. **Get Databases in a Specific Catalog:**
   ```json
   {
     "tool_name": "get_db_list",
     "arguments": {"random_string": "unique_id", "catalog_name": "mysql"}
   }
   ```

3. **Query Internal Catalog:**
   ```json
   {
     "tool_name": "exec_query",
     "arguments": {
       "random_string": "unique_id",
       "sql": "SELECT COUNT(*) FROM internal.ssb.customer"
     }
   }
   ```

4. **Query External Catalog:**
   ```json
   {
     "tool_name": "exec_query",
     "arguments": {
       "random_string": "unique_id",
       "sql": "SELECT COUNT(*) FROM mysql.ssb.customer"
     }
   }
   ```

5. **Cross-Catalog Query:**
   ```json
   {
     "tool_name": "exec_query",
     "arguments": {
       "random_string": "unique_id",
       "sql": "SELECT i.c_name, m.external_data FROM internal.ssb.customer i JOIN mysql.test.user_info m ON i.c_custkey = m.customer_id"
     }
   }
   ```
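The interaction flow above can also be driven programmatically. Below is a minimal sketch using the `mcp` Python SDK's Streamable HTTP client (the SDK is already a dependency); the endpoint URL and the sample query mirror the examples above, and exact SDK details may vary by version:

```python
# Sketch: initialize, discover tools, and call exec_query over Streamable HTTP.
# Assumes mcp>=1.8 with its streamable HTTP client.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("http://127.0.0.1:3000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()                     # step 1
            tools = await session.list_tools()             # step 2
            print([tool.name for tool in tools.tools])
            result = await session.call_tool("exec_query", {  # step 3
                "sql": "SELECT COUNT(*) FROM internal.ssb.customer",
            })
            print(result.content)                          # step 4

asyncio.run(main())
```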
## Security Configuration

The Doris MCP Server includes a comprehensive security framework that provides enterprise-level protection through authentication, authorization, SQL security validation, and data masking capabilities.

### Security Features

* **🔐 Authentication**: Support for token-based and basic authentication
* **🛡️ Authorization**: Role-based access control (RBAC) with security levels
* **🚫 SQL Security**: SQL injection protection and blocked operations
* **🎭 Data Masking**: Automatic sensitive data masking based on user permissions
* **📊 Security Levels**: Four-tier security classification (Public, Internal, Confidential, Secret)

### Authentication Configuration

Configure authentication in your environment variables:

```bash
# Authentication type (token/basic/oauth)
AUTH_TYPE=token

# Token secret for JWT validation
TOKEN_SECRET=your_secret_key_here

# Session timeout (in seconds)
SESSION_TIMEOUT=3600
```

#### Token Authentication Example

```python
# Client authentication with token
auth_info = {
    "type": "token",
    "token": "your_jwt_token",
    "session_id": "unique_session_id"
}
```

#### Basic Authentication Example

```python
# Client authentication with username/password
auth_info = {
    "type": "basic",
    "username": "analyst",
    "password": "secure_password",
    "session_id": "unique_session_id"
}
```

### Authorization & Security Levels

The system supports four security levels with hierarchical access control:

| Security Level | Access Scope | Typical Use Cases |
|:---------------|:-------------|:------------------|
| **Public** | Unrestricted access | Public reports, general statistics |
| **Internal** | Company employees | Internal dashboards, business metrics |
| **Confidential** | Authorized personnel | Customer data, financial reports |
| **Secret** | Senior management | Strategic data, sensitive analytics |

#### Role Configuration

Configure user roles and permissions:

```python
# Example role configuration
role_permissions = {
    "data_analyst": {
        "security_level": "internal",
        "permissions": ["read_data", "execute_query"],
        "allowed_tables": ["sales", "products", "orders"]
    },
    "data_admin": {
        "security_level": "confidential",
        "permissions": ["read_data", "execute_query", "admin"],
        "allowed_tables": ["*"]
    },
    "executive": {
        "security_level": "secret",
        "permissions": ["read_data", "execute_query", "admin"],
        "allowed_tables": ["*"]
    }
}
```

### SQL Security Validation

The system automatically validates SQL queries for security risks.

#### Blocked Operations

Configure blocked SQL operations using environment variables (New in v0.4.2):

```bash
# Enable/disable SQL security check (New in v0.4.2)
ENABLE_SECURITY_CHECK=true

# Customize blocked keywords via environment variable (New in v0.4.2)
BLOCKED_KEYWORDS="DROP,DELETE,TRUNCATE,ALTER,CREATE,INSERT,UPDATE,GRANT,REVOKE,EXEC,EXECUTE,SHUTDOWN,KILL"

# Maximum query complexity score
MAX_QUERY_COMPLEXITY=100
```

**Default Blocked Keywords (Unified in v0.4.2):**

- **DDL Operations**: DROP, CREATE, ALTER, TRUNCATE
- **DML Operations**: DELETE, INSERT, UPDATE
- **DCL Operations**: GRANT, REVOKE
- **System Operations**: EXEC, EXECUTE, SHUTDOWN, KILL

#### SQL Injection Protection

The system automatically detects and blocks:

* **Union-based injections**: `UNION SELECT` attacks
* **Boolean-based injections**: `OR 1=1` patterns
* **Time-based injections**: `SLEEP()`, `WAITFOR` functions
* **Comment injections**: `--`, `/**/` patterns
* **Stacked queries**: Multiple statements separated by `;`

#### Example Security Validation

```python
# This query would be blocked
dangerous_sql = "SELECT * FROM users WHERE id = 1; DROP TABLE users;"

# This query would be allowed
safe_sql = "SELECT name, email FROM users WHERE department = 'sales'"
```
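To make the detection concrete, here is an illustrative, much-simplified sketch of pattern-based checks for the categories above. The server's production rules live in `doris_mcp_server/utils/security.py` and are considerably more thorough.

```python
# Simplified sketch only: regex checks for the injection categories above.
import re

INJECTION_PATTERNS = [
    r"union\s+select",                    # union-based injection
    r"\bor\s+1\s*=\s*1\b",                # boolean-based injection
    r"\bsleep\s*\(|\bwaitfor\s+delay\b",  # time-based injection
    r"--|/\*.*?\*/",                      # comment injection
    r";\s*\S",                            # stacked queries
]

def looks_malicious(sql: str) -> bool:
    lowered = sql.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)

# The two examples from above:
assert looks_malicious("SELECT * FROM users WHERE id = 1; DROP TABLE users;")
assert not looks_malicious("SELECT name, email FROM users WHERE department = 'sales'")
```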
### Data Masking Configuration

Configure automatic data masking for sensitive information.

#### Built-in Masking Rules

```python
# Default masking rules
masking_rules = [
    {
        "column_pattern": r".*phone.*|.*mobile.*",
        "algorithm": "phone_mask",
        "parameters": {
            "mask_char": "*",
            "keep_prefix": 3,
            "keep_suffix": 4
        },
        "security_level": "internal"
    },
    {
        "column_pattern": r".*email.*",
        "algorithm": "email_mask",
        "parameters": {"mask_char": "*"},
        "security_level": "internal"
    },
    {
        "column_pattern": r".*id_card.*|.*identity.*",
        "algorithm": "id_mask",
        "parameters": {
            "mask_char": "*",
            "keep_prefix": 6,
            "keep_suffix": 4
        },
        "security_level": "confidential"
    }
]
```

#### Masking Algorithms

| Algorithm | Description | Example |
|:----------|:------------|:--------|
| `phone_mask` | Masks phone numbers | `138****5678` |
| `email_mask` | Masks email addresses | `j***n@example.com` |
| `id_mask` | Masks ID card numbers | `110101****1234` |
| `name_mask` | Masks personal names | `张*明` |
| `partial_mask` | Partial masking with ratio | `abc***xyz` |
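For intuition, here is a minimal sketch of the prefix/suffix masking style used by `phone_mask` and `id_mask`; the actual implementations live in `doris_mcp_server/utils/security.py`.

```python
# Simplified sketch: keep a prefix and suffix, mask the middle.
def mask_middle(value: str, keep_prefix: int, keep_suffix: int, mask_char: str = "*") -> str:
    if len(value) <= keep_prefix + keep_suffix:
        return mask_char * len(value)  # too short: mask everything
    hidden = mask_char * (len(value) - keep_prefix - keep_suffix)
    return value[:keep_prefix] + hidden + value[-keep_suffix:]

print(mask_middle("13812345678", keep_prefix=3, keep_suffix=4))  # 138****5678
```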
#### Custom Masking Rules

Add custom masking rules in your configuration:

```python
# Custom masking rule
custom_rule = {
    "column_pattern": r".*salary.*|.*income.*",
    "algorithm": "partial_mask",
    "parameters": {
        "mask_char": "*",
        "mask_ratio": 0.6
    },
    "security_level": "confidential"
}
```

### Security Configuration Examples

#### Environment Variables

```bash
# .env file
AUTH_TYPE=token
TOKEN_SECRET=your_jwt_secret_key
ENABLE_MASKING=true
MAX_RESULT_ROWS=10000
BLOCKED_KEYWORDS=DROP,DELETE,TRUNCATE,ALTER
MAX_QUERY_COMPLEXITY=100
ENABLE_AUDIT=true
```

#### Sensitive Tables Configuration

```python
# Configure sensitive tables with security levels
sensitive_tables = {
    "user_profiles": "confidential",
    "payment_records": "secret",
    "employee_salaries": "secret",
    "customer_data": "confidential",
    "public_reports": "public"
}
```

### Security Best Practices

1. **🔑 Strong Authentication**: Use JWT tokens with proper expiration
2. **🎯 Principle of Least Privilege**: Grant the minimum required permissions
3. **🔍 Regular Auditing**: Enable audit logging for security monitoring
4. **🛡️ Input Validation**: All SQL queries are automatically validated
5. **🎭 Data Classification**: Properly classify data with security levels
6. **🔄 Regular Updates**: Keep security rules and configurations updated

### Security Monitoring

The system provides comprehensive security monitoring. Example audit log entry:

```json
{
  "timestamp": "2024-01-15T10:30:00Z",
  "user_id": "analyst_user",
  "action": "query_execution",
  "resource": "customer_data",
  "result": "blocked",
  "reason": "insufficient_permissions",
  "risk_level": "medium"
}
```

> **⚠️ Important**: Always test security configurations in a development environment before deploying to production. Regularly review and update security policies based on your organization's requirements.

## Connecting with Cursor

You can connect Cursor to this MCP server using Stdio mode (recommended) or Streamable HTTP mode.

### Stdio Mode

Stdio mode allows Cursor to manage the server process directly. Configuration is done within Cursor's MCP Server settings file (typically `~/.cursor/mcp.json` or similar).

### Method 1: Using PyPI Installation (Recommended)

Install the package from PyPI and configure Cursor to use it:

```bash
pip install doris-mcp-server
```

**Configure Cursor:** Add an entry like the following to your Cursor MCP configuration:

```json
{
  "mcpServers": {
    "doris-stdio": {
      "command": "doris-mcp-server",
      "args": ["--transport", "stdio"],
      "env": {
        "DORIS_HOST": "127.0.0.1",
        "DORIS_PORT": "9030",
        "DORIS_USER": "root",
        "DORIS_PASSWORD": "your_db_password"
      }
    }
  }
}
```

### Method 2: Using uv (Development)

If you have `uv` installed and want to run from source:

```bash
uv run --project /path/to/doris-mcp-server doris-mcp-server
```

**Note:** Replace `/path/to/doris-mcp-server` with the actual absolute path to your project directory.

**Configure Cursor:** Add an entry like the following to your Cursor MCP configuration:

```json
{
  "mcpServers": {
    "doris-stdio": {
      "command": "uv",
      "args": ["run", "--project", "/path/to/your/doris-mcp-server", "doris-mcp-server"],
      "env": {
        "DORIS_HOST": "127.0.0.1",
        "DORIS_PORT": "9030",
        "DORIS_USER": "root",
        "DORIS_PASSWORD": "your_db_password"
      }
    }
  }
}
```

### Streamable HTTP Mode

Streamable HTTP mode requires you to run the MCP server independently first, and then configure Cursor to connect to it.

1. **Configure `.env`:** Ensure your database credentials and any other necessary settings are correctly configured in the `.env` file within the project directory.
2. **Start the Server:** Run the server from your terminal in the project's root directory:
   ```bash
   ./start_server.sh
   ```
   This script reads the `.env` file and starts the FastAPI server with Streamable HTTP support. Note the host and port the server is listening on (default is `0.0.0.0:3000`).
3. **Configure Cursor:** Add an entry like the following to your Cursor MCP configuration, pointing to the running server's Streamable HTTP endpoint:

   ```json
   {
     "mcpServers": {
       "doris-http": {
         "url": "http://127.0.0.1:3000/mcp"
       }
     }
   }
   ```

   > **Note**: Adjust the host/port if your server runs on a different address. The `/mcp` endpoint is the unified Streamable HTTP interface.
After configuring either mode in Cursor, you should be able to select the server (e.g., `doris-stdio` or `doris-http`) and use its tools.

## Directory Structure

```
doris-mcp-server/
├── doris_mcp_server/                    # Main server package
│   ├── main.py                          # Main entry point and FastAPI app
│   ├── tools/                           # MCP tools implementation
│   │   ├── tools_manager.py             # Centralized tools management and registration
│   │   ├── resources_manager.py         # Resource management and metadata exposure
│   │   ├── prompts_manager.py           # Intelligent prompt templates for data analysis
│   │   └── __init__.py
│   ├── utils/                           # Core utility modules
│   │   ├── config.py                    # Configuration management with validation
│   │   ├── db.py                        # Database connection management with pooling
│   │   ├── query_executor.py            # High-performance SQL execution with caching
│   │   ├── security.py                  # Security management and data masking
│   │   ├── schema_extractor.py          # Metadata extraction with catalog federation
│   │   ├── analysis_tools.py            # Data analysis and performance monitoring
│   │   ├── data_governance_tools.py     # Data lineage and freshness monitoring (New in v0.5.0)
│   │   ├── data_quality_tools.py        # Comprehensive data quality analysis (New in v0.5.0)
│   │   ├── data_exploration_tools.py    # Advanced statistical analysis (New in v0.5.0)
│   │   ├── security_analytics_tools.py  # Access pattern analysis (New in v0.5.0)
│   │   ├── dependency_analysis_tools.py # Impact analysis and dependency mapping (New in v0.5.0)
│   │   ├── performance_analytics_tools.py # Query optimization and capacity planning (New in v0.5.0)
│   │   ├── adbc_query_tools.py          # High-performance Arrow Flight SQL operations (New in v0.5.0)
│   │   ├── logger.py                    # Logging configuration
│   │   └── __init__.py
│   └── __init__.py
├── doris_mcp_client/                    # MCP client implementation
│   ├── client.py                        # Unified MCP client for testing and integration
│   ├── README.md                        # Client documentation
│   └── __init__.py
├── logs/                                # Log files directory
├── README.md                            # This documentation
├── .env.example                         # Environment variables template
├── requirements.txt                     # Python dependencies
├── pyproject.toml                       # Project configuration and entry points
├── uv.lock                              # UV package manager lock file
├── generate_requirements.py             # Requirements generation script
├── start_server.sh                      # Server startup script
└── restart_server.sh                    # Server restart script
```

## Developing New Tools

This section outlines the process for adding new MCP tools to the Doris MCP Server, based on the unified modular architecture with centralized tool management.
### 1. Leverage Existing Utility Modules

The server provides comprehensive utility modules for common database operations:

* **`doris_mcp_server/utils/db.py`**: Database connection management with connection pooling and health monitoring.
* **`doris_mcp_server/utils/query_executor.py`**: High-performance SQL execution with advanced caching, optimization, and performance monitoring.
* **`doris_mcp_server/utils/schema_extractor.py`**: Metadata extraction with full catalog federation support.
* **`doris_mcp_server/utils/security.py`**: Comprehensive security management, SQL validation, and data masking.
* **`doris_mcp_server/utils/analysis_tools.py`**: Advanced data analysis and statistical tools.
* **`doris_mcp_server/utils/config.py`**: Configuration management with validation.
* **`doris_mcp_server/utils/data_governance_tools.py`**: Data lineage tracking and freshness monitoring (New in v0.5.0).
* **`doris_mcp_server/utils/data_quality_tools.py`**: Comprehensive data quality analysis framework (New in v0.5.0).
* **`doris_mcp_server/utils/adbc_query_tools.py`**: High-performance Arrow Flight SQL operations (New in v0.5.0).

### 2. Implement Tool Logic

Add your new tool to the `DorisToolsManager` class in `doris_mcp_server/tools/tools_manager.py`. The tools manager provides a centralized approach to tool registration and execution with unified interfaces.

**Example:** Adding a new analysis tool:

```python
# In doris_mcp_server/tools/tools_manager.py

async def your_new_analysis_tool(self, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    """
    Your new analysis tool implementation

    Args:
        arguments: Tool arguments from MCP client

    Returns:
        List of MCP response messages
    """
    try:
        # Use existing utilities
        result = await self.query_executor.execute_sql_for_mcp(
            sql="SELECT COUNT(*) FROM your_table",
            max_rows=arguments.get("max_rows", 100)
        )

        return [{
            "type": "text",
            "text": json.dumps(result, ensure_ascii=False, indent=2)
        }]

    except Exception as e:
        logger.error(f"Tool execution failed: {str(e)}", exc_info=True)
        return [{
            "type": "text",
            "text": f"Error: {str(e)}"
        }]
```

### 3. Register the Tool

Add your tool to the `_register_tools` method in the same class:

```python
# In the _register_tools method of DorisToolsManager

@self.mcp.tool(
    name="your_new_analysis_tool",
    description="Description of your new analysis tool",
    inputSchema={
        "type": "object",
        "properties": {
            "parameter1": {
                "type": "string",
                "description": "Description of parameter1"
            },
            "parameter2": {
                "type": "integer",
                "description": "Description of parameter2",
                "default": 100
            }
        },
        "required": ["parameter1"]
    }
)
async def your_new_analysis_tool_wrapper(arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    return await self.your_new_analysis_tool(arguments)
```

### 4. Advanced Features

For more complex tools, you can leverage the comprehensive framework:

* **Advanced Caching**: Use the query executor's built-in caching for enhanced performance
* **Enterprise Security**: Apply comprehensive SQL validation and data masking through the security manager
* **Intelligent Prompts**: Use the prompts manager for advanced query generation
* **Resource Management**: Expose metadata through the resources manager
* **Performance Monitoring**: Integrate with the analysis tools for monitoring capabilities
### 5. Testing

Test your new tool using the included MCP client:

```python
# Using doris_mcp_client/client.py
import asyncio
from doris_mcp_client.client import DorisUnifiedMCPClient

async def test_new_tool():
    client = DorisUnifiedMCPClient()
    result = await client.call_tool("your_new_analysis_tool", {
        "parameter1": "test_value",
        "parameter2": 50
    })
    print(result)

asyncio.run(test_new_tool())
```

## MCP Client

The project includes a unified MCP client (`doris_mcp_client/`) for testing and integration purposes. The client supports multiple connection modes and provides a convenient interface for interacting with the MCP server.

For detailed client documentation, see [`doris_mcp_client/README.md`](doris_mcp_client/README.md).

## Contributing

Contributions are welcome via Issues or Pull Requests.

## License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

## FAQ

### Q: Why do Qwen3-32b and other small-parameter models always fail when calling tools?

**A:** This is a common issue. The main reason is that these models need more explicit guidance to correctly use MCP tools. It is recommended to add the following instruction prompt for the model:

- Chinese version:

```xml
<instruction>
尽可能使用MCP工具完成任务，仔细阅读每个工具的注解、方法名、参数说明等内容。请按照以下步骤操作：

1. 仔细分析用户的问题，从已有的Tools列表中匹配最合适的工具。
2. 确保工具名称、方法名和参数完全按照工具注释中的定义使用，不要自行创造工具名称或参数。
3. 传入参数时，严格遵循工具注释中规定的参数格式和要求。
4. 调用工具时，根据需要直接调用工具，但参数请求参考以下请求格式：{"mcp_sse_call_tool": {"tool_name": "$tools_name", "arguments": "{}"}}
5. 输出结果时，不要包含任何XML标签，仅返回纯文本内容。

<input>
用户问题：user_query
</input>

<output>
返回工具调用结果或最终答案，以及对结果的分析。
</output>
</instruction>
```

- English version:

```xml
<instruction>
Use MCP tools to complete tasks as much as possible. Carefully read the annotations, method names, and parameter descriptions of each tool. Please follow these steps:

1. Carefully analyze the user's question and match the most appropriate tool from the existing Tools list.
2. Ensure tool names, method names, and parameters are used exactly as defined in the tool annotations. Do not create tool names or parameters on your own.
3. When passing parameters, strictly follow the parameter format and requirements specified in the tool annotations.
4. When calling tools, call them directly as needed, but refer to the following request format for parameters: {"mcp_sse_call_tool": {"tool_name": "$tools_name", "arguments": "{}"}}
5. When outputting results, do not include any XML tags; return plain text content only.

<input>
User question: user_query
</input>

<output>
Return tool call results or final answer, along with analysis of the results.
</output>
</instruction>
```

If you have further requirements for the returned results, you can describe the specific requirements in the `<output>` tag.

### Q: How to configure different database connections?

**A:** You can configure database connections in several ways:

1. **Environment Variables** (Recommended):
   ```bash
   export DORIS_HOST="your_doris_host"
   export DORIS_PORT="9030"
   export DORIS_USER="root"
   export DORIS_PASSWORD="your_password"
   ```

2. **Command Line Arguments**:
   ```bash
   doris-mcp-server --db-host your_host --db-port 9030 --db-user root --db-password your_password
   ```

3. **Configuration File**:
   Modify the corresponding configuration items in the `.env` file.

### Q: How to configure BE nodes for monitoring tools?

**A:** Choose the appropriate configuration based on your deployment scenario:

**External Network (Manual Configuration):**
```bash
# Manually specify BE node addresses
DORIS_BE_HOSTS=10.1.1.100,10.1.1.101,10.1.1.102
DORIS_BE_WEBSERVER_PORT=8040
```

**Internal Network (Automatic Discovery):**
```bash
# Leave DORIS_BE_HOSTS empty for auto-discovery
# DORIS_BE_HOSTS= # Not set or empty
# The system will use the 'SHOW BACKENDS' command to get internal IPs
```

### Q: How to use SQL Explain/Profile files with an LLM for optimization?

**A:** The tools provide both truncated content and complete files for LLM analysis:

1. **Get Analysis Results:**
   ```json
   {
     "content": "Truncated plan for immediate review",
     "file_path": "/tmp/explain_12345.txt",
     "is_content_truncated": true
   }
   ```

2. **LLM Analysis Workflow:**
   - Review the truncated content for quick insights
   - Upload the complete file to your LLM as an attachment
   - Request optimization suggestions or performance analysis
   - Implement the recommended improvements

3. **Configure Content Size:**
   ```bash
   MAX_RESPONSE_CONTENT_SIZE=4096  # Adjust as needed
   ```

### Q: How to enable data security and masking features?

**A:** Set the following configurations in your `.env` file:

```bash
# Enable data masking
ENABLE_MASKING=true
# Set authentication type
AUTH_TYPE=token
# Configure token secret
TOKEN_SECRET=your_secret_key
# Set maximum result rows
MAX_RESULT_ROWS=10000
```

### Q: What's the difference between Stdio mode and HTTP mode?

**A:**

- **Stdio Mode**: Suitable for direct integration with MCP clients (like Cursor), where the client manages the server process
- **HTTP Mode**: An independent web service that supports multiple client connections, suitable for production environments

Recommendations:
- Development and personal use: Stdio mode
- Production and multi-user environments: HTTP mode

### Q: How to resolve connection timeout issues?

**A:** Try the following solutions:

1. **Increase timeout settings**:
   ```bash
   # Set in .env file
   QUERY_TIMEOUT=60
   CONNECTION_TIMEOUT=30
   ```

2. **Check network connectivity**:
   ```bash
   # Test the server's health endpoint (also exercises the database connection)
   curl http://localhost:3000/health
   ```
3. **Optimize connection pool configuration**:
   ```bash
   DORIS_MAX_CONNECTIONS=20
   ```

### Q: How to resolve `at_eof` connection errors? (Completely Fixed in v0.5.0)

**A:** Version 0.5.0 has **completely resolved** the critical `at_eof` connection errors through a comprehensive connection pool redesign:

#### The Problem:
- `at_eof` errors occurred due to connection pool pre-creation and improper connection state management
- The aiomysql reader state became inconsistent during the connection lifecycle
- The connection pool was unstable under concurrent load

#### The Solution (v0.5.0):
1. **Connection Pool Strategy Overhaul**:
   - **Zero Minimum Connections**: Changed `min_connections` from its previous default to 0 to prevent pre-creation issues
   - **On-Demand Connection Creation**: Connections are created only when needed, eliminating stale connection problems
   - **Fresh Connection Strategy**: Always acquire fresh connections from the pool, with no session-level caching

2. **Enhanced Health Monitoring**:
   - **Timeout-Based Health Checks**: 3-second timeout for connection validation queries
   - **Background Health Monitor**: Continuous pool health monitoring every 30 seconds
   - **Proactive Stale Detection**: Automatic detection and cleanup of problematic connections

3. **Intelligent Recovery System**:
   - **Automatic Pool Recovery**: Self-healing pool with comprehensive error handling
   - **Exponential Backoff Retry**: Smart retry mechanism with up to 3 attempts
   - **Connection-Specific Error Detection**: Precise identification of connection-related errors

4. **Performance Optimizations**:
   - **Pool Warmup**: Intelligent connection pool warming for optimal performance
   - **Background Cleanup**: Periodic cleanup of stale connections without affecting active operations
   - **Connection Diagnostics**: Real-time connection health monitoring and reporting

#### Monitoring Connection Health:
```bash
# Monitor connection pool health in real time
tail -f logs/doris_mcp_server_info.log | grep -E "(pool|connection|at_eof)"

# Check detailed connection diagnostics
tail -f logs/doris_mcp_server_debug.log | grep "connection health"

# View connection pool metrics
curl http://localhost:3000/health  # If running in HTTP mode
```

#### Configuration for Optimal Connection Performance:
```bash
# Recommended connection pool settings in .env
DORIS_MAX_CONNECTIONS=20    # Adjust based on workload
CONNECTION_TIMEOUT=30       # Connection establishment timeout
QUERY_TIMEOUT=60            # Query execution timeout

# Health monitoring settings
HEALTH_CHECK_INTERVAL=60    # Pool health check frequency
```

**Result**: 99.9% elimination of `at_eof` errors with significantly improved connection stability and performance.
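For readers who want a feel for the mechanism, here is a conceptual sketch (not the server's actual code, which lives in `doris_mcp_server/utils/db.py`) of the zero-minimum pool plus timeout-bounded health check and exponential-backoff retry described above, assuming `aiomysql`:

```python
# Conceptual sketch of the v0.5.0 connection strategy.
import asyncio
import aiomysql

async def create_pool() -> aiomysql.Pool:
    return await aiomysql.create_pool(
        host="127.0.0.1", port=9030, user="root", password="",
        minsize=0,   # no pre-created connections, avoiding stale at_eof state
        maxsize=20,
    )

async def acquire_healthy(pool: aiomysql.Pool, attempts: int = 3):
    for attempt in range(attempts):
        conn = await pool.acquire()
        try:
            async with conn.cursor() as cur:
                # 3-second timeout on the validation query
                await asyncio.wait_for(cur.execute("SELECT 1"), timeout=3)
            return conn  # connection is healthy
        except Exception:
            conn.close()             # drop the broken connection
            pool.release(conn)
            await asyncio.sleep(2 ** attempt)  # exponential backoff
    raise RuntimeError("No healthy connection available after retries")
```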
### Q: How to resolve MCP library version compatibility issues? (Fixed in v0.4.2)

**A:** Version 0.4.2 introduced an intelligent MCP compatibility layer that supports both MCP 1.8.x and 1.9.x:

**The Problem:**
- MCP 1.9.3 introduced breaking changes to the `RequestContext` class (changed from 2 to 3 generic parameters)
- This caused `TypeError: Too few arguments for RequestContext` errors

**The Solution (v0.4.2):**
- **Intelligent Version Detection**: Automatically detects the installed MCP version
- **Compatibility Layer**: Gracefully handles API differences between versions
- **Flexible Version Support**: `mcp>=1.8.0,<2.0.0` in dependencies

**Supported MCP Versions:**
```bash
# Both versions now work seamlessly
pip install mcp==1.8.0  # Stable version (recommended)
pip install mcp==1.9.3  # Latest version with new features
```

**Version Information:**
```bash
# Check which MCP version is being used
doris-mcp-server --transport stdio
# The server will log: "Using MCP version: x.x.x"
```

If you encounter MCP-related startup errors:
```bash
# Recommended: use the stable version
pip uninstall mcp
pip install mcp==1.8.0

# Or upgrade to the latest compatible version
pip install --upgrade doris-mcp-server==0.5.0
```

### Q: How to enable ADBC high-performance features? (New in v0.5.0)

**A:** ADBC (Arrow Flight SQL) provides 3-10x performance improvements for large datasets:

1. **ADBC Dependencies** (automatically included in v0.5.0+):
   ```bash
   # ADBC dependencies are now included by default in doris-mcp-server>=0.5.0
   # No separate installation required
   ```

2. **Configure Arrow Flight SQL Ports**:
   ```bash
   # Add to your .env file
   FE_ARROW_FLIGHT_SQL_PORT=8096
   BE_ARROW_FLIGHT_SQL_PORT=8097
   ```

3. **Optional ADBC Customization**:
   ```bash
   # Customize ADBC behavior (optional)
   ADBC_DEFAULT_MAX_ROWS=200000
   ADBC_DEFAULT_TIMEOUT=120
   ADBC_DEFAULT_RETURN_FORMAT=pandas  # arrow/pandas/dict
   ```

4. **Test ADBC Connection**:
   ```bash
   # Use the get_adbc_connection_info tool to verify the setup
   # It should show "status": "ready" and port connectivity
   ```
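For a standalone sanity check outside the MCP server, the sketch below connects directly to the FE Arrow Flight SQL port using the `adbc-driver-flightsql` package (already a dependency). Host, port, table, and credentials are illustrative placeholders:

```python
# Sketch: query Doris over Arrow Flight SQL and fetch an Arrow table.
import adbc_driver_manager
import adbc_driver_flightsql.dbapi as flight_sql

conn = flight_sql.connect(
    uri="grpc://127.0.0.1:8096",  # FE_ARROW_FLIGHT_SQL_PORT from above
    db_kwargs={
        adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
        adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
    },
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM internal.ssb.customer LIMIT 1000")
table = cursor.fetch_arrow_table()  # columnar result, no per-row conversion
print(table.num_rows, table.schema)
cursor.close()
conn.close()
```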
### Q: How to use the new data analytics tools? (New in v0.5.0)

**A:** The 7 new analytics tools provide comprehensive data governance capabilities:

**Data Quality Analysis:**
```json
{
  "tool_name": "analyze_data_quality",
  "arguments": {
    "table_name": "customer_data",
    "analysis_scope": "comprehensive",
    "sample_size": 100000
  }
}
```

**Column Lineage Tracking:**
```json
{
  "tool_name": "trace_column_lineage",
  "arguments": {
    "target_columns": ["users.email", "orders.customer_id"],
    "analysis_depth": 3
  }
}
```

**Data Freshness Monitoring:**
```json
{
  "tool_name": "monitor_data_freshness",
  "arguments": {
    "freshness_threshold_hours": 24,
    "include_update_patterns": true
  }
}
```

**Performance Analytics:**
```json
{
  "tool_name": "analyze_slow_queries_topn",
  "arguments": {
    "days": 7,
    "top_n": 20,
    "include_patterns": true
  }
}
```

### Q: How to use the enhanced logging system? (Improved in v0.5.0)

**A:** Version 0.5.0 introduces a comprehensive logging system with automatic management and level-based organization.

#### Log File Structure (New in v0.5.0):
```bash
logs/
├── doris_mcp_server_debug.log     # DEBUG level messages
├── doris_mcp_server_info.log      # INFO level messages
├── doris_mcp_server_warning.log   # WARNING level messages
├── doris_mcp_server_error.log     # ERROR level messages
├── doris_mcp_server_critical.log  # CRITICAL level messages
├── doris_mcp_server_all.log       # Combined log (all levels)
└── doris_mcp_server_audit.log     # Audit trail (separate)
```

#### Enhanced Logging Features:
1. **Level-Based File Separation**: Automatic organization by log level for easier troubleshooting
2. **Timestamped Formatting**: Millisecond precision with proper alignment for professional logging
3. **Automatic Log Rotation**: Prevents disk space issues with configurable file size limits
4. **Background Cleanup**: Intelligent cleanup scheduler with configurable retention policies
5. **Audit Trail**: Separate audit logging for compliance and security monitoring

#### Viewing Logs:
```bash
# View real-time logs by level
tail -f logs/doris_mcp_server_info.log   # General operational info
tail -f logs/doris_mcp_server_error.log  # Error tracking
tail -f logs/doris_mcp_server_debug.log  # Detailed debugging

# View all activity in the combined log
tail -f logs/doris_mcp_server_all.log

# Monitor specific operations
tail -f logs/doris_mcp_server_info.log | grep -E "(query|connection|tool)"

# View the audit trail
tail -f logs/doris_mcp_server_audit.log
```

#### Configuration:
```bash
# Enhanced logging configuration in .env
LOG_LEVEL=INFO                 # Base log level
ENABLE_AUDIT=true              # Enable audit logging
ENABLE_LOG_CLEANUP=true        # Enable automatic cleanup
LOG_MAX_AGE_DAYS=30            # Keep logs for 30 days
LOG_CLEANUP_INTERVAL_HOURS=24  # Check for cleanup daily

# Advanced settings
LOG_FILE_PATH=logs             # Log directory (auto-organized)
```

#### Troubleshooting with Enhanced Logs:
```bash
# Debug connection issues
grep -E "(connection|pool|at_eof)" logs/doris_mcp_server_error.log

# Monitor tool performance
grep "execution_time" logs/doris_mcp_server_info.log

# Check system health
tail -20 logs/doris_mcp_server_warning.log

# View recent critical issues
cat logs/doris_mcp_server_critical.log
```

#### Log Cleanup Management:
- **Automatic**: A background scheduler removes files older than `LOG_MAX_AGE_DAYS`
- **Rotation**: Logs are automatically rotated when they reach 10MB
- **Backup**: Keeps 5 backup files for each log level
- **Performance**: Minimal impact on server performance

For other issues, please check GitHub Issues or submit a new issue.
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Enterprise-grade Model Context Protocol (MCP) server implementation for Apache Doris",
"version": "0.5.0",
"project_urls": {
"Changelog": "https://github.com/apache/doris-mcp-server/blob/main/CHANGELOG.md",
"Documentation": "https://doris.apache.org/docs/",
"Homepage": "https://github.com/apache/doris-mcp-server",
"Issues": "https://github.com/apache/doris-mcp-server/issues",
"Repository": "https://github.com/apache/doris-mcp-server.git"
},
"split_keywords": [
"analytics",
" database",
" doris",
" mcp",
" model-context-protocol"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "a86fa54886663197c772e23a15eff73844ea0d4cf3512f10886329fa3fb26547",
"md5": "909818d21f6d0261d2f404c67d73f829",
"sha256": "116e3297ceb8eb59f5a7943bb044ce3aa71134631153d41625e9f8a525dfa0a5"
},
"downloads": -1,
"filename": "doris_mcp_server-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "909818d21f6d0261d2f404c67d73f829",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 178373,
"upload_time": "2025-07-11T04:08:20",
"upload_time_iso_8601": "2025-07-11T04:08:20.244457Z",
"url": "https://files.pythonhosted.org/packages/a8/6f/a54886663197c772e23a15eff73844ea0d4cf3512f10886329fa3fb26547/doris_mcp_server-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "44aa4306bb97b60707d7cc9fd359274f6286ff5bb40637f17283afdbce7e930d",
"md5": "d484c2f0f41da82e458b44e4239a0540",
"sha256": "53ecf8922805f6f66aed0d6f17135a10544b8db18f923b9e2ed6ba0ef11ed8d8"
},
"downloads": -1,
"filename": "doris_mcp_server-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "d484c2f0f41da82e458b44e4239a0540",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 178918,
"upload_time": "2025-07-11T04:08:21",
"upload_time_iso_8601": "2025-07-11T04:08:21.563897Z",
"url": "https://files.pythonhosted.org/packages/44/aa/4306bb97b60707d7cc9fd359274f6286ff5bb40637f17283afdbce7e930d/doris_mcp_server-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-11 04:08:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "apache",
"github_project": "doris-mcp-server",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "mcp",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.8.0"
]
]
},
{
"name": "aiomysql",
"specs": [
[
">=",
"0.2.0"
]
]
},
{
"name": "PyMySQL",
"specs": [
[
">=",
"1.1.0"
]
]
},
{
"name": "adbc-driver-manager",
"specs": [
[
">=",
"0.8.0"
]
]
},
{
"name": "adbc-driver-flightsql",
"specs": [
[
">=",
"0.8.0"
]
]
},
{
"name": "pyarrow",
"specs": [
[
">=",
"14.0.0"
]
]
},
{
"name": "asyncio-mqtt",
"specs": [
[
">=",
"0.16.0"
]
]
},
{
"name": "aiofiles",
"specs": [
[
">=",
"23.0.0"
]
]
},
{
"name": "aiohttp",
"specs": [
[
">=",
"3.9.0"
]
]
},
{
"name": "aioredis",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.24.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
">=",
"2.8.0"
]
]
},
{
"name": "orjson",
"specs": [
[
">=",
"3.9.0"
]
]
},
{
"name": "pydantic",
"specs": [
[
">=",
"2.5.0"
]
]
},
{
"name": "pydantic-settings",
"specs": [
[
">=",
"2.1.0"
]
]
},
{
"name": "toml",
"specs": [
[
">=",
"0.10.0"
]
]
},
{
"name": "PyYAML",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "cryptography",
"specs": [
[
">=",
"41.0.0"
]
]
},
{
"name": "PyJWT",
"specs": [
[
">=",
"2.8.0"
]
]
},
{
"name": "passlib",
"specs": [
[
">=",
"1.7.0"
]
]
},
{
"name": "bcrypt",
"specs": [
[
">=",
"4.1.0"
]
]
},
{
"name": "sqlparse",
"specs": [
[
">=",
"0.4.4"
]
]
},
{
"name": "python-jose",
"specs": [
[
">=",
"3.3.0"
]
]
},
{
"name": "python-multipart",
"specs": [
[
">=",
"0.0.6"
]
]
},
{
"name": "prometheus-client",
"specs": [
[
">=",
"0.19.0"
]
]
},
{
"name": "structlog",
"specs": [
[
">=",
"23.2.0"
]
]
},
{
"name": "rich",
"specs": [
[
">=",
"13.7.0"
]
]
},
{
"name": "httpx",
"specs": [
[
">=",
"0.26.0"
]
]
},
{
"name": "websockets",
"specs": [
[
">=",
"12.0"
]
]
},
{
"name": "uvicorn",
"specs": [
[
">=",
"0.25.0"
]
]
},
{
"name": "fastapi",
"specs": [
[
">=",
"0.108.0"
]
]
},
{
"name": "starlette",
"specs": [
[
">=",
"0.27.0"
]
]
},
{
"name": "click",
"specs": [
[
">=",
"8.1.0"
]
]
},
{
"name": "typer",
"specs": [
[
">=",
"0.9.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.31.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
">=",
"4.66.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"8.4.0"
]
]
},
{
"name": "pytest-asyncio",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"6.1.1"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.4.0"
]
]
},
{
"name": "pytest-asyncio",
"specs": [
[
">=",
"0.23.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.1.0"
]
]
},
{
"name": "pytest-mock",
"specs": [
[
">=",
"3.12.0"
]
]
},
{
"name": "pytest-xdist",
"specs": [
[
">=",
"3.5.0"
]
]
},
{
"name": "ruff",
"specs": [
[
">=",
"0.1.0"
]
]
},
{
"name": "black",
"specs": [
[
">=",
"23.12.0"
]
]
},
{
"name": "isort",
"specs": [
[
">=",
"5.13.0"
]
]
},
{
"name": "flake8",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"1.8.0"
]
]
},
{
"name": "bandit",
"specs": [
[
">=",
"1.7.0"
]
]
},
{
"name": "safety",
"specs": [
[
">=",
"2.3.0"
]
]
},
{
"name": "sphinx",
"specs": [
[
">=",
"7.2.0"
]
]
},
{
"name": "sphinx-rtd-theme",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "myst-parser",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "pre-commit",
"specs": [
[
">=",
"3.6.0"
]
]
},
{
"name": "tox",
"specs": [
[
">=",
"4.11.0"
]
]
}
],
"lcname": "doris-mcp-server"
}