# MCP Server for Apache Spark History Server
[![CI](https://github.com/DeepDiagnostix-AI/mcp-apache-spark-history-server/actions/workflows/ci.yml/badge.svg)](https://github.com/DeepDiagnostix-AI/mcp-apache-spark-history-server/actions)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![MCP](https://img.shields.io/badge/MCP-Compatible-green.svg)](https://modelcontextprotocol.io/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
> **🤖 Connect AI agents to Apache Spark History Server for intelligent job analysis and performance monitoring**
Transform your Spark infrastructure monitoring with AI! This Model Context Protocol (MCP) server enables AI agents to analyze job performance, identify bottlenecks, and provide intelligent insights from your Spark History Server data.
## 🎯 What is This?
**Spark History Server MCP** bridges AI agents with your existing Apache Spark infrastructure, enabling:
- 🔍 **Query job details** through natural language
- 📊 **Analyze performance metrics** across applications
- 🔄 **Compare multiple jobs** to identify regressions
- 🚨 **Investigate failures** with detailed error analysis
- 📈 **Generate insights** from historical execution data
📺 **See it in action:**
[![Spark History Server MCP Demo](https://img.youtube.com/vi/e3P_2_RiUHw/maxresdefault.jpg)](https://www.youtube.com/watch?v=e3P_2_RiUHw)
## 🏗️ Architecture
```mermaid
graph TB
    A[🤖 AI Agent/LLM] --> F[📡 MCP Client]
    B[🦙 LlamaIndex Agent] --> F
    C[🌐 LangGraph] --> F
    D[🖥️ Claude Desktop] --> F
    E[🛠️ Amazon Q CLI] --> F

    F --> G[⚡ Spark History MCP Server]

    G --> H[🔥 Prod Spark History Server]
    G --> I[🔥 Staging Spark History Server]
    G --> J[🔥 Dev Spark History Server]

    H --> K[📄 Prod Event Logs]
    I --> L[📄 Staging Event Logs]
    J --> M[📄 Dev Event Logs]
```
**🔗 Components:**
- **🔥 Spark History Server**: Your existing infrastructure serving Spark event data
- **⚡ MCP Server**: This project - provides MCP tools for querying Spark data
- **🤖 AI Agents**: LangChain, custom agents, or any MCP-compatible client
## ⚡ Quick Start
### 📋 Prerequisites
- 🔥 Existing Spark History Server (running and accessible)
- 🐍 Python 3.12+
- ⚡ [uv](https://docs.astral.sh/uv/getting-started/installation/) package manager
### 🚀 Setup & Testing
```bash
git clone https://github.com/DeepDiagnostix-AI/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server
# Install Task (if not already installed)
brew install go-task # macOS, see https://taskfile.dev/installation/ for others
# Setup and start testing
task install # Install dependencies
task start-spark-bg # Start Spark History Server with sample data (default Spark 3.5.5)
# Or specify a different Spark version:
# task start-spark-bg spark_version=3.5.2
task start-mcp-bg # Start MCP Server
# Optional: Opens MCP Inspector on http://localhost:6274 for interactive testing
# Requires Node.js: 22.7.5+ (Check https://github.com/modelcontextprotocol/inspector for latest requirements)
task start-inspector-bg # Start MCP Inspector
# When done, run `task stop-all`
```
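Once the background tasks are running, you can sanity-check that the sample History Server is serving data before connecting any agents. This uses Spark's standard REST API on the default port from the steps above:
```bash
# List the bundled sample applications via the History Server REST API
curl http://localhost:18080/api/v1/applications
```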
### 📊 Sample Data
The repository includes real Spark event logs for testing:
- `spark-bcec39f6201b42b9925124595baad260` - ✅ Successful ETL job
- `spark-110be3a8424d4a2789cb88134418217b` - 🔄 Data processing job
- `spark-cc4d115f011443d787f03a71a476a745` - 📈 Multi-stage analytics job
See **[TESTING.md](TESTING.md)** for instructions on using them.
### ⚙️ Server Configuration
Edit `config.yaml` for your Spark History Server:
```yaml
servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth: # optional
      username: "user"
      password: "pass"
mcp:
  transports:
    - streamable-http # streamable-http or stdio
  port: "18888"
  debug: true
```
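For stdio-based clients such as Claude Desktop (see the integrations table below), the transport can be switched in the same file; a minimal variant of the `mcp` block above:
```yaml
mcp:
  transports:
    - stdio
```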
## 📸 Screenshots
### 🔍 Get Spark Application
![Tools](screenshots/get_spark.png)
### ⚡ Job Performance Comparison
![Schema](screenshots/compare-performance.png)
## 🛠️ Available Tools
> **Note**: These tools are subject to change as we scale and improve the performance of the MCP server.
The MCP server provides **17 specialized tools** organized by analysis patterns. LLMs can intelligently select and combine these tools based on user queries:
### 📊 Application Information
*Basic application metadata and overview*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `get_application` | 📊 Get detailed information about a specific Spark application including status, resource usage, duration, and attempt details |
### 🔗 Job Analysis
*Job-level performance analysis and identification*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `list_jobs` | 🔗 Get a list of all jobs for a Spark application with optional status filtering |
| `list_slowest_jobs` | ⏱️ Get the N slowest jobs for a Spark application (excludes running jobs by default) |
### ⚡ Stage Analysis
*Stage-level performance deep dive and task metrics*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `list_stages` | ⚡ Get a list of all stages for a Spark application with optional status filtering and summaries |
| `list_slowest_stages` | 🐌 Get the N slowest stages for a Spark application (excludes running stages by default) |
| `get_stage` | 🎯 Get information about a specific stage with optional attempt ID and summary metrics |
| `get_stage_task_summary` | 📊 Get statistical distributions of task metrics for a specific stage (execution times, memory usage, I/O metrics) |
### 🖥️ Executor & Resource Analysis
*Resource utilization, executor performance, and allocation tracking*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `list_executors` | 🖥️ Get executor information with optional inactive executor inclusion |
| `get_executor` | 🔍 Get information about a specific executor including resource allocation, task statistics, and performance metrics |
| `get_executor_summary` | 📈 Aggregates metrics across all executors (memory usage, disk usage, task counts, performance metrics) |
| `get_resource_usage_timeline` | 📅 Get chronological view of resource allocation and usage patterns including executor additions/removals |
### ⚙️ Configuration & Environment
*Spark configuration, environment variables, and runtime settings*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `get_environment` | ⚙️ Get comprehensive Spark runtime configuration including JVM info, Spark properties, system properties, and classpath |
### 🔎 SQL & Query Analysis
*SQL performance analysis and execution plan comparison*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `list_slowest_sql_queries` | 🐌 Get the top N slowest SQL queries for an application with detailed execution metrics |
| `compare_sql_execution_plans` | 🔍 Compare SQL execution plans between two Spark jobs, analyzing logical/physical plans and execution metrics |
### 🚨 Performance & Bottleneck Analysis
*Intelligent bottleneck identification and performance recommendations*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `get_job_bottlenecks` | 🚨 Identify performance bottlenecks by analyzing stages, tasks, and executors with actionable recommendations |
### 🔄 Comparative Analysis
*Cross-application comparison for regression detection and optimization*
| 🔧 Tool | 📝 Description |
|---------|----------------|
| `compare_job_environments` | ⚙️ Compare Spark environment configurations between two jobs to identify differences in properties and settings |
| `compare_job_performance` | 📈 Compare performance metrics between two Spark jobs including execution times, resource usage, and task distribution |
### 🤖 How LLMs Use These Tools
**Query Pattern Examples:**
- *"Why is my job slow?"* → `get_job_bottlenecks` + `list_slowest_stages` + `get_executor_summary`
- *"Compare today vs yesterday"* → `compare_job_performance` + `compare_job_environments`
- *"What's wrong with stage 5?"* → `get_stage` + `get_stage_task_summary`
- *"Show me resource usage over time"* → `get_resource_usage_timeline` + `get_executor_summary`
- *"Find my slowest SQL queries"* → `list_slowest_sql_queries` + `compare_sql_execution_plans`
## 📔 AWS Integration Guides
If you are an existing AWS user looking to analyze your Spark Applications, we provide detailed setup guides for:
- **[AWS Glue Users](examples/aws/glue/README.md)** - Connect to Glue Spark History Server
- **[Amazon EMR Users](examples/aws/emr/README.md)** - Use EMR Persistent UI for Spark analysis
These guides provide step-by-step instructions for setting up the Spark History Server MCP with your AWS services.
## 🚀 Kubernetes Deployment
Deploy using Kubernetes with Helm:
> ⚠️ **Work in Progress**: We are still testing and will soon publish the container image and Helm chart to a GitHub registry for easy deployment.
```bash
# 📦 Deploy with Helm
helm install spark-history-mcp ./deploy/kubernetes/helm/spark-history-mcp/

# 🎯 Production configuration
helm install spark-history-mcp ./deploy/kubernetes/helm/spark-history-mcp/ \
  --set replicaCount=3 \
  --set autoscaling.enabled=true \
  --set monitoring.enabled=true
```
📚 See [`deploy/kubernetes/helm/`](deploy/kubernetes/helm/) for complete deployment manifests and configuration options.
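If you prefer a values file over `--set` flags, the same production settings can be captured in YAML; a sketch using only the keys shown above (`values-production.yaml` is a hypothetical file name, and the chart's full set of values lives in the linked Helm directory):
```yaml
# values-production.yaml - mirrors the --set flags above
replicaCount: 3
autoscaling:
  enabled: true
monitoring:
  enabled: true
```
Then install with `helm install spark-history-mcp ./deploy/kubernetes/helm/spark-history-mcp/ -f values-production.yaml`.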
## 🌐 Multi-Spark History Server Setup
Set up multiple Spark History Servers in `config.yaml` and choose which server the LLM should interact with for each query.
```yaml
servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"
```
💁 User Query: "Can you get application <app_id> using production server?"
🤖 AI Tool Request:
```json
{
  "app_id": "<app_id>",
  "server": "production"
}
```
🤖 AI Tool Response:
```json
{
  "id": "<app_id>",
  "name": "app_name",
  "coresGranted": null,
  "maxCores": null,
  "coresPerExecutor": null,
  "memoryPerExecutorMB": null,
  "attempts": [
    {
      "attemptId": null,
      "startTime": "2023-09-06T04:44:37.006000Z",
      "endTime": "2023-09-06T04:45:40.431000Z",
      "lastUpdated": "2023-09-06T04:45:42Z",
      "duration": 63425,
      "sparkUser": "spark",
      "appSparkVersion": "3.3.0",
      "completed": true
    }
  ]
}
```
### 🔐 Environment Variables
```bash
SHS_SPARK_USERNAME=your_username
SHS_SPARK_PASSWORD=your_password
SHS_SPARK_TOKEN=your_jwt_token
SHS_MCP_PORT=18888
SHS_MCP_DEBUG=false
SHS_MCP_ADDRESS=0.0.0.0
```
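These variables supply credentials and server settings without editing `config.yaml` (whether they take precedence over the file is an assumption here). For example, to run the server on a different port with debug logging:
```bash
# Hypothetical override session: export before launching the server
export SHS_MCP_PORT=19000
export SHS_MCP_DEBUG=true
task start-mcp-bg
```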
## 🤖 AI Agent Integration
### Quick Start Options
| Integration | Transport | Best For |
|-------------|-----------|----------|
| **[Local Testing](TESTING.md)** | HTTP | Development, testing tools |
| **[Claude Desktop](examples/integrations/claude-desktop/)** | STDIO | Interactive analysis |
| **[Amazon Q CLI](examples/integrations/amazon-q-cli/)** | STDIO | Command-line automation |
| **[Kiro](examples/integrations/kiro/)** | HTTP | IDE integration, code-centric analysis |
| **[LangGraph](examples/integrations/langgraph/)** | HTTP | Multi-agent workflows |
| **[Strands Agents](examples/integrations/strands-agents/)** | HTTP | Multi-agent workflows |
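For the HTTP-based integrations above, a minimal programmatic client can exercise the tools directly. The sketch below uses the official `mcp` Python SDK's streamable HTTP client; the endpoint path `/mcp` and port 18888 are assumptions based on the configuration earlier in this README, and SDK import paths may shift between releases:
```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    # Connect to the MCP server started by `task start-mcp-bg`
    # (assumed endpoint: port 18888 from config.yaml, SDK-default /mcp path)
    async with streamablehttp_client("http://localhost:18888/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the available Spark analysis tools
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call one tool against a bundled sample event log
            result = await session.call_tool(
                "get_application",
                {"app_id": "spark-bcec39f6201b42b9925124595baad260"},
            )
            print(result.content)


asyncio.run(main())
```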
## 🎯 Example Use Cases
### 🔍 Performance Investigation
```
🤖 AI Query: "Why is my ETL job running slower than usual?"

📊 MCP Actions:
✅ Analyze application metrics
✅ Compare with historical performance
✅ Identify bottleneck stages
✅ Generate optimization recommendations
```
### 🚨 Failure Analysis
```
🤖 AI Query: "What caused job 42 to fail?"

🔍 MCP Actions:
✅ Examine failed tasks and error messages
✅ Review executor logs and resource usage
✅ Identify root cause and suggest fixes
```
### 📈 Comparative Analysis
```
🤖 AI Query: "Compare today's batch job with yesterday's run"

📊 MCP Actions:
✅ Compare execution times and resource usage
✅ Identify performance deltas
✅ Highlight configuration differences
```
## 🤝 Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for full contribution guidelines.
## 📄 License
Apache License 2.0 - see [LICENSE](LICENSE) file for details.
## 📝 Trademark Notice
*This project is built for use with Apache Sparkβ’ History Server. Not affiliated with or endorsed by the Apache Software Foundation.*
---
<div align="center">
**🔥 Connect your Spark infrastructure to AI agents**
[🚀 Get Started](#-quick-start) | [🛠️ View Tools](#%EF%B8%8F-available-tools) | [🧪 Test Now](TESTING.md) | [🤝 Contribute](#-contributing)
*Built by the community, for the community* 💙
</div>