mcp-ds-toolkit-server

Name	mcp-ds-toolkit-server
Version	0.1.3
home_page	None
Summary	Pure standalone MCP Data Science Server - Complete DS workflows with local SQLite tracking, no external services required
upload_time	2025-09-14 14:09:42
maintainer	None
docs_url	None
author	None
requires_python	>=3.12
license	MIT
keywords	data-science local-tracking machine-learning mcp scikit-learn sqlite standalone

# 🚀 MCP DS Toolkit Server

A **standalone** Model Context Protocol (MCP) server that brings complete DS capabilities to AI assistants like Claude Desktop and Cursor. Build, train, and track ML models through natural language - no external services required!

[![MCP Protocol](https://img.shields.io/badge/MCP-Protocol-green?style=for-the-badge)](https://modelcontextprotocol.io)
[![Claude Desktop](https://img.shields.io/badge/Claude-Desktop-blue?style=for-the-badge)](#-claude-desktop)
[![Cursor IDE](https://img.shields.io/badge/Cursor-IDE-purple?style=for-the-badge)](#-cursor-ide)

## 🎯 **What is MCP DS Toolkit Server?**

MCP DS Toolkit Server enables AI assistants to perform complete data science workflows through natural language. Simply talk to Claude or Cursor about what you want to do, and it handles the entire DS pipeline - from data loading to model training and evaluation.

### **Key Capabilities**

- ✅ **31 DS Tools** - Complete DS workflow from data loading to model evaluation
- ✅ **Natural Language Interface** - Just describe what you want in plain English
- ✅ **Zero Configuration** - No databases or services to set up; add one entry to your assistant's MCP config and it works
- ✅ **Local SQLite Tracking** - No external databases or cloud services needed
- ✅ **Cross-Platform** - Works on macOS, Linux, and Windows
- ✅ **AI Assistant Integration** - Seamless with Claude Desktop and Cursor IDE

## 🌟 Why MCP DS Toolkit?

### 🎯 **Transform Your AI Assistant into a Data Scientist**
- **Natural Language DS**: Just describe what you want - "Load the iris dataset and train a random forest classifier"
- **Complete Automation**: Your AI assistant handles data preprocessing, model training, and evaluation
- **Intelligent Recommendations**: Get suggestions for algorithms, hyperparameters, and preprocessing steps
- **Comprehensive Metrics**: Detailed performance metrics, learning curves, and model comparisons

### 🔬 **Enterprise-Ready Features**
- **Production-Quality Code**: Generated code follows best practices and is deployment-ready
- **Comprehensive Tracking**: Every experiment, model, and metric is automatically tracked
- **Reproducible Workflows**: All operations are logged and can be reproduced
- **Local-First Architecture**: Your data never leaves your machine

### 📊 **Complete Tool Suite**
- **Data Management**: Loading, validation, profiling, cleaning, preprocessing
- **Model Training**: 20+ algorithms from scikit-learn with automatic hyperparameter tuning
- **Experiment Tracking**: SQLite-based tracking with full experiment lineage
- **Performance Analysis**: Learning curves, feature importance, and model comparisons

> **Note**: Cloud storage capabilities (AWS S3, Google Cloud, Azure) are available as optional dependencies but not yet fully implemented. Current version focuses on local storage and processing.

## ⚡ Quick Start

Choose your preferred AI assistant:

### 🤖 Claude Desktop

#### 1. Install the Server (30 seconds)

```bash
# Using uvx (recommended): fetches the package and launches the server in one step
uvx mcp-ds-toolkit-server

# Or using pip
pip install mcp-ds-toolkit-server
```

#### 2. Configure Claude Desktop

Add to your Claude Desktop configuration file:

- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "ds-toolkit": {
      "command": "uvx",
      "args": ["mcp-ds-toolkit-server"]
    }
  }
}
```

> **Note**: If you get `uvx ENOENT` errors, use the full path to `uvx` instead. Find it with `which uvx` (macOS/Linux) or `where uvx` (Windows Command Prompt) and replace `"command": "uvx"` with `"command": "/full/path/to/uvx"`. See the [troubleshooting section](#uvx-command-not-found-enoent-error) for details.
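
The configuration step above can also be scripted. The following is a minimal sketch, not part of the toolkit: the helper names `add_mcp_server` and `update_config_file` are hypothetical, and it assumes the config file is either absent or already contains valid JSON. It merges the `ds-toolkit` entry into the file without touching other servers or settings.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path) -> None:
    """Read the config at `path` (if present), add the ds-toolkit entry, write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_mcp_server(existing, "ds-toolkit", "uvx", ["mcp-ds-toolkit-server"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

On macOS, for example, this could be called as `update_config_file(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`. Back up the file first if it holds other settings you care about.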

#### 3. Restart Claude Desktop and Test

```
You: Load the iris dataset and train a random forest classifier
Claude: I'll help you load the iris dataset and train a random forest classifier...
```

### 📝 Cursor IDE

#### 1. Install the Server

```bash
# Using uvx (recommended): fetches the package and launches the server in one step
uvx mcp-ds-toolkit-server

# Or using pip
pip install mcp-ds-toolkit-server
```

#### 2. Configure Cursor

Create or edit the MCP configuration file:

- **Project-specific**: `.cursor/mcp.json` (in your project root)
- **Global**: `~/.cursor/mcp.json` (in your home directory)

```json
{
  "mcpServers": {
    "ds-toolkit": {
      "command": "uvx",
      "args": ["mcp-ds-toolkit-server"]
    }
  }
}
```

> **Note**: If you get `uvx ENOENT` errors, use the full path to `uvx` instead. Find it with `which uvx` (macOS/Linux) or `where uvx` (Windows Command Prompt) and replace `"command": "uvx"` with `"command": "/full/path/to/uvx"`. See the [troubleshooting section](#uvx-command-not-found-enoent-error) for details.

#### 3. Restart Cursor and Test

Open Cursor's AI chat and try:
```
You: Profile my CSV dataset and show me the correlations
Cursor: I'll analyze your CSV dataset and generate a comprehensive profile...
```

### 🐳 Alternative Installation Methods

#### Development Installation (for contributors)
```bash
git clone https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server
cd MCP-DS-Toolkit-Server
uv sync
```

Then point your MCP configuration at the local checkout:

```json
{
  "mcpServers": {
    "ds-toolkit": {
      "command": "uv",
      "args": ["--directory", "/path/to/mcp-ds-toolkit-server", "run", "mcp-ds-toolkit-server"]
    }
  }
}
```

## 🛠️ **Complete Tool Reference**

### 📊 **Data Management Tools (15 tools)**

| Tool | Description | Example Usage |
|------|-------------|---------------|
| `load_dataset` | Load data from CSV, JSON, Excel, sklearn datasets | "Load the iris dataset" |
| `validate_dataset` | Check data quality and integrity | "Validate my dataset for missing values" |
| `profile_dataset` | Generate comprehensive statistics | "Profile the dataset and show correlations" |
| `preprocess_dataset` | Apply scaling, encoding, feature selection | "Preprocess data with standard scaling" |
| `clean_dataset` | Handle missing values and outliers | "Clean the dataset and remove outliers" |
| `split_dataset` | Create train/test/validation splits | "Split data 80/20 for training" |
| `list_datasets` | Show all loaded datasets | "What datasets are available?" |
| `get_dataset_info` | Get detailed dataset information | "Show info about the sales dataset" |
| `compare_datasets` | Compare multiple datasets | "Compare train and test distributions" |
| `batch_process_datasets` | Process multiple datasets at once | "Apply same preprocessing to all datasets" |
| `sample_dataset` | Create dataset samples | "Sample 1000 rows from the dataset" |
| `export_dataset` | Export to various formats | "Export cleaned data to CSV" |
| `generate_learning_curve` | Analyze model learning behavior | "Generate learning curves for the model" |
| `remove_dataset` | Remove dataset from memory | "Remove the temporary dataset" |
| `clear_all_data` | Clear all loaded data | "Clear all datasets from memory" |

### 🤖 **Model Training Tools (6 tools)**

| Tool | Description | Example Usage |
|------|-------------|---------------|
| `train_model` | Train ML models with 20+ algorithms | "Train a random forest classifier" |
| `evaluate_model` | Evaluate model performance | "Evaluate the model on test data" |
| `compare_models` | Compare multiple models | "Compare RF, SVM, and Gradient Boosting" |
| `tune_hyperparameters` | Optimize model parameters | "Tune hyperparameters using grid search" |
| `get_model_info` | Get model details and parameters | "Show model configuration" |
| `list_algorithms` | List available algorithms | "What algorithms can I use?" |

### 📈 **Experiment Tracking Tools (10 tools)**

| Tool | Description | Example Usage |
|------|-------------|---------------|
| `create_experiment` | Create new experiment | "Create experiment 'customer_churn_v1'" |
| `start_run` | Start tracking run | "Start a new training run" |
| `log_params` | Log hyperparameters | "Log the model parameters" |
| `log_metrics` | Log performance metrics | "Log accuracy and F1 score" |
| `log_artifact` | Save artifacts (plots, models) | "Save the confusion matrix plot" |
| `end_run` | Complete current run | "End the current run" |
| `list_experiments` | Show all experiments | "List all my experiments" |
| `get_experiment` | Get experiment details | "Show details of the latest experiment" |
| `list_runs` | List experiment runs | "Show all runs for this experiment" |
| `compare_runs` | Compare run metrics | "Compare the last 3 runs" |


## 💬 Example Prompts

### 🎯 **Quick Start Examples**

```yaml
Basic Operations:
  - "Load the iris dataset from sklearn"
  - "Show me what datasets are currently loaded"
  - "Profile my dataset and show key statistics"
  - "Train a random forest classifier on the iris data"
  - "Evaluate my model and show the confusion matrix"

Data Processing:
  - "Load data.csv and check for missing values"
  - "Clean the dataset by removing outliers using IQR method"
  - "Preprocess the data with standard scaling and one-hot encoding"
  - "Split my data into 70% train, 15% validation, 15% test"
  - "Handle class imbalance in my dataset"

Model Training:
  - "Train multiple models and compare their performance"
  - "Perform 5-fold cross-validation on my model"
  - "Tune hyperparameters for the random forest using grid search"
  - "Show me the feature importance for the trained model"
  - "Generate learning curves to check for overfitting"

Experiment Tracking:
  - "Create an experiment called 'customer_churn_prediction'"
  - "Start tracking this training run"
  - "Log the model parameters and performance metrics"
  - "Compare the last 5 training runs"
  - "Show me all experiments from the past week"
```
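
For readers unfamiliar with the "IQR method" mentioned in the data-processing prompts, here is a minimal standard-library sketch of the usual rule: keep values within 1.5 × IQR of the first and third quartiles. The function names are hypothetical, and the quartile convention (`method="inclusive"`) and the multiplier are assumptions; the toolkit's `clean_dataset` tool may use different settings.

```python
import statistics


def iqr_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Return the (low, high) fences of the IQR rule for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Keep only the values that fall inside the IQR fences, preserving order."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]
```

With the sample `[10, 12, 11, 13, 12, 11, 95]`, the value 95 lies above the upper fence and is dropped, while the other six values are kept.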

### 🔬 **Complete Workflow Example**

Here's an example conversation showing an end-to-end ML pipeline (bracketed lines summarize what the assistant does at each step):

```
You: Load the wine dataset from sklearn and give me a comprehensive profile

Claude: I'll load the wine dataset and provide you with a comprehensive profile...
[Loads dataset, generates statistics, correlations, and data quality report]

You: Clean the data and split it for training with 80/20 ratio

Claude: I'll clean the dataset and create train/test splits...
[Handles missing values, removes outliers, splits data]

You: Train and compare Random Forest, SVM, and Gradient Boosting classifiers

Claude: I'll train these three models and compare their performance...
[Trains models, evaluates each, provides comparison table]

You: The Random Forest looks best. Can you tune its hyperparameters?

Claude: I'll optimize the Random Forest hyperparameters using grid search...
[Performs grid search, shows best parameters and improved metrics]

You: Great! Create an experiment called "wine_classification" and log everything

Claude: I'll create the experiment and log all our work...
[Creates experiment, logs parameters, metrics, and artifacts]

You: Show me the learning curves and feature importance

Claude: I'll analyze the model's learning behavior and feature importance...
[Provides learning curve analysis and feature importance rankings]
```
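
For comparison, the modeling steps of that conversation correspond roughly to the following plain scikit-learn code. This is a sketch under stated assumptions, not the code the server runs: the function name `run_workflow`, the random seed, the scaling step for the SVM, and the small hyperparameter grid are all chosen here for illustration, and the cleaning step is skipped.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_workflow(seed: int = 42) -> dict:
    # Load the wine dataset and hold out 20% for testing (stratified by class).
    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Train the three classifiers and record test accuracy for each.
    models = {
        "random_forest": RandomForestClassifier(random_state=seed),
        "svm": make_pipeline(StandardScaler(), SVC()),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }

    # Tune the random forest with a small, illustrative grid and 5-fold CV.
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=5,
    )
    search.fit(X_train, y_train)

    return {
        "scores": scores,
        "best_params": search.best_params_,
        "tuned_accuracy": search.score(X_test, y_test),
        "feature_importances": search.best_estimator_.feature_importances_,
    }


if __name__ == "__main__":
    print(run_workflow())
```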

## 🚀 **Supported ML Algorithms**

### Classification Algorithms
- **Tree-Based**: RandomForest, GradientBoosting, ExtraTrees, DecisionTree
- **Linear Models**: LogisticRegression, RidgeClassifier, SGDClassifier
- **Support Vector**: SVC (linear, rbf, poly kernels)
- **Neighbors**: KNeighborsClassifier
- **Naive Bayes**: GaussianNB, MultinomialNB, BernoulliNB

### Regression Algorithms
- **Tree-Based**: RandomForestRegressor, GradientBoostingRegressor, ExtraTreesRegressor
- **Linear Models**: LinearRegression, Ridge, Lasso, ElasticNet
- **Support Vector**: SVR (linear, rbf, poly kernels)
- **Neighbors**: KNeighborsRegressor

## 🏗️ **Architecture**

### How It Works

```mermaid
graph LR
    A[AI Assistant<br/>Claude/Cursor] -->|Natural Language| B[MCP Protocol]
    B --> C[MCP DS Toolkit Server]
    C --> D[Data Tools]
    C --> E[Training Tools]
    C --> F[Tracking Tools]
    D --> G[Local Storage<br/>~/.mcp-ds-toolkit]
    E --> G
    F --> G
```
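
On the wire, the arrow from the assistant to the server carries JSON-RPC 2.0 messages; a tool invocation uses the MCP `tools/call` method. The sketch below builds such a request. The helper `make_tool_call` is hypothetical, and the argument name passed to `load_dataset` is an assumption, since the tool's real input schema is not documented here.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "dataset_name" is a placeholder argument, not the tool's confirmed schema.
    print(make_tool_call(1, "load_dataset", {"dataset_name": "iris"}))
```

In practice the AI assistant's MCP client constructs these messages for you; the sketch only shows what "natural language in, tool call out" reduces to.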

### Storage Structure

```
~/.mcp-ds-toolkit/
├── experiments.db          # SQLite experiment tracking
├── artifacts/              # Plots, reports, outputs
│   └── {experiment_id}/
│       └── {run_id}/
├── models/                 # Saved ML models
├── datasets/               # Cached datasets
└── cache/                  # Temporary files
```
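
The layout above maps naturally onto a few path helpers. The following sketch only mirrors the directory tree shown; the function names are hypothetical, and whether the server creates these entries eagerly or on first use is not stated in this document.

```python
from pathlib import Path

# Top-level entries from the storage tree above.
LAYOUT = ("experiments.db", "artifacts", "models", "datasets", "cache")


def default_root() -> Path:
    """Location of the toolkit's storage directory in the user's home."""
    return Path.home() / ".mcp-ds-toolkit"


def run_artifact_dir(root: Path, experiment_id: str, run_id: str) -> Path:
    """Directory that holds artifacts for one run: artifacts/{experiment_id}/{run_id}."""
    return root / "artifacts" / experiment_id / run_id


def missing_entries(root: Path) -> list[str]:
    """Names from LAYOUT that do not exist under `root` yet."""
    return [name for name in LAYOUT if not (root / name).exists()]
```

Calling `missing_entries(default_root())` is a quick way to see which parts of the storage tree exist on a given machine.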

### Technology Stack

- **Core**: Python 3.12+, MCP Protocol, SQLite
- **ML Framework**: scikit-learn, pandas, numpy
- **Data Processing**: pandas, numpy, scipy
- **No External Services**: Everything runs locally

## 🔧 Troubleshooting

### Common Issues and Solutions

#### Server Not Starting
```bash
# Check Python version (requires 3.12+)
python --version

# Reinstall with verbose output
pip install --verbose mcp-ds-toolkit-server

# Check if the command is available
which mcp-ds-toolkit-server
```
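
The same checks can be run from Python, which is convenient on Windows where `which` is not available. This is a small diagnostic sketch; the function name `check_environment` and the wording of the messages are made up for illustration.

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems = []
    if tuple(version_info[:2]) < (3, 12):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, but 3.12+ is required"
        )
    # Either the installed console script or uvx must be reachable on PATH.
    if which("mcp-ds-toolkit-server") is None and which("uvx") is None:
        problems.append("neither mcp-ds-toolkit-server nor uvx was found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["no problems detected"]:
        print(problem)
```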

#### uvx Command Not Found (ENOENT Error)
If you see errors like `spawn uvx ENOENT` in Claude Desktop logs, this means `uvx` is not in the system PATH that Claude Desktop can access.

**Solution**: Use the full path to `uvx` in your configuration:

1. **Find your uvx path**:
   ```bash
   which uvx
   # Example output: /Users/username/.pyenv/shims/uvx
   ```

2. **Update your configuration with the full path**:
   ```json
   {
     "mcpServers": {
       "ds-toolkit": {
         "command": "/Users/username/.pyenv/shims/uvx",
         "args": ["mcp-ds-toolkit-server"]
       }
     }
   }
   ```

**Why this happens**: Claude Desktop runs with a limited PATH environment that may not include directories where `uvx` is installed (like `~/.pyenv/shims` for pyenv users, `~/.local/bin`, or other Python tool directories).
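
If you prefer not to copy the path by hand, a short script can resolve it and print a ready-to-paste entry. This is a sketch with a hypothetical helper name; run it from a terminal where `uvx` already works, since it relies on that terminal's PATH.

```python
import json
import os
import shutil


def absolute_uvx_entry(which=shutil.which) -> dict:
    """Build the ds-toolkit server entry using the absolute path to uvx."""
    path = which("uvx")
    if path is None:
        raise FileNotFoundError("uvx was not found on PATH in this shell")
    return {"command": os.path.abspath(path), "args": ["mcp-ds-toolkit-server"]}


if __name__ == "__main__":
    print(json.dumps({"mcpServers": {"ds-toolkit": absolute_uvx_entry()}}, indent=2))
```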

#### Claude/Cursor Not Finding Tools
1. **Check configuration file location**:
   - Claude (macOS): `~/Library/Application Support/Claude/claude_desktop_config.json`; the Windows and Linux paths are listed under Quick Start
   - Cursor: `.cursor/mcp.json` or `~/.cursor/mcp.json`

2. **Verify JSON syntax**:
   ```json
   {
     "mcpServers": {
       "ds-toolkit": {
         "command": "uvx",
         "args": ["mcp-ds-toolkit-server"]
       }
     }
   }
   ```

3. **Restart the application** after configuration changes
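
The syntax check in step 2 can be automated. The sketch below parses a configuration file's text and reports the most common mistakes (a stray trailing comma, a missing `mcpServers` object, a non-string `command`). The function name and the exact rules are assumptions for illustration; the assistants themselves may accept or reject other shapes.

```python
import json


def validate_mcp_config(text: str) -> list[str]:
    """Return a list of problems found in an MCP config; empty means it looks valid."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ['missing "mcpServers" object']
    errors = []
    for name, entry in servers.items():
        if not isinstance(entry, dict) or not isinstance(entry.get("command"), str):
            errors.append(f'{name}: "command" must be a string')
        elif not isinstance(entry.get("args", []), list):
            errors.append(f'{name}: "args" must be a list')
    return errors
```

For example, `validate_mcp_config(open(".cursor/mcp.json", encoding="utf-8").read())` returns an empty list when the file parses and has the expected shape.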

#### Permission Errors
```bash
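# Give your user read/write access to local storage
# (X adds execute only on directories and already-executable files)
chmod -R u+rwX ~/.mcp-ds-toolkit

# If a system-wide pip install is denied, install for the current user instead
pip install --user mcp-ds-toolkit-server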
```

#### Memory Issues with Large Datasets
- Use `sample_dataset` to work with smaller subsets
- Clear unused datasets with `remove_dataset` or `clear_all_data`
- If memory is still tight, check for OS-level or container limits on the process; Python itself sets no memory cap by default
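
When a CSV file is too large to load at all, one option is to sample it before handing it to the toolkit. The sketch below uses reservoir sampling (Algorithm R) to draw a uniform sample of rows in a single pass with constant memory. It is a standalone illustration, not how `sample_dataset` is implemented, and it assumes the file has a header row.

```python
import csv
import random


def sample_csv_rows(path: str, k: int, seed: int = 0) -> tuple[list[str], list[list[str]]]:
    """Return (header, rows) where rows is a uniform random sample of up to k data rows."""
    rng = random.Random(seed)
    sample: list[list[str]] = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # assumes the first line is a header
        for index, row in enumerate(reader):
            if index < k:
                # Fill the reservoir with the first k rows.
                sample.append(row)
            else:
                # Replace a reservoir slot with probability k / (index + 1).
                slot = rng.randint(0, index)
                if slot < k:
                    sample[slot] = row
    return header, sample
```

Write the sampled rows to a smaller CSV with `csv.writer` and load that file instead of the original.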

### Getting Help

- **Documentation**: See our [detailed guides](https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server/wiki)
- **Issues**: Report bugs on [GitHub Issues](https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server/issues)
- **Discussions**: Join our [community forum](https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server/discussions)

## 🤝 Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Development Setup
```bash
git clone https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server
cd MCP-DS-Toolkit-Server
uv sync
uv run pytest
```

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

Built on these excellent projects:
- [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic
- [scikit-learn](https://scikit-learn.org/) for ML algorithms
- [SQLite](https://sqlite.org/) for local tracking

---

<div align="center">

**Transform your AI assistant into a complete Data Science toolkit!**

[![Star on GitHub](https://img.shields.io/github/stars/Yasserelhaddar/MCP-DS-Toolkit-Server?style=for-the-badge)](https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server)
[![Install Now](https://img.shields.io/badge/Install-Now-success?style=for-the-badge)](#-quick-start)
[![View Examples](https://img.shields.io/badge/View-Examples-blue?style=for-the-badge)](#-example-prompts)

</div>

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "mcp-ds-toolkit-server",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "data-science, local-tracking, machine-learning, mcp, scikit-learn, sqlite, standalone",
    "author": null,
    "author_email": "Yasser El Haddar <yasserelhaddar@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/19/aa/e7cdd864e1f7720de70beec8f4b40a330c8b51d4fc6daf14c1a0a19a0c3c/mcp_ds_toolkit_server-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Pure standalone MCP Data Science Server - Complete DS workflows with local SQLite tracking, no external services required",
    "version": "0.1.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server/issues",
        "Homepage": "https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server",
        "Repository": "https://github.com/Yasserelhaddar/MCP-DS-Toolkit-Server"
    },
    "split_keywords": [
        "data-science",
        " local-tracking",
        " machine-learning",
        " mcp",
        " scikit-learn",
        " sqlite",
        " standalone"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0d544099366ee1e7fed180506c0d349b47e9aed6941f63ae777218cdb343d5bc",
                "md5": "f9f50b5e98002ef06d3a120e740985d2",
                "sha256": "fa43e3bd842868c7122ec865546ae358bc0e468632a217a988e69a0a1714620a"
            },
            "downloads": -1,
            "filename": "mcp_ds_toolkit_server-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f9f50b5e98002ef06d3a120e740985d2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 158398,
            "upload_time": "2025-09-14T14:09:41",
            "upload_time_iso_8601": "2025-09-14T14:09:41.337591Z",
            "url": "https://files.pythonhosted.org/packages/0d/54/4099366ee1e7fed180506c0d349b47e9aed6941f63ae777218cdb343d5bc/mcp_ds_toolkit_server-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "19aae7cdd864e1f7720de70beec8f4b40a330c8b51d4fc6daf14c1a0a19a0c3c",
                "md5": "e2d86ef92d78ba4560ede5e587bbb0d6",
                "sha256": "5ba3fab1dde7558a3bc1308da6e2c357b4c95badd8acd126e0349b100ed18d32"
            },
            "downloads": -1,
            "filename": "mcp_ds_toolkit_server-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "e2d86ef92d78ba4560ede5e587bbb0d6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 222475,
            "upload_time": "2025-09-14T14:09:42",
            "upload_time_iso_8601": "2025-09-14T14:09:42.897354Z",
            "url": "https://files.pythonhosted.org/packages/19/aa/e7cdd864e1f7720de70beec8f4b40a330c8b51d4fc6daf14c1a0a19a0c3c/mcp_ds_toolkit_server-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-14 14:09:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Yasserelhaddar",
    "github_project": "MCP-DS-Toolkit-Server",
    "github_not_found": true,
    "lcname": "mcp-ds-toolkit-server"
}
