source-coop-mcp


- **Name**: source-coop-mcp
- **Version**: 0.2.1
- **Summary**: MCP server for Source Cooperative auto-discovery and data exploration
- **Uploaded**: 2025-10-22 23:19:58
- **Requires Python**: >=3.11
- **License**: MIT
- **Keywords**: geospatial, mcp, model-context-protocol, object-storage, s3, source-cooperative

# Source Cooperative MCP Server

[![Tests](https://github.com/yharby/source-coop-mcp/actions/workflows/test-and-report.yml/badge.svg)](https://github.com/yharby/source-coop-mcp/actions)
[![PyPI version](https://badge.fury.io/py/source-coop-mcp.svg)](https://pypi.org/project/source-coop-mcp/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Discover and access 800TB+ of geospatial data through AI agents.**

An MCP (Model Context Protocol) server for [Source Cooperative](https://source.coop), a collaborative repository with datasets from Maxar, Harvard, ESA, USGS, and 90+ other organizations.

---

## 🏗️ Architecture Overview

```mermaid
graph TB
    subgraph "AI Clients"
        A1[Claude Desktop]
        A2[Claude Code]
        A3[Cursor]
        A4[Cline]
        A5[Zed]
        A6[Continue.dev]
    end

    subgraph "MCP Server"
        MCP[Source Cooperative MCP<br/>FastMCP + obstore]
    end

    subgraph "6 Available Tools"
        T1[list_accounts<br/>94+ orgs]
        T2[list_products<br/>hybrid S3+API]
        T3[get_product_details<br/>+ README]
        T4[list_product_files<br/>tree mode]
        T5[get_file_metadata<br/>no download]
        T6[search<br/>hybrid fuzzy]
    end

    subgraph "Data Sources"
        S1[HTTP API<br/>source.coop/api]
        S2[S3 Direct<br/>opendata.source.coop]
    end

    A1 -->|JSON-RPC| MCP
    A2 -->|JSON-RPC| MCP
    A3 -->|JSON-RPC| MCP
    A4 -->|JSON-RPC| MCP
    A5 -->|JSON-RPC| MCP
    A6 -->|JSON-RPC| MCP

    MCP --> T1
    MCP --> T2
    MCP --> T3
    MCP --> T4
    MCP --> T5
    MCP --> T6

    T1 --> S2
    T2 --> S1
    T2 --> S2
    T3 --> S1
    T3 --> S2
    T4 --> S2
    T5 --> S2
    T6 --> S1
    T6 --> S2

    style MCP fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
    style S1 fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style S2 fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
```

**Key Features:**
- ✅ **Token Optimized** - 72% reduction for large datasets
- ✅ **Smart Partitions** - Auto-detects Hive-style patterns
- ✅ **Fuzzy Search** - Handles typos and partial matches
- ✅ **No Auth** - All 800TB+ is public

---

## 🚀 Quick Start

### Install

```bash
uvx source-coop-mcp
```

### Configure Your AI Client

#### **Claude Desktop / Claude Code / Cursor / Cline**

Add to config file:
- **Claude Desktop**: `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS)
- **Claude Code**: VS Code `settings.json`
- **Cursor**: Cursor settings
- **Cline**: Cline MCP settings

```json
{
  "mcpServers": {
    "source-coop": {
      "command": "uvx",
      "args": ["source-coop-mcp"]
    }
  }
}
```
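
If you would rather script this step, the entry above can be merged into an existing config file. The following is a minimal sketch, assuming the file is plain JSON with a top-level `mcpServers` object; the helper name `add_server` is hypothetical and not part of this package.

```python
import json
from pathlib import Path


def add_server(config_path: Path, name: str = "source-coop") -> dict:
    """Merge the uvx launch entry into an MCP client config file (hypothetical helper)."""
    # Start from the existing config if there is one, so other servers are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "uvx", "args": ["source-coop-mcp"]}
    # Create the parent directory on first use, then write the merged config back.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On macOS, for example, this could be called as `add_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`. Back up the file first if it contains settings you care about.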

#### **Zed**

Add to Zed settings:

```json
{
  "context_servers": {
    "source-coop": {
      "command": "uvx",
      "args": ["source-coop-mcp"]
    }
  }
}
```

#### **Continue.dev**

Add to Continue config (`~/.continue/config.json`):

```json
{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "uvx",
          "args": ["source-coop-mcp"]
        }
      }
    ]
  }
}
```

**Restart your AI client and start exploring!**

---

## 🛠️ Available Tools

| Tool | Purpose | Performance |
|------|---------|-------------|
| `list_accounts()` | Find all 94+ organizations | ~850ms |
| `list_products()` | **Hybrid:** S3 mode (default) for ALL datasets + file counts | ~240ms |
| `list_products(include_unpublished=False)` | API mode for published datasets with rich metadata | ~500ms |
| `get_product_details()` | Get metadata + README automatically | ~650ms |
| `list_product_files()` | List files with S3/HTTP paths | ~240ms |
| `list_product_files(show_tree=True)` | Tree view (72% token savings) | ~980ms |
| `get_file_metadata()` | Get file info without downloading | ~230ms |
| `search(query)` | **Hybrid:** Search accounts + products (published + unpublished), top 5 results | ~5-10s |
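
MCP clients invoke each of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such requests with the standard library only. The helper `tool_call` is hypothetical; the parameter names `include_unpublished` and `query` come from the table above, and any other arguments a tool may require (for example an account identifier) are not documented here, so they are left out. A real client library also handles the initialization handshake and the stdio transport.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def tool_call(name: str, **arguments) -> str:
    """Serialize an MCP `tools/call` request (hypothetical helper, not part of this package)."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Published products only, using the API mode described in the table.
print(tool_call("list_products", include_unpublished=False))
# Fuzzy search across accounts and products.
print(tool_call("search", query="climate"))
```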

---

## 💡 What You Can Do

### Discover Data

```
"List all organizations in Source Cooperative"
→ Returns 94+ organizations: maxar, planet, harvard, etc.

"Find all datasets for harvard-lil"
→ Discovers published + unpublished products

"Search for climate datasets"
→ Smart fuzzy search handles typos and partial matches
```

### Access Files

```
"List files in harvard-lil/gov-data"
→ Returns S3 paths and HTTP URLs ready for analysis

"Show me the file tree with partition detection"
→ Smart visualization: year={2020,2021,...+5 more}/ [partitioned]

"Get file metadata without downloading"
→ Size, last modified, ETag
```
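
The partition summary shown above can be approximated by grouping sibling directory names of the form `key=value`. This is a minimal sketch, assuming a simple rule of "show the first two values, then count the rest"; the function name `summarize_partitions` and the `max_shown` parameter are hypothetical, and the server's actual detection logic may differ.

```python
import re
from collections import defaultdict

# Hive-style partition directories look like "year=2020" or "region=us-west".
_PARTITION = re.compile(r"^([^=/]+)=([^=/]+)$")


def summarize_partitions(dir_names: list[str], max_shown: int = 2) -> list[str]:
    """Collapse sibling `key=value` directories into one summary line per key."""
    values: dict[str, list[str]] = defaultdict(list)
    others: list[str] = []
    for name in dir_names:
        match = _PARTITION.match(name.rstrip("/"))
        if match:
            values[match.group(1)].append(match.group(2))
        else:
            others.append(name)  # not a partition directory, keep as-is

    lines = []
    for key, vals in values.items():
        vals = sorted(vals)
        shown = vals[:max_shown]
        hidden = len(vals) - len(shown)
        if hidden > 0:
            shown.append(f"...+{hidden} more")
        lines.append(f"{key}={{{','.join(shown)}}}/ [partitioned]")
    return lines + others


dirs = [f"year={y}/" for y in range(2020, 2027)] + ["README.md"]
print(summarize_partitions(dirs))
# ['year={2020,2021,...+5 more}/ [partitioned]', 'README.md']
```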

### Smart Search

```
"Search for climte" (typo)
→ Finds "climate" datasets (fuzzy matching)

"Search for geo" (partial)
→ Finds "geospatial", "geocoding", etc.
```
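
Both behaviours can be illustrated with Python's standard `difflib` module. This is a sketch of the general technique, not the server's actual matcher; `fuzzy_find` and its `cutoff` default are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_find(query: str, names: list[str], cutoff: float = 0.6) -> list[str]:
    """Return names matching `query` as a substring or by similarity, best first."""
    q = query.lower()
    scored = []
    for name in names:
        n = name.lower()
        # Partial match: the query appears inside the name. Otherwise fall back
        # to a similarity ratio, which tolerates typos such as a dropped letter.
        score = 1.0 if q in n else SequenceMatcher(None, q, n).ratio()
        if score >= cutoff:
            scored.append((score, name))
    # Highest score first; ties keep their input order because the sort is stable.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored]


names = ["climate", "geospatial", "geocoding", "buildings"]
print(fuzzy_find("climte", names))  # ['climate']
print(fuzzy_find("geo", names))     # ['geospatial', 'geocoding']
```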

---

## ⚡ Features

| Feature | Description |
|---------|-------------|
| **Complete Discovery** | Finds unpublished products the official API doesn't show |
| **No Authentication** | All 800TB+ data is public |
| **Fast Performance** | Rust-backed S3 client (9x faster than boto3) |
| **Token Optimized** | Tree mode: 72% token reduction for large datasets |
| **Smart Partitions** | Auto-detects patterns: `year={2020,2021,...}` |
| **Fuzzy Search** | Handles typos and partial matches |
| **README Integration** | Documentation automatically included |
| **800TB+ Data** | 94+ organizations, geospatial datasets |

---

## 📋 Example Workflow

```
1. "List all organizations"
   → Get 94+ account names

2. "Show me all datasets from maxar"
   → Discover published + unpublished products

3. "Search for climate data"
   → Smart fuzzy search finds relevant datasets

4. "Get details for harvard-lil/gov-data"
   → Full metadata + README content

5. "List files in this dataset with tree view"
   → Token-optimized tree with partition detection
```

---

## 🎯 Why This Server?

### Problem
Source Cooperative has 800TB+ of valuable data, but:
- Official API only shows **published** products
- No auto-discovery of organizations
- Requires knowing what you're looking for

### Solution
This MCP server provides:
- ✅ Complete auto-discovery (published + unpublished)
- ✅ Smart search with fuzzy matching
- ✅ Direct S3 access for all files
- ✅ Token-optimized outputs (72% reduction)
- ✅ Smart partition detection (10-88% additional savings)
- ✅ README documentation included automatically
- ✅ No authentication required

---

## 📊 Performance

All operations except `search` complete in **under 1 second**:

```
list_accounts():                          ~850ms  (94+ organizations)
list_products():                          ~240ms  (S3 mode - ALL datasets + file counts)
list_products(include_unpublished=False): ~500ms  (API mode - published with metadata)
list_product_files():                     ~240ms  (simple list)
list_product_files(show_tree=True):       ~980ms  (72% token savings)
get_file_metadata():                      ~230ms  (HEAD only)
search(query):                            ~5-10s  (hybrid search - 1 recursive S3 scan, top 5 enriched)
```

### Token Optimization Impact

| Dataset Size | Without Tree | With Tree | Saved |
|--------------|--------------|-----------|-------|
| 10 files | 1,500 tokens | 415 tokens | 72.3% |
| 100 files | 15,000 tokens | 4,150 tokens | 72.3% |
| 1,000 files | 150,000 tokens | 41,500 tokens | 72.3% |

With partition detection (1,000 partitions): **88% total savings!**
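
The percentages in the table follow directly from the token counts. A quick check, using only the figures given above (the function name `saved_percent` is just for illustration):

```python
def saved_percent(without_tree: int, with_tree: int) -> float:
    """Share of tokens saved by tree mode, as a percentage rounded to one decimal."""
    return round(100 * (1 - with_tree / without_tree), 1)


for without, with_ in [(1_500, 415), (15_000, 4_150), (150_000, 41_500)]:
    print(without, with_, saved_percent(without, with_))  # each row gives 72.3
```

The saving is constant across rows because both columns scale linearly with file count: 150 tokens per file without the tree versus 41.5 with it.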

---

## 🔧 Requirements

- **Python**: 3.11 or higher
- **Package Manager**: `uv` (provides the `uvx` command, which fetches and runs the server on demand)
- **Operating Systems**: macOS, Linux, Windows

---

## 🤝 Development

See [DEVELOPMENT.md](DEVELOPMENT.md) for:
- Architecture details
- Testing instructions
- Contributing guidelines
- Performance benchmarks
- Token optimization details

---

## 📝 Support

- **Issues**: [GitHub Issues](https://github.com/yharby/source-coop-mcp/issues)

---

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

            
