mcp-ingest

* **Name:** mcp-ingest
* **Version:** 0.1.0
* **Summary:** SDK + CLI to describe, validate, and register MCP servers into MatrixHub
* **Upload time:** 2025-08-10 15:36:18
* **Author:** MatrixHub Contributors
* **Requires Python:** <3.13,>=3.11
* **License:** Apache-2.0
* **Keywords:** agents, ingest, matrixhub, mcp, model-context-protocol
* **Requirements:** No requirements were recorded.

# 🦾 MCP Ingest

*Discover, describe, and register AI agents with ease.*

<p align="center">
  <a href="https://pypi.org/project/mcp-ingest/"><img src="https://img.shields.io/pypi/v/mcp-ingest?color=blue" alt="PyPI"></a>
  <a href="https://pypi.org/project/mcp-ingest/"><img src="https://img.shields.io/pypi/pyversions/mcp-ingest.svg?logo=python" alt="Python Versions"></a>
  <a href="https://github.com/agent-matrix/mcp_ingest/actions/workflows/ci.yml">
    <img src="https://github.com/agent-matrix/mcp_ingest/actions/workflows/ci.yml/badge.svg?branch=master" alt="CI">
  </a>
  <a href="https://agent-matrix.github.io/matrix-hub/"><img src="https://img.shields.io/static/v1?label=docs&message=mkdocs&color=blue&logo=mkdocs" alt="Docs"></a>
  <a href="https://github.com/agent-matrix/mcp_ingest/blob/master/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue" alt="License: Apache-2.0"></a>
  <a href="https://github.com/ruslanmv/agent-generator"><img src="https://img.shields.io/badge/Powered%20by-agent--generator-brightgreen" alt="Powered by agent-generator"></a>
</p>


---

**`mcp-ingest`** is a tiny SDK + CLI that turns *any* MCP server/agent/tool into a **MatrixHub‑ready** artifact. It lets you:

* **Discover** servers from a folder, Git repo, or ZIP — even whole registries (harvester).
* **Describe** them offline → emit `manifest.json` + `index.json` (SSE normalized).
* **Validate** in a sandbox or container (handshake, `ListTools`, one `CallTool`).
* **Publish** to S3/GitHub Pages and **Register** to MatrixHub (`/catalog/install`).

Built for **Python 3.11** (the package metadata declares `requires_python` as `>=3.11,<3.13`), packaged for **PyPI**, with strict lint/type/test gates.

> You can catalog **millions** of MCP candidates offline and install on demand per tenant/workspace — the fastest path to building **the Matrix** of interoperable agents and tools.

---


## Install

```bash
pip install mcp-ingest
```

For contributors:

```bash
python3.11 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[dev,harvester]"
```

---

## Quickstart

### SDK (authors)

```python
from mcp_ingest import describe, autoinstall

# Generate manifest.json and index.json without running your server
paths = describe(
    name="watsonx-mcp",
    url="http://127.0.0.1:6288/sse",
    tools=["chat"],
    resources=[{"uri":"file://server.py","name":"source"}],
    description="Watsonx SSE server",
    version="0.1.0",
)
print(paths)  # {"manifest_path": "./manifest.json", "index_path": "./index.json"}

# Optionally register into MatrixHub (idempotent)
autoinstall(matrixhub_url="http://127.0.0.1:7300")
```

### CLI (operators)

```bash
# Detect → describe (offline)
mcp-ingest pack ./examples/watsonx --out dist/

# Register later (POST /catalog/install)
mcp-ingest register \
  --matrixhub http://127.0.0.1:7300 \
  --manifest dist/manifest.json
```

### Harvest a whole repo

Harvest a multi-server repository (e.g., the official MCP servers collection):

```bash
mcp-ingest harvest-repo \
  https://github.com/modelcontextprotocol/servers/archive/refs/heads/main.zip \
  --out dist/servers
```

This writes one `manifest.json` per detected server and a **repo-level `index.json`** that lists them all. Optionally, add `--publish s3://…` and/or `--register`.
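
To inspect what a harvest produced, a small script can walk the output directory and load every `manifest.json` it finds. This is only a sketch: the helper name `load_manifests` is hypothetical and not part of `mcp-ingest`, and because the exact directory nesting under `dist/servers` is not documented here, the search is recursive.

```python
import json
from pathlib import Path


def load_manifests(out_dir: str) -> dict[str, object]:
    """Map each manifest's path (relative to out_dir) to its parsed JSON."""
    root = Path(out_dir)
    if not root.is_dir():
        # Nothing harvested yet (or wrong path): return an empty result.
        return {}
    found: dict[str, object] = {}
    # Assumption: the layout below out_dir is unknown, so search recursively.
    for path in sorted(root.rglob("manifest.json")):
        found[str(path.relative_to(root))] = json.loads(
            path.read_text(encoding="utf-8")
        )
    return found


if __name__ == "__main__":
    for rel_path, manifest in load_manifests("dist/servers").items():
        # Print only top-level keys; the manifest schema is not assumed here.
        keys = sorted(manifest) if isinstance(manifest, dict) else []
        print(rel_path, keys)
```

The script makes no assumption about the manifest schema; it only reports which files exist and their top-level keys.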

---

## MatrixHub integration

* Preferred path: **`POST /catalog/install`** with the **inline manifest** (what `autoinstall()` and `mcp-ingest register` do).
* **Idempotent** by design: HTTP **409** is treated as success; safe to re-run.
* **SSE normalization**: we auto-fix URLs to end in `/sse` unless the manifest explicitly requests `/messages` or a different transport.

**Deferred install**: You can *describe* millions of candidates offline, then install only when a tenant wants them.
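
The SSE normalization rule can be illustrated with a minimal sketch. The function `normalize_sse_url` below is hypothetical and is not the library's actual implementation; it also approximates "the manifest explicitly requests `/messages`" by checking whether the URL path already ends in `/messages`.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_sse_url(url: str) -> str:
    """Ensure the URL path ends in /sse unless it already targets /sse or /messages."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")  # drop trailing slashes before comparing
    if path.endswith("/sse") or path.endswith("/messages"):
        return urlunsplit(parts._replace(path=path))
    return urlunsplit(parts._replace(path=path + "/sse"))


if __name__ == "__main__":
    print(normalize_sse_url("http://127.0.0.1:6288"))
    print(normalize_sse_url("http://127.0.0.1:6288/sse/"))
    print(normalize_sse_url("http://127.0.0.1:6288/messages"))
```

Query strings and fragments are carried through unchanged because only the path component is rewritten.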

---

## Transports

MCP Ingest supports three server transports when building manifests:

* **SSE** (default): URL is normalized to `/sse` if needed.
* **STDIO**: provide an `exec` block with `cmd` (e.g. Node servers via `npx`).
* **WS**: WebSocket endpoints are preserved as provided.

Example STDIO snippet:

```json
{
  "mcp_registration": {
    "server": {
      "name": "filesystem",
      "transport": "STDIO",
      "exec": { "cmd": ["npx","-y","@modelcontextprotocol/server-filesystem"] }
    }
  }
}
```
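
The same STDIO structure can be produced from Python, for example when generating many entries in a script. This is a sketch only: `stdio_registration` is a hypothetical helper, not an `mcp-ingest` function, and it reproduces just the fields shown in the snippet above (a complete manifest may need more).

```python
import json


def stdio_registration(name: str, cmd: list[str]) -> dict:
    """Build the mcp_registration block for a STDIO server."""
    return {
        "mcp_registration": {
            "server": {
                "name": name,
                "transport": "STDIO",
                # Command and arguments used to launch the server process.
                "exec": {"cmd": list(cmd)},
            }
        }
    }


if __name__ == "__main__":
    doc = stdio_registration(
        "filesystem",
        ["npx", "-y", "@modelcontextprotocol/server-filesystem"],
    )
    print(json.dumps(doc, indent=2))
```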

---

## Project layout

```
mcp_ingest/
  __init__.py              # exports: describe(), autoinstall()
  sdk.py                   # orchestrates describe/register
  cli.py                   # detect/describe/register/pack/harvest-repo
  emit/                    # manifest/index + optional MatrixHub adapters
  register/                # MatrixHub /catalog/install + gateway fallback
  utils/                   # sse/io/idempotency/jsonschema/ast_parse/fetch/git/temp
  detect/                  # fastmcp, langchain, llamaindex, autogen, crewai, semantic_kernel, raw
  validate/                # mcp_probe + sandbox (proc/container)
services/harvester/
  app.py + routers + workers + discovery + store + clients
examples/watsonx/
  server.py, manifest.json, index.json
```

MkDocs documentation lives under `docs/` (Material theme). CI runs the lint, type, and test checks and builds wheels.

---

## Development

Use the Makefile helpers:

```bash
make help
make setup      # create .venv (Python 3.11)
make install    # install package + dev extras
make format     # black
make lint       # ruff
make typecheck  # mypy
make test       # pytest
make ci         # full gate (ruff+black+mypy+pytest)
make build      # sdist/wheel → dist/
```

Local harvester API:

```bash
uvicorn services.harvester.app:app --reload
# POST /jobs {"mode":"harvest_repo","source":"<git|zip|dir>","options":{}}
```
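
A job can be submitted to the local harvester from Python using only the standard library. The sketch below assumes uvicorn's default bind address (`http://127.0.0.1:8000`); `build_job_request` is a hypothetical helper, and the response format is not documented here, so the example only prints whatever comes back.

```python
import json
import urllib.request


def build_job_request(
    source: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Prepare a POST /jobs request with the payload shape shown above."""
    payload = {"mode": "harvest_repo", "source": source, "options": {}}
    return urllib.request.Request(
        url=f"{base_url}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_job_request(
        "https://github.com/modelcontextprotocol/servers/archive/refs/heads/main.zip"
    )
    print(req.get_method(), req.full_url)
    # Requires the harvester to be running locally (see the uvicorn command).
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```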

---

## CI & Quality

* **Ruff** (lint), **Black** (format), **Mypy** (types), **Pytest** (coverage)
* GitHub Actions workflow in `.github/workflows/ci.yml`
* The package is built and uploaded as a CI artifact; publishing to PyPI via Twine is supported.

---

## Security & Safety

* Idempotent HTTP and retries with exponential backoff (409 → success).
* Sandboxes (process & container) with timeouts and memory caps.
* No secrets stored at rest; inject via environment only.
* Logs are structured; per-job trace IDs in the harvester path.
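
The first point, retries with exponential backoff where HTTP 409 counts as success, can be sketched as follows. This is an illustration of the pattern and not the library's code: `install_with_retries`, its parameters, and the choice to retry only 5xx responses are all assumptions.

```python
import time
from typing import Callable


def install_with_retries(
    send: Callable[[], int],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it reports success; send() returns an HTTP status code."""
    for attempt in range(attempts):
        status = send()
        # 2xx means installed; 409 means already installed, so also success.
        if 200 <= status < 300 or status == 409:
            return True
        # Assumption: other non-5xx responses will not improve on retry.
        if status < 500:
            return False
        if attempt < attempts - 1:
            sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s, ...
    return False


if __name__ == "__main__":
    responses = iter([503, 503, 409])  # simulated server behaviour
    delays: list[float] = []
    ok = install_with_retries(lambda: next(responses), sleep=delays.append)
    print(ok, delays)
```

Passing `sleep` as a parameter keeps the demo instant and makes the backoff schedule easy to check in tests.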

---

## Roadmap

* More detectors (framework coverage), stronger schema inference.
* Richer validation (multiple tool calls, golden tests), SBOM/provenance by default.
* Global public index shards (CDN) fed by the harvester.

---

## License

`mcp-ingest` is licensed under the Apache License 2.0. See the `LICENSE` file for more details.


---

### Acknowledgements

This project is part of the **Agent‑Matrix** ecosystem and is inspired by the Model Context Protocol community work.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "mcp-ingest",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.11",
    "maintainer_email": null,
    "keywords": "agents, ingest, matrixhub, mcp, model-context-protocol",
    "author": "MatrixHub Contributors",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/93/6b/1862c27422c69455a07e1009f91096f31e7d09cd77b5f6be3a89496fdfcc/mcp_ingest-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "SDK + CLI to describe, validate, and register MCP servers into MatrixHub",
    "version": "0.1.0",
    "project_urls": null,
    "split_keywords": [
        "agents",
        " ingest",
        " matrixhub",
        " mcp",
        " model-context-protocol"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "811b347c5f7a79d7db5b82e4f455b02673f0dcbc14645dcd46bc4c87c68734ad",
                "md5": "c5616b762551bfe7c2e5ceef79c36244",
                "sha256": "aecb32d1160bea262020fce014c5d3124884e76e2cb6a985ccdb9cbabd7d64ed"
            },
            "downloads": -1,
            "filename": "mcp_ingest-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c5616b762551bfe7c2e5ceef79c36244",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.11",
            "size": 68371,
            "upload_time": "2025-08-10T15:36:16",
            "upload_time_iso_8601": "2025-08-10T15:36:16.610164Z",
            "url": "https://files.pythonhosted.org/packages/81/1b/347c5f7a79d7db5b82e4f455b02673f0dcbc14645dcd46bc4c87c68734ad/mcp_ingest-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "936b1862c27422c69455a07e1009f91096f31e7d09cd77b5f6be3a89496fdfcc",
                "md5": "829cba8a86c5ac792269e62ff53c3c6b",
                "sha256": "658ef648cf0c50482e684392917486f4aec1ae8b4230fe968b83fbd9c779e1b8"
            },
            "downloads": -1,
            "filename": "mcp_ingest-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "829cba8a86c5ac792269e62ff53c3c6b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.11",
            "size": 72871,
            "upload_time": "2025-08-10T15:36:18",
            "upload_time_iso_8601": "2025-08-10T15:36:18.123907Z",
            "url": "https://files.pythonhosted.org/packages/93/6b/1862c27422c69455a07e1009f91096f31e7d09cd77b5f6be3a89496fdfcc/mcp_ingest-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-10 15:36:18",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "mcp-ingest"
}
        