# 🦾 MCP Ingest
*Discover, describe, and register AI agents with ease.*
<p align="center">
<a href="https://pypi.org/project/mcp-ingest/"><img src="https://img.shields.io/pypi/v/mcp-ingest?color=blue" alt="PyPI"></a>
<a href="https://pypi.org/project/mcp-ingest/"><img src="https://img.shields.io/pypi/pyversions/mcp-ingest.svg?logo=python" alt="Python Versions"></a>
<a href="https://github.com/agent-matrix/mcp_ingest/actions/workflows/ci.yml">
<img src="https://github.com/agent-matrix/mcp_ingest/actions/workflows/ci.yml/badge.svg?branch=master" alt="CI">
</a>
<a href="https://agent-matrix.github.io/matrix-hub/"><img src="https://img.shields.io/static/v1?label=docs&message=mkdocs&color=blue&logo=mkdocs" alt="Docs"></a>
<a href="https://github.com/agent-matrix/mcp_ingest/blob/master/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue" alt="License: Apache-2.0"></a>
<a href="https://github.com/ruslanmv/agent-generator"><img src="https://img.shields.io/badge/Powered%20by-agent--generator-brightgreen" alt="Powered by agent-generator"></a>
</p>
---
**`mcp-ingest`** is a tiny SDK + CLI that turns *any* MCP server/agent/tool into a **MatrixHub‑ready** artifact. It lets you:
* **Discover** servers from a folder, Git repo, or ZIP — even whole registries (harvester).
* **Describe** them offline → emit `manifest.json` + `index.json` (SSE normalized).
* **Validate** in a sandbox or container (handshake, `ListTools`, one `CallTool`).
* **Publish** to S3/GitHub Pages and **Register** to MatrixHub (`/catalog/install`).
Built for **Python 3.11** (the published package requires `>=3.11,<3.13`), packaged for **PyPI**, with strict lint/type/test gates.
> You can catalog **millions** of MCP candidates offline and install on demand per tenant/workspace — the fastest path to building **the Matrix** of interoperable agents and tools.
---
## Install
```bash
pip install mcp-ingest
```
For contributors:
```bash
python3.11 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[dev,harvester]"
```
---
## Quickstart
### SDK (authors)
```python
from mcp_ingest import describe, autoinstall

# Generate manifest.json and index.json without running your server
paths = describe(
    name="watsonx-mcp",
    url="http://127.0.0.1:6288/sse",
    tools=["chat"],
    resources=[{"uri": "file://server.py", "name": "source"}],
    description="Watsonx SSE server",
    version="0.1.0",
)
print(paths)  # {"manifest_path": "./manifest.json", "index_path": "./index.json"}

# Optionally register into MatrixHub (idempotent)
autoinstall(matrixhub_url="http://127.0.0.1:7300")
```
### CLI (operators)
```bash
# Detect → describe (offline)
mcp-ingest pack ./examples/watsonx --out dist/

# Register later (POST /catalog/install)
mcp-ingest register \
  --matrixhub http://127.0.0.1:7300 \
  --manifest dist/manifest.json
```
### Harvest a whole repo
Harvest a multi-server repository (e.g., the official MCP servers collection):
```bash
mcp-ingest harvest-repo \
  https://github.com/modelcontextprotocol/servers/archive/refs/heads/main.zip \
  --out dist/servers
```
This produces one `manifest.json` per detected server and a **repo-level `index.json`** that lists them all. Optionally pass `--publish s3://…` and/or `--register`.
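To check what a harvest produced, a short script can walk the output directory and list each manifest it finds. This is only a sketch: it assumes the manifests sit somewhere under the `--out` directory and that each one carries the `mcp_registration.server.name` field shown in the STDIO example later in this README. The helper name `summarize_manifests` is hypothetical and not part of `mcp-ingest`.

```python
import json
from pathlib import Path


def summarize_manifests(out_dir: str) -> list[tuple[str, str]]:
    """Return (relative manifest path, server name) pairs found under out_dir."""
    root = Path(out_dir)
    if not root.is_dir():
        return []
    rows = []
    for path in sorted(root.rglob("manifest.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed layout: {"mcp_registration": {"server": {"name": ...}}}
        server = data.get("mcp_registration", {}).get("server", {})
        rows.append((str(path.relative_to(root)), server.get("name", "?")))
    return rows


for rel_path, name in summarize_manifests("dist/servers"):
    print(f"{name}\t{rel_path}")
```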
---
## MatrixHub integration
* Preferred path: **`POST /catalog/install`** with the **inline manifest** (what `autoinstall()` and `mcp-ingest register` do).
* **Idempotent** by design: HTTP **409** is treated as success; safe to re-run.
* **SSE normalization**: we auto-fix URLs to end in `/sse` unless the manifest explicitly requests `/messages` or a different transport.
**Deferred install**: You can *describe* millions of candidates offline, then install only when a tenant wants them.
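The SSE normalization rule can be pictured with a few lines of standard-library Python. This is a minimal sketch of the behaviour described above, not the code `mcp-ingest` ships: the function name `normalize_sse_url` is hypothetical, and the real implementation may also inspect the manifest's transport field, which is left out here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_sse_url(url: str) -> str:
    """Return url with a path ending in /sse, leaving /messages endpoints alone."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    # Endpoints that already target /sse or explicitly request /messages are kept.
    if path.endswith("/sse") or path.endswith("/messages"):
        return urlunsplit(parts._replace(path=path))
    return urlunsplit(parts._replace(path=path + "/sse"))


print(normalize_sse_url("http://127.0.0.1:6288"))           # http://127.0.0.1:6288/sse
print(normalize_sse_url("http://127.0.0.1:6288/sse/"))      # http://127.0.0.1:6288/sse
print(normalize_sse_url("http://127.0.0.1:6288/messages"))  # http://127.0.0.1:6288/messages
```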
---
## Transports
MCP Ingest supports three server transports when building manifests:
* **SSE** (default): URL is normalized to `/sse` if needed.
* **STDIO**: provide an `exec` block with `cmd` (e.g. Node servers via `npx`).
* **WS**: WebSocket endpoints are preserved as provided.
Example STDIO snippet:
```json
{
  "mcp_registration": {
    "server": {
      "name": "filesystem",
      "transport": "STDIO",
      "exec": { "cmd": ["npx", "-y", "@modelcontextprotocol/server-filesystem"] }
    }
  }
}
```
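The three transports differ mainly in which field locates the server. The sketch below builds the `mcp_registration.server` block for each case. The STDIO shape follows the snippet above. The `url` key used for SSE and WS is an assumption, and `build_server_block` is a hypothetical helper rather than part of the SDK.

```python
from __future__ import annotations

import json
from typing import Any


def build_server_block(
    name: str,
    transport: str,
    url: str | None = None,
    cmd: list[str] | None = None,
) -> dict[str, Any]:
    """Build a registration block for an SSE, STDIO, or WS server."""
    transport = transport.upper()
    server: dict[str, Any] = {"name": name, "transport": transport}
    if transport == "STDIO":
        if not cmd:
            raise ValueError("STDIO servers need an exec cmd")
        server["exec"] = {"cmd": list(cmd)}
    elif transport in ("SSE", "WS"):
        if not url:
            raise ValueError(f"{transport} servers need a url")
        server["url"] = url  # assumed key name; check the manifest schema
    else:
        raise ValueError(f"unsupported transport: {transport}")
    return {"mcp_registration": {"server": server}}


block = build_server_block(
    "filesystem",
    "stdio",
    cmd=["npx", "-y", "@modelcontextprotocol/server-filesystem"],
)
print(json.dumps(block, indent=2))
```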
---
## Project layout
```
mcp_ingest/
  __init__.py          # exports: describe(), autoinstall()
  sdk.py               # orchestrates describe/register
  cli.py               # detect/describe/register/pack/harvest-repo
  emit/                # manifest/index + optional MatrixHub adapters
  register/            # MatrixHub /catalog/install + gateway fallback
  utils/               # sse/io/idempotency/jsonschema/ast_parse/fetch/git/temp
  detect/              # fastmcp, langchain, llamaindex, autogen, crewai, semantic_kernel, raw
  validate/            # mcp_probe + sandbox (proc/container)
services/harvester/
  app.py + routers + workers + discovery + store + clients
examples/watsonx/
  server.py, manifest.json, index.json
```
MkDocs documentation lives under `docs/` (Material theme). CI runs lint, type checks, and tests, then builds wheels.
---
## Development
Use the Makefile helpers:
```bash
make help
make setup # create .venv (Python 3.11)
make install # install package + dev extras
make format # black
make lint # ruff
make typecheck # mypy
make test # pytest
make ci # full gate (ruff+black+mypy+pytest)
make build # sdist/wheel → dist/
```
Local harvester API:
```bash
uvicorn services.harvester.app:app --reload
# POST /jobs {"mode":"harvest_repo","source":"<git|zip|dir>","options":{}}
```
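A job can be submitted from Python with the standard library alone. The sketch below only builds the request, using the body shape from the comment above. The base URL assumes uvicorn's default bind address (`127.0.0.1:8000`), `build_job_request` is a hypothetical helper, and the response format is not specified here, so the actual call is left commented out.

```python
import json
import urllib.request


def build_job_request(
    source: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST /jobs request asking the harvester to process one source."""
    payload = {"mode": "harvest_repo", "source": source, "options": {}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_job_request(
    "https://github.com/modelcontextprotocol/servers/archive/refs/heads/main.zip"
)
print(req.get_method(), req.full_url)

# To submit it against a running harvester:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```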
---
## CI & Quality
* **Ruff** (lint), **Black** (format), **Mypy** (types), **Pytest** (coverage)
* GitHub Actions workflow in `.github/workflows/ci.yml`
* The package is built and uploaded as a CI artifact; PyPI publishing via Twine is supported.
---
## Security & Safety
* Idempotent HTTP and retries with exponential backoff (409 → success).
* Sandboxes (process & container) with timeouts and memory caps.
* No secrets stored at rest; inject via environment only.
* Logs are structured; per-job trace IDs in the harvester path.
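The first point, retries with exponential backoff where 409 counts as success, can be sketched as follows. This illustrates the pattern and is not the library's code: `install_with_retries` is a hypothetical name, the retry budget and delays are assumed values, and the HTTP call is injected as a `send` callable that returns a status code, so the loop can be exercised without a network.

```python
import time
from typing import Callable


def install_with_retries(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds; treat HTTP 409 (already installed) as success."""
    for attempt in range(max_attempts):
        status = send()
        if 200 <= status < 300 or status == 409:
            return True
        # Other client errors will not improve on retry; 429 and 5xx might.
        if status < 500 and status != 429:
            return False
        if attempt < max_attempts - 1:
            sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s, ...
    return False


# Simulated server: two transient failures, then "already installed".
statuses = iter([503, 503, 409])
print(install_with_retries(lambda: next(statuses), sleep=lambda _: None))  # True
```

Because a repeated install ends in 409 rather than an error, re-running the same registration is safe.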
---
## Roadmap
* More detectors (framework coverage), stronger schema inference.
* Richer validation (multiple tool calls, golden tests), SBOM/provenance by default.
* Global public index shards (CDN) fed by the harvester.
---
## License
`mcp-ingest` is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
---
### Acknowledgements
This project is part of the **Agent‑Matrix** ecosystem and is inspired by the work of the Model Context Protocol community.
Raw data
{
"_id": null,
"home_page": null,
"name": "mcp-ingest",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.11",
"maintainer_email": null,
"keywords": "agents, ingest, matrixhub, mcp, model-context-protocol",
"author": "MatrixHub Contributors",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/93/6b/1862c27422c69455a07e1009f91096f31e7d09cd77b5f6be3a89496fdfcc/mcp_ingest-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "SDK + CLI to describe, validate, and register MCP servers into MatrixHub",
"version": "0.1.0",
"project_urls": null,
"split_keywords": [
"agents",
" ingest",
" matrixhub",
" mcp",
" model-context-protocol"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "811b347c5f7a79d7db5b82e4f455b02673f0dcbc14645dcd46bc4c87c68734ad",
"md5": "c5616b762551bfe7c2e5ceef79c36244",
"sha256": "aecb32d1160bea262020fce014c5d3124884e76e2cb6a985ccdb9cbabd7d64ed"
},
"downloads": -1,
"filename": "mcp_ingest-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c5616b762551bfe7c2e5ceef79c36244",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.11",
"size": 68371,
"upload_time": "2025-08-10T15:36:16",
"upload_time_iso_8601": "2025-08-10T15:36:16.610164Z",
"url": "https://files.pythonhosted.org/packages/81/1b/347c5f7a79d7db5b82e4f455b02673f0dcbc14645dcd46bc4c87c68734ad/mcp_ingest-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "936b1862c27422c69455a07e1009f91096f31e7d09cd77b5f6be3a89496fdfcc",
"md5": "829cba8a86c5ac792269e62ff53c3c6b",
"sha256": "658ef648cf0c50482e684392917486f4aec1ae8b4230fe968b83fbd9c779e1b8"
},
"downloads": -1,
"filename": "mcp_ingest-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "829cba8a86c5ac792269e62ff53c3c6b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.11",
"size": 72871,
"upload_time": "2025-08-10T15:36:18",
"upload_time_iso_8601": "2025-08-10T15:36:18.123907Z",
"url": "https://files.pythonhosted.org/packages/93/6b/1862c27422c69455a07e1009f91096f31e7d09cd77b5f6be3a89496fdfcc/mcp_ingest-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-10 15:36:18",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "mcp-ingest"
}