| Field | Value |
| --- | --- |
| Name | gx-mcp-server |
| Version | 1.0.0 |
| Summary | Expose Great Expectations data-quality checks via MCP |
| Upload time | 2025-07-24 10:52:36 |
| Requires Python | >=3.11 |
| License | MIT |

MIT License

Copyright (c) 2025 David Front

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
| Keywords | data validation, great-expectations, mcp |
# Great Expectations MCP Server
> Expose Great Expectations data-quality checks as MCP tools for LLM agents.
[PyPI](https://pypi.org/project/gx-mcp-server) ·
[Docker Hub](https://hub.docker.com/r/davidf9999/gx-mcp-server) ·
[License](LICENSE) ·
[CI](https://github.com/davidf9999/gx-mcp-server/actions/workflows/ci.yaml) ·
[Publish](https://github.com/davidf9999/gx-mcp-server/actions/workflows/publish.yaml)
## Motivation
Large Language Model (LLM) agents often need to interact with and validate data. Great Expectations is a powerful open-source tool for data quality, but it's not natively accessible to LLM agents. This server bridges that gap by exposing core Great Expectations functionality through the Model Context Protocol (MCP), allowing agents to:
- Programmatically load datasets from various sources.
- Define data quality rules (Expectations) on the fly.
- Run validation checks and interpret the results.
- Integrate robust data quality checks into their automated workflows.
## TL;DR
- **Install:** `just install`
- **Run server:** `just serve`
- **Try examples:** `just run-examples`
- **Test:** `just test`
- **Lint and type-check:** `just ci`
- **Default CSV limit:** 50 MB (set `MCP_CSV_SIZE_LIMIT_MB` to change)
## Features
- Load CSV data from file, URL, or inline (50 MB default limit, configurable up to 1 GB)
- Load tables from Snowflake or BigQuery using URI prefixes
- Define and modify ExpectationSuites (profiler flag is **deprecated**)
- Validate data and fetch detailed results (sync or async)
- Choose **in-memory** (default) or **SQLite** storage for datasets & results
- Optional **Basic** or **Bearer** token authentication for HTTP clients
- Configure **HTTP rate limiting** per minute
- Restrict origins with `--allowed-origins`
- **Prometheus** metrics on `--metrics-port`
- **OpenTelemetry** tracing via `--trace` (OTLP exporter)
- Multiple transport modes: **STDIO**, **HTTP**, **Inspector (GUI)**
## Quickstart
```bash
just install
cp .env.example .env # optional: add your OpenAI API key
just run-examples
```
## Usage
**Help**
```bash
uv run python -m gx_mcp_server --help
```
**STDIO mode** (default for AI clients):
```bash
uv run python -m gx_mcp_server
```
**HTTP mode** (for web / API clients):
```bash
just serve
# Add basic auth
uv run python -m gx_mcp_server --http --basic-auth user:pass
# Add rate limiting
uv run python -m gx_mcp_server --http --rate-limit 30
```
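For reference, a minimal sketch of what a client sends when the server runs with `--basic-auth user:pass`: the standard HTTP Basic `Authorization` header (any HTTP client library builds this for you).

```python
import base64

# Illustrative: form the Authorization header value for HTTP Basic auth.
def basic_auth_header(user: str, password: str) -> str:
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```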
**Inspector GUI** (development):
```bash
uv run python -m gx_mcp_server --inspect
# Then in another shell:
npx @modelcontextprotocol/inspector
```
## Configuring Maximum CSV File Size
Default limit is **50 MB**. Override via environment variable:
```bash
export MCP_CSV_SIZE_LIMIT_MB=200 # 1–1024 MB allowed
just serve
```
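A minimal sketch of the resolution logic described above (illustrative, not the server's actual code): default 50 MB, with values clamped to the documented 1–1024 MB range.

```python
import os

# Resolve the CSV size limit: default 50 MB, clamp to 1-1024 MB.
def csv_size_limit_mb(default: int = 50) -> int:
    raw = os.environ.get("MCP_CSV_SIZE_LIMIT_MB")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return min(max(value, 1), 1024)
```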
## Warehouse Connectors
Install extras:
```bash
uv pip install -e .[snowflake]
uv pip install -e .[bigquery]
```
Use URI prefixes:
```python
load_dataset("snowflake://user:pass@account/db/schema/table?warehouse=WH")
load_dataset("bigquery://project/dataset/table")
```
`load_dataset` automatically detects these prefixes and delegates to the appropriate connector.
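A hypothetical sketch of that prefix dispatch (the real `load_dataset` internals may differ, but the documented behaviour is equivalent):

```python
# Pick a connector from the source string's URI scheme prefix.
def pick_connector(source: str) -> str:
    if source.startswith("snowflake://"):
        return "snowflake"
    if source.startswith("bigquery://"):
        return "bigquery"
    return "csv"  # plain file path, http(s) URL, or inline data
```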
## Metrics and Tracing
- Prometheus metrics endpoint: `http://localhost:9090/metrics`
- OpenTelemetry: `uv run python -m gx_mcp_server --http --trace`
## Docker
Build and run the server in Docker:
```bash
# Build the production image
just docker-build
# Run the server
just docker-run
```
The server will be available at `http://localhost:8000`.
For development, you can build a development image that includes test dependencies and run tests or examples:
```bash
# Build the development image
just docker-build-dev
# Run tests
just docker-test
# Run examples (requires OPENAI_API_KEY in .env file)
just docker-run-examples
```
## Development

Setup is the same as the Quickstart above (`just install`, then optionally copy `.env.example` to `.env`). Run `just test` for the test suite and `just ci` for lint and type checks before submitting changes.
## Telemetry
Great Expectations sends anonymous usage data to `posthog.greatexpectations.io` by default. Disable it with:
```bash
export GX_ANALYTICS_ENABLED=false
```
## Current Limitations
- Stores last 100 datasets / results only
- Concurrency is **in-process** (`asyncio`) – no external queue
- Expect API evolution while the project stabilises
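The "last 100" cap can be pictured as a bounded insertion-ordered store; this is an illustrative sketch, not the server's actual storage layer:

```python
from collections import OrderedDict

# Keep only the most recent `max_items` entries, evicting the oldest.
class BoundedStore:
    def __init__(self, max_items: int = 100) -> None:
        self._items: "OrderedDict[str, object]" = OrderedDict()
        self._max = max_items

    def put(self, key: str, value: object) -> None:
        self._items[key] = value
        self._items.move_to_end(key)  # treat re-insertion as most recent
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the oldest entry

    def get(self, key: str):
        return self._items.get(key)
```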
## Security
- Run behind a reverse proxy (Nginx, Caddy, cloud LB) in production
- Supply `--ssl-certfile` / `--ssl-keyfile` only if the proxy cannot terminate TLS
- Anonymous sessions use UUIDv4; persistent apps should use `secrets.token_urlsafe(32)`
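The two identifier styles mentioned above, side by side; both come straight from the Python standard library:

```python
import secrets
import uuid

# UUIDv4 for anonymous sessions; a higher-entropy URL-safe token
# for persistent applications.
anonymous_session = str(uuid.uuid4())
persistent_token = secrets.token_urlsafe(32)  # 32 random bytes -> 43 chars
```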
## Project Roadmap
See [ROADMAP-v2.md](ROADMAP-v2.md) for upcoming sprints.
## License & Contributing
MIT License – see [CONTRIBUTING.md](CONTRIBUTING.md) for how to help!
## Author
David Front – dfront@gmail.com | GitHub: [davidf9999](https://github.com/davidf9999)