- **Name**: databeak
- **Version**: 0.1.2
- **Summary**: DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools
- **Uploaded**: 2025-10-07 13:20:43
- **Requires Python**: >=3.12
- **License**: Apache-2.0
- **Keywords**: csv, data-analysis, data-manipulation, data-profiling, data-quality, data-validation, fastmcp, mcp, model-context-protocol, outlier-detection, pandas
- **Requirements**: fastmcp>=2.11.3, pandas>=2.2.3, numpy>=2.1.3, pydantic>=2.10.4, aiofiles>=24.1.0, python-dateutil>=2.9.0, httpx>=0.27.0, openpyxl>=3.1.5, pyarrow>=17.0.0, tabulate>=0.9.0, numexpr>=2.10.0, bottleneck>=1.4.0, pytz>=2024.2
# DataBeak

[![Tests](https://github.com/jonpspri/databeak/actions/workflows/test.yml/badge.svg)](https://github.com/jonpspri/databeak/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/jonpspri/databeak/branch/main/graph/badge.svg)](https://codecov.io/gh/jonpspri/databeak)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

## AI-Powered CSV Processing via Model Context Protocol

Transform how AI assistants work with CSV data. DataBeak provides 40+
specialized tools for data manipulation, analysis, and validation through the
Model Context Protocol (MCP).

## Features

- 🔄 **Complete Data Operations** - Load, transform, analyze, and export CSV data
- 📊 **Advanced Analytics** - Statistics, correlations, outlier detection, data
  profiling
- ✅ **Data Validation** - Schema validation, quality scoring, anomaly detection
- 🎯 **Stateless Design** - Clean MCP architecture with external context
  management
- ⚡ **High Performance** - Handles large datasets with streaming and chunking
- 🔒 **Session Management** - Multi-user support with isolated sessions
- 🌟 **Code Quality** - Zero ruff violations, 100% mypy compliance, perfect MCP
  documentation standards, comprehensive test coverage

## Getting Started

The fastest way to use DataBeak is with `uvx` (no installation required):

### For Claude Desktop

Add this to your MCP Settings file:

```json
{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
```

### For Other AI Clients

DataBeak works with Continue, Cline, Windsurf, and Zed. See the
[installation guide](https://jonpspri.github.io/databeak/installation) for
specific configuration examples.

### HTTP Mode (Advanced)

For HTTP-based AI clients or custom deployments:

```bash
# Run in HTTP mode
uv run databeak --transport http --host 0.0.0.0 --port 8000

# Access server at http://localhost:8000/mcp
# Health check at http://localhost:8000/health
```

### Quick Test

Once configured, ask your AI assistant:

```text
"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"
```

## Documentation

📚 **[Complete Documentation](https://jonpspri.github.io/databeak/)**

- [Installation Guide](https://jonpspri.github.io/databeak/installation) - Setup
  for all AI clients
- [Quick Start Tutorial](https://jonpspri.github.io/databeak/tutorials/quickstart)
  \- Learn in 10 minutes
- [API Reference](https://jonpspri.github.io/databeak/api/overview) - All 40+
  tools documented
- [Architecture](https://jonpspri.github.io/databeak/architecture) - Technical
  details

## Environment Variables

| Variable                    | Default | Description               |
| --------------------------- | ------- | ------------------------- |
| `DATABEAK_MAX_FILE_SIZE_MB` | 1024    | Maximum file size         |
| `DATABEAK_CSV_HISTORY_DIR`  | "."     | History storage location  |
| `DATABEAK_SESSION_TIMEOUT`  | 3600    | Session timeout (seconds) |
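Set these variables in the environment before launching the server. A minimal sketch of how such settings are typically resolved (variable names from the table above; the `get_setting` helper is hypothetical, not DataBeak's actual configuration code):

```python
import os

def get_setting(name: str, default: float) -> float:
    """Read a numeric setting from the environment, falling back to a default."""
    raw = os.environ.get(name)
    return float(raw) if raw is not None else default

max_file_size_mb = get_setting("DATABEAK_MAX_FILE_SIZE_MB", 1024)
session_timeout = get_setting("DATABEAK_SESSION_TIMEOUT", 3600)
history_dir = os.environ.get("DATABEAK_CSV_HISTORY_DIR", ".")
```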

## Known Limitations

DataBeak is designed for interactive CSV processing with AI assistants. Be aware
of these constraints:

- **File Size**: Maximum 1024MB per file (configurable via
  `DATABEAK_MAX_FILE_SIZE_MB`)
- **Session Management**: Maximum 100 concurrent sessions, 1-hour timeout
  (configurable)
- **Memory**: Large datasets may require significant memory; monitor with
  `system_info` tool
- **CSV Dialects**: Assumes standard CSV format; complex dialects may require
  pre-processing
- **Concurrency**: Single-threaded processing per session; parallel sessions
  supported
- **Data Types**: Automatic type inference; complex types may need explicit
  conversion
- **URL Loading**: HTTPS only; blocks private networks (127.0.0.1, 192.168.x.x,
  10.x.x.x) for security

For production deployments with larger datasets, consider adjusting environment
variables and monitoring resource usage.
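The URL-loading restriction above can be approximated with the standard library's `ipaddress` module. A hedged sketch of the idea (not DataBeak's actual validation code):

```python
import ipaddress
from urllib.parse import urlparse

def is_url_allowed(url: str) -> bool:
    """Allow only HTTPS URLs whose host is not a private or loopback address."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    try:
        addr = ipaddress.ip_address(parsed.hostname or "")
    except ValueError:
        # Hostname is not a literal IP; production code would resolve
        # the name and re-check the resulting addresses as well.
        return True
    return not (addr.is_private or addr.is_loopback)
```

Note that checking only literal IPs leaves DNS-based bypasses open, which is why resolving the hostname first matters in real deployments.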

## Contributing

We welcome contributions! Please:

1. Fork the repository
1. Create a feature branch (`git checkout -b feature/amazing-feature`)
1. Make your changes with tests
1. Run the test suite: `uv run -m pytest`
1. Submit a pull request

**Note**: All changes must go through pull requests. Direct commits to `main`
are blocked by pre-commit hooks.

## Development

```bash
# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync

# Run the server locally
uv run databeak

# Run tests
uv run -m pytest tests/unit/          # Unit tests (primary)
uv run -m pytest                      # All tests

# Run quality checks
uv run ruff check
uv run mypy src/databeak/
```

### Testing Structure

DataBeak implements comprehensive unit and integration testing:

- **Unit Tests** (`tests/unit/`) - 940+ fast, isolated module tests
- **Integration Tests** (`tests/integration/`) - 43 FastMCP Client-based
  protocol tests across 7 test files
- **E2E Tests** (`tests/e2e/`) - Planned: Complete workflow validation

**Test Execution:**

```bash
uv run pytest -n auto tests/unit/          # Run unit tests (940+ tests)
uv run pytest -n auto tests/integration/   # Run integration tests (43 tests)
uv run pytest -n auto --cov=src/databeak   # Run with coverage analysis
```

See [Testing Guide](tests/README.md) for comprehensive testing details.

## License

Apache 2.0 - see [LICENSE](LICENSE) file.

## Support

- **Issues**: [GitHub Issues](https://github.com/jonpspri/databeak/issues)
- **Discussions**:
  [GitHub Discussions](https://github.com/jonpspri/databeak/discussions)
- **Documentation**:
  [jonpspri.github.io/databeak](https://jonpspri.github.io/databeak/)

            
