# 🕷️ Crawl4AI MCP Server

[PyPI version](https://badge.fury.io/py/crawl4ai-mcp)
[PyPI](https://pypi.org/project/crawl4ai-mcp/)
[MIT License](https://opensource.org/licenses/MIT)
[Downloads](https://pepy.tech/project/crawl4ai-mcp)
**MCP (Model Context Protocol) server for Crawl4AI** - Universal web crawling and data extraction for AI agents.
Integrate powerful web scraping capabilities into Claude, ChatGPT, and any MCP-compatible AI assistant.
## 📑 Table of Contents
- [🎯 Why This Tool?](#-why-this-tool)
- [⚡ Quick Start](#-quick-start)
- [🚀 Features](#-features)
- [📦 Installation](#-installation)
- [🔧 Usage](#-usage)
- [🛠️ Available Tools](#️-available-tools)
- [🌐 Transport Protocols](#-transport-protocols)
- [⚙️ Configuration](#️-configuration)
- [🐳 Docker Support](#-docker-support)
- [🤝 Contributing](#-contributing)
- [📄 License](#-license)
## 🎯 Why This Tool?
### The Problem
- 🔴 **No MCP servers for web scraping** - AI agents can't access web content
- 🔴 **Complex scraping setup** - Crawl4AI requires custom integration
- 🔴 **Limited protocol support** - Most tools only support one transport
- 🔴 **Poor AI integration** - Existing scrapers aren't optimized for LLMs
### Our Solution
- ✅ **First Crawl4AI MCP server** - Native MCP integration
- ✅ **All MCP transports** - STDIO, SSE, and HTTP support
- ✅ **AI-optimized extraction** - Clean markdown, structured data
- ✅ **One-line execution** - `crawl4ai-mcp --stdio`
- ✅ **Production ready** - Type hints, tests, Docker support
## ⚡ Quick Start
### One-Line Execution
```bash
# Install and run
pip install crawl4ai-mcp
crawl4ai-mcp --stdio
```
### With Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "crawl4ai-mcp",
      "args": ["--stdio"]
    }
  }
}
```
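On macOS this file typically lives at `~/Library/Application Support/Claude/claude_desktop_config.json`, and on Windows at `%APPDATA%\Claude\claude_desktop_config.json`; restart Claude Desktop after editing.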
### With Any MCP Client
```bash
# STDIO mode (for CLI tools)
crawl4ai-mcp --stdio

# SSE mode (for web clients)
crawl4ai-mcp --sse

# HTTP mode (for REST API)
crawl4ai-mcp --http
```
## 🚀 Features
### Core Capabilities
- 🌐 **Universal Web Scraping** - Extract content from any website
- 📝 **Markdown Conversion** - Clean, formatted markdown output
- 📸 **Screenshots** - Capture visual content
- 📄 **PDF Generation** - Save pages as PDF
- 🎭 **JavaScript Execution** - Interact with dynamic content
- 🔄 **Multiple Transports** - STDIO, SSE, HTTP protocols
### Why Choose crawl4ai-mcp?
| Feature | crawl4ai-mcp | Other Tools |
|---------|--------------|-------------|
| MCP Protocol Support | ✅ Full | ❌ None |
| Crawl4AI Integration | ✅ Native | ❌ Manual |
| Transport Protocols | ✅ All 3 | ⚠️ Usually 1 |
| AI Optimization | ✅ Built-in | ❌ Generic |
| Production Ready | ✅ Yes | ⚠️ Varies |
| Docker Support | ✅ Yes | ⚠️ Limited |
## 📦 Installation
### From PyPI
```bash
pip install crawl4ai-mcp
```
### From Source
```bash
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
pip install -e .
```
### With Docker
```bash
docker pull stgmt/crawl4ai-mcp
docker run -p 3000:3000 stgmt/crawl4ai-mcp
```
## 🔧 Usage
### Basic Command Line
```bash
# Default HTTP mode
crawl4ai-mcp

# STDIO mode for CLI integration
crawl4ai-mcp --stdio

# SSE mode for real-time streaming
crawl4ai-mcp --sse

# HTTP mode explicitly
crawl4ai-mcp --http
```
### Python Integration
```python
import asyncio
from crawl4ai_mcp import Crawl4AIMCPServer

async def main():
    server = Crawl4AIMCPServer()

    # Run in STDIO mode
    await server.run_stdio()

    # Or run in HTTP mode
    # server.run_http(host="0.0.0.0", port=3000)

asyncio.run(main())
```
### With MCP Clients
#### Using mcp-client SDK
```python
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import asyncio

async def main():
    server_params = StdioServerParameters(
        command="crawl4ai-mcp",
        args=["--stdio"]
    )

    # stdio_client launches the server and yields its read/write streams,
    # which ClientSession wraps; initialize() completes the MCP handshake.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools = await session.list_tools()

            # Crawl a webpage
            result = await session.call_tool(
                "crawl",
                {"url": "https://example.com"}
            )
            print(result)

asyncio.run(main())
```
## 🛠️ Available Tools
### 1. `crawl` - Full Web Crawling
Extract complete content from any webpage.
```json
{
  "name": "crawl",
  "arguments": {
    "url": "https://example.com",
    "wait_for": "css:.content",
    "timeout": 30000
  }
}
```
### 2. `md` - Markdown Extraction
Get clean markdown content from webpages.
```json
{
  "name": "md",
  "arguments": {
    "url": "https://docs.example.com",
    "clean": true
  }
}
```
### 3. `html` - Raw HTML
Retrieve raw HTML content.
```json
{
  "name": "html",
  "arguments": {
    "url": "https://example.com"
  }
}
```
### 4. `screenshot` - Visual Capture
Take screenshots of webpages.
```json
{
  "name": "screenshot",
  "arguments": {
    "url": "https://example.com",
    "full_page": true
  }
}
```
### 5. `pdf` - PDF Generation
Convert webpages to PDF.
```json
{
  "name": "pdf",
  "arguments": {
    "url": "https://example.com",
    "format": "A4"
  }
}
```
### 6. `execute_js` - JavaScript Execution
Execute JavaScript on webpages.
```json
{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "script": "document.title"
  }
}
```
## 🌐 Transport Protocols
### STDIO Transport
Best for command-line tools and local development.
```bash
crawl4ai-mcp --stdio
```
**Use cases:**
- Claude Desktop app
- Terminal-based MCP clients
- Local development and testing
- CI/CD pipelines
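In STDIO mode the server speaks MCP JSON-RPC over stdin/stdout, so you can smoke-test it without a client library. A minimal sketch — the `initialize` payload follows the MCP specification, while the client name and version are illustrative:

```bash
# Pipe a single initialize request in and read the JSON response from stdout
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.1.0"}}}' \
  | crawl4ai-mcp --stdio
```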
### SSE Transport (Server-Sent Events)
Ideal for real-time web applications.
```bash
crawl4ai-mcp --sse
```
**Use cases:**
- Web-based MCP clients
- Real-time streaming applications
- Browser extensions
- Progressive web apps
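To watch the raw event stream, `curl -N` is enough; the `/sse` route below is an assumption (servers differ), so check the startup logs for the actual path:

```bash
# -N disables buffering so events print as they arrive
curl -N http://localhost:3001/sse
```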
### HTTP Transport
Standard REST API for maximum compatibility.
```bash
crawl4ai-mcp --http
```
**Use cases:**
- REST API clients
- Microservice architectures
- Cloud deployments
- Load-balanced environments
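In HTTP mode the server accepts MCP JSON-RPC requests in the POST body. A hedged curl sketch, assuming the endpoint is served at the root on port 3000 (adjust the path to match your deployment):

```bash
# tools/list is a standard MCP method; the endpoint path is an assumption
curl -X POST http://localhost:3000/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```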
## ⚙️ Configuration
### Environment Variables
```bash
# Crawl4AI endpoint (if using remote instance)
CRAWL4AI_ENDPOINT=http://localhost:8000

# Server ports
HTTP_PORT=3000
SSE_PORT=3001

# Logging
LOG_LEVEL=INFO

# Performance
MAX_CONCURRENT_REQUESTS=10
REQUEST_TIMEOUT=30
```
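These variables can be exported in your shell profile or passed inline for a one-off run:

```bash
# Inline environment overrides for a single launch
CRAWL4AI_ENDPOINT=http://localhost:8000 LOG_LEVEL=DEBUG crawl4ai-mcp --http
```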
### Configuration File
Create a `.env` file:
```env
CRAWL4AI_ENDPOINT=http://your-crawl4ai-instance.com
HTTP_PORT=3000
SSE_PORT=3001
LOG_LEVEL=DEBUG
DEBUG=true
```
### Advanced Settings
```python
# config/settings.py
# Pydantic v2 moved BaseSettings into the separate pydantic-settings package
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    crawl4ai_endpoint: str = "http://localhost:8000"
    http_port: int = 3000
    sse_port: int = 3001
    log_level: str = "INFO"
    debug: bool = False
    max_concurrent_requests: int = 10
    request_timeout: int = 30

    class Config:
        env_file = ".env"

settings = Settings()
```
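Elsewhere in the codebase the singleton can simply be imported; values resolve from the environment first, then `.env`, then the defaults above. A usage sketch:

```python
# Hypothetical consumer module
from config.settings import settings

print(settings.crawl4ai_endpoint)  # "http://localhost:8000" unless overridden
```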
## 🐳 Docker Support
### Quick Start with Docker
```bash
# Run with default settings
docker run -p 3000:3000 stgmt/crawl4ai-mcp

# With environment variables
docker run -p 3000:3000 \
  -e CRAWL4AI_ENDPOINT=http://crawl4ai:8000 \
  -e LOG_LEVEL=DEBUG \
  stgmt/crawl4ai-mcp

# With Docker Compose
docker-compose up
```
### Docker Compose
```yaml
version: '3.8'

services:
  crawl4ai-mcp:
    image: stgmt/crawl4ai-mcp
    ports:
      - "3000:3000"
      - "3001:3001"
    environment:
      - CRAWL4AI_ENDPOINT=http://crawl4ai:8000
      - LOG_LEVEL=INFO
    depends_on:
      - crawl4ai

  crawl4ai:
    image: crawl4ai/crawl4ai
    ports:
      - "8000:8000"
```
### Building a Custom Image
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
RUN pip install -e .

EXPOSE 3000 3001

CMD ["crawl4ai-mcp", "--http"]
```
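Build and run the image (the `crawl4ai-mcp:local` tag is arbitrary):

```bash
docker build -t crawl4ai-mcp:local .
docker run -p 3000:3000 -p 3001:3001 crawl4ai-mcp:local
```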
## 🧪 Testing
### Running Tests
```bash
# Install test dependencies
pip install -e .[test]

# Run all tests
pytest

# Run with coverage
pytest --cov=crawl4ai_mcp

# Run specific test
pytest tests/test_server.py::test_crawl_tool
```
### Testing with MCP Server Tester
This MCP server can be tested end to end with the [MCP Server Tester](https://github.com/stgmt/mcp-server-tester-sse-http-stdio), which supports all three transport protocols (STDIO, SSE, and HTTP).
#### Install MCP Server Tester
```bash
# Option 1: Using Docker (recommended)
docker run -it stgmt/mcp-server-tester test --help

# Option 2: Using NPM
npm install -g mcp-server-tester-sse-http-stdio
mcp-server-tester test --help

# Option 3: Using Python
pip install mcp-server-tester-sse-http-stdio
mcp-server-tester test --help
```
#### Test Examples
**Test STDIO mode:**
```bash
# Docker
docker run -it stgmt/mcp-server-tester test \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"

# NPM/Python
mcp-server-tester test \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"
```
**Test HTTP mode:**
```bash
# Start the server first
crawl4ai-mcp --http &

# Run tests
mcp-server-tester test \
  --transport http \
  --url http://localhost:3000
```
**Test SSE mode:**
```bash
# Start the server first
crawl4ai-mcp --sse &

# Run tests
mcp-server-tester test \
  --transport sse \
  --url http://localhost:3001
```
**Test with configuration file:**
Create `test-config.yaml`:
```yaml
name: Crawl4AI Comprehensive Tests
transport: stdio
command: crawl4ai-mcp --stdio
tests:
  - name: Test markdown extraction
    tool: md
    arguments:
      url: https://example.com
      f: fit
    assert:
      - type: contains
        value: "Example Domain"

  - name: Test screenshot
    tool: screenshot
    arguments:
      url: https://example.com
      screenshot_wait_for: 2
    assert:
      - type: exists
        path: result

  - name: Test HTML extraction
    tool: html
    arguments:
      url: https://example.com
    assert:
      - type: contains
        value: "<html"

  - name: Test JavaScript execution
    tool: execute_js
    arguments:
      url: https://example.com
      scripts: ["document.title"]
    assert:
      - type: contains
        value: "Example"
```
Run the test:
```bash
mcp-server-tester test -f test-config.yaml
```
#### Interactive Testing
The tester also provides an interactive mode for manual testing:
```bash
# Interactive STDIO mode
mcp-server-tester interactive \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"

# Interactive HTTP mode
mcp-server-tester interactive \
  --transport http \
  --url http://localhost:3000
```
## 📚 Examples
### Example 1: Extract Documentation
```python
# Extract markdown from documentation
result = await session.call_tool("md", {
    "url": "https://docs.python.org/3/",
    "clean": True
})
```
### Example 2: Monitor Price Changes
```python
# Screenshot for visual comparison
screenshot = await session.call_tool("screenshot", {
    "url": "https://store.example.com/product",
    "full_page": False
})

# Extract price via JavaScript
price = await session.call_tool("execute_js", {
    "url": "https://store.example.com/product",
    "script": "document.querySelector('.price').innerText"
})
```
### Example 3: Generate Reports
```python
# Generate PDF report
pdf = await session.call_tool("pdf", {
    "url": "https://analytics.example.com/report",
    "format": "A4",
    "landscape": True
})
```
## 🤝 Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp

# Install in development mode
pip install -e .[dev]

# Run tests
pytest

# Format code
black crawl4ai_mcp tests
ruff check --fix crawl4ai_mcp tests
```
## 📄 License
MIT License - see [LICENSE](LICENSE) file for details.
## 🔗 Links
- **PyPI Package**: [https://pypi.org/project/crawl4ai-mcp/](https://pypi.org/project/crawl4ai-mcp/)
- **GitHub Repository**: [https://github.com/stgmt/crawl4ai-mcp](https://github.com/stgmt/crawl4ai-mcp)
- **Documentation**: [https://github.com/stgmt/crawl4ai-mcp#readme](https://github.com/stgmt/crawl4ai-mcp#readme)
- **Issues**: [https://github.com/stgmt/crawl4ai-mcp/issues](https://github.com/stgmt/crawl4ai-mcp/issues)
## 🙏 Acknowledgments
- [Crawl4AI](https://github.com/unclecode/crawl4ai) - The powerful crawling engine
- [MCP](https://modelcontextprotocol.io) - Model Context Protocol specification
- [Anthropic](https://anthropic.com) - For creating the MCP standard
---
**Made with ❤️ for the AI community**