# 🕷️ Crawl4AI MCP Server

[PyPI version](https://badge.fury.io/py/crawl4ai-mcp)
[PyPI](https://pypi.org/project/crawl4ai-mcp/)
[MIT License](https://opensource.org/licenses/MIT)
[Downloads](https://pepy.tech/project/crawl4ai-mcp)
**MCP (Model Context Protocol) server for Crawl4AI** - Universal web crawling and data extraction for AI agents.
Integrate powerful web scraping capabilities into Claude, ChatGPT, and any MCP-compatible AI assistant.
## 📑 Table of Contents
- [🎯 Why This Tool?](#-why-this-tool)
- [⚡ Quick Start](#-quick-start)
- [🚀 Features](#-features)
- [📦 Installation](#-installation)
- [🔧 Usage](#-usage)
- [🛠️ Available Tools](#️-available-tools)
- [🌐 Transport Protocols](#-transport-protocols)
- [⚙️ Configuration](#️-configuration)
- [🐳 Docker Support](#-docker-support)
- [🤝 Contributing](#-contributing)
- [📄 License](#-license)
## 🎯 Why This Tool?
### The Problem
- 🔴 **No MCP servers for web scraping** - AI agents can't access web content
- 🔴 **Complex scraping setup** - Crawl4AI requires custom integration
- 🔴 **Limited protocol support** - Most tools only support one transport
- 🔴 **Poor AI integration** - Existing scrapers aren't optimized for LLMs
### Our Solution
- ✅ **First Crawl4AI MCP server** - Native MCP integration
- ✅ **All MCP transports** - STDIO, SSE, and HTTP support
- ✅ **AI-optimized extraction** - Clean markdown, structured data
- ✅ **One-line execution** - `crawl4ai-mcp --stdio`
- ✅ **Production ready** - Type hints, tests, Docker support
## ⚡ Quick Start
### One-Line Execution
```bash
# Install and run
pip install crawl4ai-mcp
crawl4ai-mcp --stdio
```
### With Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "crawl4ai-mcp",
      "args": ["--stdio"]
    }
  }
}
```
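On macOS this file typically lives at `~/Library/Application Support/Claude/claude_desktop_config.json`, and on Windows at `%APPDATA%\Claude\claude_desktop_config.json`; restart Claude Desktop after editing.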
### With Any MCP Client
```bash
# STDIO mode (for CLI tools)
crawl4ai-mcp --stdio

# SSE mode (for web clients)
crawl4ai-mcp --sse

# HTTP mode (for REST API)
crawl4ai-mcp --http
```
## 🚀 Features
### Core Capabilities
- 🌐 **Universal Web Scraping** - Extract content from any website
- 📝 **Markdown Conversion** - Clean, formatted markdown output
- 📸 **Screenshots** - Capture visual content
- 📄 **PDF Generation** - Save pages as PDF
- 🎭 **JavaScript Execution** - Interact with dynamic content
- 🔄 **Multiple Transports** - STDIO, SSE, HTTP protocols
### Why Choose crawl4ai-mcp?
| Feature | crawl4ai-mcp | Other Tools |
|---------|--------------|-------------|
| MCP Protocol Support | ✅ Full | ❌ None |
| Crawl4AI Integration | ✅ Native | ❌ Manual |
| Transport Protocols | ✅ All 3 | ⚠️ Usually 1 |
| AI Optimization | ✅ Built-in | ❌ Generic |
| Production Ready | ✅ Yes | ⚠️ Varies |
| Docker Support | ✅ Yes | ⚠️ Limited |
## 📦 Installation
### From PyPI
```bash
pip install crawl4ai-mcp
```
### From Source
```bash
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
pip install -e .
```
### With Docker
```bash
docker pull stgmt/crawl4ai-mcp
docker run -p 3000:3000 stgmt/crawl4ai-mcp
```
## 🔧 Usage
### Basic Command Line
```bash
# Default HTTP mode
crawl4ai-mcp

# STDIO mode for CLI integration
crawl4ai-mcp --stdio

# SSE mode for real-time streaming
crawl4ai-mcp --sse

# HTTP mode explicitly
crawl4ai-mcp --http
```
### Python Integration
```python
import asyncio
from crawl4ai_mcp import Crawl4AIMCPServer

async def main():
    server = Crawl4AIMCPServer()

    # Run in STDIO mode
    await server.run_stdio()

    # Or run in HTTP mode
    # server.run_http(host="0.0.0.0", port=3000)

asyncio.run(main())
```
### With MCP Clients
#### Using mcp-client SDK
```python
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import asyncio

async def main():
    server_params = StdioServerParameters(
        command="crawl4ai-mcp",
        args=["--stdio"]
    )

    # stdio_client launches the server and yields its read/write streams,
    # which ClientSession wraps; initialize() completes the MCP handshake.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools = await session.list_tools()

            # Crawl a webpage
            result = await session.call_tool(
                "crawl",
                {"url": "https://example.com"}
            )
            print(result)

asyncio.run(main())
```
## 🛠️ Available Tools
### 1. `crawl` - Full Web Crawling
Extract complete content from any webpage.
```json
{
  "name": "crawl",
  "arguments": {
    "url": "https://example.com",
    "wait_for": "css:.content",
    "timeout": 30000
  }
}
```
### 2. `md` - Markdown Extraction
Get clean markdown content from webpages.
```json
{
  "name": "md",
  "arguments": {
    "url": "https://docs.example.com",
    "clean": true
  }
}
```
### 3. `html` - Raw HTML
Retrieve raw HTML content.
```json
{
  "name": "html",
  "arguments": {
    "url": "https://example.com"
  }
}
```
### 4. `screenshot` - Visual Capture
Take screenshots of webpages.
```json
{
  "name": "screenshot",
  "arguments": {
    "url": "https://example.com",
    "full_page": true
  }
}
```
### 5. `pdf` - PDF Generation
Convert webpages to PDF.
```json
{
  "name": "pdf",
  "arguments": {
    "url": "https://example.com",
    "format": "A4"
  }
}
```
### 6. `execute_js` - JavaScript Execution
Execute JavaScript on webpages.
```json
{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "script": "document.title"
  }
}
```
## 🌐 Transport Protocols
### STDIO Transport
Best for command-line tools and local development.
```bash
crawl4ai-mcp --stdio
```
**Use cases:**
- Claude Desktop app
- Terminal-based MCP clients
- Local development and testing
- CI/CD pipelines
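In STDIO mode the server speaks MCP JSON-RPC over stdin/stdout, so you can smoke-test it without a client library. A minimal sketch — the `initialize` payload follows the MCP specification, while the client name and version are illustrative:

```bash
# Pipe a single initialize request in and read the JSON response from stdout
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.1.0"}}}' \
  | crawl4ai-mcp --stdio
```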
### SSE Transport (Server-Sent Events)
Ideal for real-time web applications.
```bash
crawl4ai-mcp --sse
```
**Use cases:**
- Web-based MCP clients
- Real-time streaming applications
- Browser extensions
- Progressive web apps
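To watch the raw event stream, `curl -N` is enough; the `/sse` route below is an assumption (servers differ), so check the startup logs for the actual path:

```bash
# -N disables buffering so events print as they arrive
curl -N http://localhost:3001/sse
```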
### HTTP Transport
Standard REST API for maximum compatibility.
```bash
crawl4ai-mcp --http
```
**Use cases:**
- REST API clients
- Microservice architectures
- Cloud deployments
- Load-balanced environments
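In HTTP mode the server accepts MCP JSON-RPC requests in the POST body. A hedged curl sketch, assuming the endpoint is served at the root on port 3000 (adjust the path to match your deployment):

```bash
# tools/list is a standard MCP method; the endpoint path is an assumption
curl -X POST http://localhost:3000/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```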
## ⚙️ Configuration
### Environment Variables
```bash
# Crawl4AI endpoint (if using remote instance)
CRAWL4AI_ENDPOINT=http://localhost:8000

# Server ports
HTTP_PORT=3000
SSE_PORT=3001

# Logging
LOG_LEVEL=INFO

# Performance
MAX_CONCURRENT_REQUESTS=10
REQUEST_TIMEOUT=30
```
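These variables can be exported in your shell profile or passed inline for a one-off run:

```bash
# Inline environment overrides for a single launch
CRAWL4AI_ENDPOINT=http://localhost:8000 LOG_LEVEL=DEBUG crawl4ai-mcp --http
```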
### Configuration File
Create a `.env` file:
```env
CRAWL4AI_ENDPOINT=http://your-crawl4ai-instance.com
HTTP_PORT=3000
SSE_PORT=3001
LOG_LEVEL=DEBUG
DEBUG=true
```
### Advanced Settings
```python
# config/settings.py
# Pydantic v2 moved BaseSettings into the separate pydantic-settings package
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    crawl4ai_endpoint: str = "http://localhost:8000"
    http_port: int = 3000
    sse_port: int = 3001
    log_level: str = "INFO"
    debug: bool = False
    max_concurrent_requests: int = 10
    request_timeout: int = 30

    class Config:
        env_file = ".env"

settings = Settings()
```
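Elsewhere in the codebase the singleton can simply be imported; values resolve from the environment first, then `.env`, then the defaults above. A usage sketch:

```python
# Hypothetical consumer module
from config.settings import settings

print(settings.crawl4ai_endpoint)  # "http://localhost:8000" unless overridden
```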
## 🐳 Docker Support
### Quick Start with Docker
```bash
# Run with default settings
docker run -p 3000:3000 stgmt/crawl4ai-mcp

# With environment variables
docker run -p 3000:3000 \
  -e CRAWL4AI_ENDPOINT=http://crawl4ai:8000 \
  -e LOG_LEVEL=DEBUG \
  stgmt/crawl4ai-mcp

# With Docker Compose
docker-compose up
```
### Docker Compose
```yaml
version: '3.8'

services:
  crawl4ai-mcp:
    image: stgmt/crawl4ai-mcp
    ports:
      - "3000:3000"
      - "3001:3001"
    environment:
      - CRAWL4AI_ENDPOINT=http://crawl4ai:8000
      - LOG_LEVEL=INFO
    depends_on:
      - crawl4ai

  crawl4ai:
    image: crawl4ai/crawl4ai
    ports:
      - "8000:8000"
```
### Building a Custom Image
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
RUN pip install -e .

EXPOSE 3000 3001

CMD ["crawl4ai-mcp", "--http"]
```
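Build and run the image (the `crawl4ai-mcp:local` tag is arbitrary):

```bash
docker build -t crawl4ai-mcp:local .
docker run -p 3000:3000 -p 3001:3001 crawl4ai-mcp:local
```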
## 🧪 Testing
### Running Tests
```bash
# Install test dependencies
pip install -e .[test]

# Run all tests
pytest

# Run with coverage
pytest --cov=crawl4ai_mcp

# Run specific test
pytest tests/test_server.py::test_crawl_tool
```
### Testing with MCP Server Tester
This MCP server can be tested end to end with the [MCP Server Tester](https://github.com/stgmt/mcp-server-tester-sse-http-stdio), which supports all three transport protocols (STDIO, SSE, and HTTP).
#### Install MCP Server Tester
```bash
# Option 1: Using Docker (recommended)
docker run -it stgmt/mcp-server-tester test --help

# Option 2: Using NPM
npm install -g mcp-server-tester-sse-http-stdio
mcp-server-tester test --help

# Option 3: Using Python
pip install mcp-server-tester-sse-http-stdio
mcp-server-tester test --help
```
#### Test Examples
**Test STDIO mode:**
```bash
# Docker
docker run -it stgmt/mcp-server-tester test \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"

# NPM/Python
mcp-server-tester test \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"
```
**Test HTTP mode:**
```bash
# Start the server first
crawl4ai-mcp --http &

# Run tests
mcp-server-tester test \
  --transport http \
  --url http://localhost:3000
```
**Test SSE mode:**
```bash
# Start the server first
crawl4ai-mcp --sse &

# Run tests
mcp-server-tester test \
  --transport sse \
  --url http://localhost:3001
```
**Test with configuration file:**
Create `test-config.yaml`:
```yaml
name: Crawl4AI Comprehensive Tests
transport: stdio
command: crawl4ai-mcp --stdio
tests:
  - name: Test markdown extraction
    tool: md
    arguments:
      url: https://example.com
      f: fit
    assert:
      - type: contains
        value: "Example Domain"

  - name: Test screenshot
    tool: screenshot
    arguments:
      url: https://example.com
      screenshot_wait_for: 2
    assert:
      - type: exists
        path: result

  - name: Test HTML extraction
    tool: html
    arguments:
      url: https://example.com
    assert:
      - type: contains
        value: "<html"

  - name: Test JavaScript execution
    tool: execute_js
    arguments:
      url: https://example.com
      scripts: ["document.title"]
    assert:
      - type: contains
        value: "Example"
```
Run the test:
```bash
mcp-server-tester test -f test-config.yaml
```
#### Interactive Testing
The tester also provides an interactive mode for manual testing:
```bash
# Interactive STDIO mode
mcp-server-tester interactive \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"

# Interactive HTTP mode
mcp-server-tester interactive \
  --transport http \
  --url http://localhost:3000
```
## 📚 Examples
### Example 1: Extract Documentation
```python
# Extract markdown from documentation
result = await session.call_tool("md", {
    "url": "https://docs.python.org/3/",
    "clean": True
})
```
### Example 2: Monitor Price Changes
```python
# Screenshot for visual comparison
screenshot = await session.call_tool("screenshot", {
    "url": "https://store.example.com/product",
    "full_page": False
})

# Extract price via JavaScript
price = await session.call_tool("execute_js", {
    "url": "https://store.example.com/product",
    "script": "document.querySelector('.price').innerText"
})
```
### Example 3: Generate Reports
```python
# Generate PDF report
pdf = await session.call_tool("pdf", {
    "url": "https://analytics.example.com/report",
    "format": "A4",
    "landscape": True
})
```
## 🤝 Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp

# Install in development mode
pip install -e .[dev]

# Run tests
pytest

# Format code
black crawl4ai_mcp tests
ruff check --fix crawl4ai_mcp tests
```
## 📄 License
MIT License - see [LICENSE](LICENSE) file for details.
## 🔗 Links
- **PyPI Package**: [https://pypi.org/project/crawl4ai-mcp/](https://pypi.org/project/crawl4ai-mcp/)
- **GitHub Repository**: [https://github.com/stgmt/crawl4ai-mcp](https://github.com/stgmt/crawl4ai-mcp)
- **Documentation**: [https://github.com/stgmt/crawl4ai-mcp#readme](https://github.com/stgmt/crawl4ai-mcp#readme)
- **Issues**: [https://github.com/stgmt/crawl4ai-mcp/issues](https://github.com/stgmt/crawl4ai-mcp/issues)
## 🙏 Acknowledgments
- [Crawl4AI](https://github.com/unclecode/crawl4ai) - The powerful crawling engine
- [MCP](https://modelcontextprotocol.io) - Model Context Protocol specification
- [Anthropic](https://anthropic.com) - For creating the MCP standard
---
**Made with ❤️ for the AI community**