spider-mcp-client

Name: spider-mcp-client
Version: 0.1.4
Home page: https://github.com/spider-mcp/spider-mcp-client
Summary: Official Python client for Spider MCP web scraping API
Upload time: 2025-08-30 17:49:30
Author: Spider MCP Team
Requires Python: >=3.8
License: MIT
Keywords: web scraping, spider, mcp, api client, html parsing, data extraction
# Spider MCP Client

[![PyPI version](https://badge.fury.io/py/spider-mcp-client.svg)](https://badge.fury.io/py/spider-mcp-client)
[![Python Support](https://img.shields.io/pypi/pyversions/spider-mcp-client.svg)](https://pypi.org/project/spider-mcp-client/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Official Python client for **Spider MCP** - a professional web scraping API with advanced anti-detection capabilities.

## 🚀 Quick Start

### Installation

```bash
pip install spider-mcp-client
```

### Basic Usage

```python
from spider_mcp_client import SpiderMCPClient

# Initialize client
client = SpiderMCPClient(
    api_key="your-api-key-here",
    base_url="http://localhost:8003"  # Your Spider MCP server
)

# Parse a URL
result = client.parse_url("https://example.com/article")

print(f"Status: {result['status']}")
print(f"Title: {result['html_data'].get('title', 'N/A')}")
print(f"Parser: {result['status_detail']['parser_used']}")
print(f"API Calls: {len(result['api_calls'])}")
print(f"Images: {len(result['downloaded_images'])}")
```

## 📋 Features

- ✅ **Simple API** - One method to parse any supported URL
- ✅ **Built-in retry logic** - Automatic retries with exponential backoff
- ✅ **Rate limiting** - Respectful delays between requests
- ✅ **Error handling** - Clear exceptions for different error types
- ✅ **Image support** - Optional image download and localization
- ✅ **Session isolation** - Multiple isolated browser sessions
- ✅ **Type hints** - Full typing support for better IDE experience

## 🔧 API Reference

### SpiderMCPClient

```python
client = SpiderMCPClient(
    api_key="your-api-key",           # Required: Your API key
    base_url="http://localhost:8003", # Spider MCP server URL
    timeout=30,                       # Request timeout (seconds)
    max_retries=3,                    # Max retry attempts
    rate_limit_delay=1.0             # Delay between requests (seconds)
)
```

### parse_url()

```python
result = client.parse_url(
    url="https://example.com/article",  # Required: URL to parse
    download_images=False,              # Optional: Download images
    session_name="my-session",          # Optional: Session name
    retry=1                             # Optional: Retry attempts (default: 1)
)
```

**Returns:**

```python
{
    "status": "success",
    "url": "https://example.com/article",
    "html_data": {
        "type": "article",
        "title": "Article Title",
        "content": "Full article content...",
        "author": "Author Name",
        "publish_date": "2025-01-17"
    },
    "api_calls": [...],  # Captured API calls
    "downloaded_images": [...],  # Downloaded images
    "status_detail": {
        "parser_used": "example.com - article_parser",
        "parser_id": 123,
        "success": True
    }
}
```

## 📖 Examples

### Basic Article Parsing

```python
from spider_mcp_client import SpiderMCPClient

client = SpiderMCPClient(api_key="sk-1234567890abcdef")

# Parse a news article
result = client.parse_url("https://techcrunch.com/2025/01/17/ai-news")

if result['status'] == 'success':
    html_data = result['html_data']
    print(f"📰 {html_data.get('title', 'N/A')}")
    print(f"✍️  {html_data.get('author', 'Unknown')}")
    print(f"📅 {html_data.get('publish_date', 'Unknown')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")
```

### With Image Download

```python
# Parse with image download
result = client.parse_url(
    url="https://news-site.com/photo-story",
    download_images=True
)

if result['status'] == 'success':
    images = result['downloaded_images']
    print(f"Downloaded {len(images)} images:")
    for img_url in images:
        print(f"  🖼️  {img_url}")
```

### Error Handling

```python
from spider_mcp_client import (
    SpiderMCPClient,
    ParserNotFoundError,
    AuthenticationError
)

client = SpiderMCPClient(api_key="your-api-key")

try:
    result = client.parse_url("https://unsupported-site.com/article")
    if result['status'] == 'success':
        print(f"Success: {result['html_data'].get('title', 'N/A')}")
    else:
        print(f"Parse failed: {result['status_detail'].get('error', 'Unknown error')}")

except ParserNotFoundError:
    print("❌ No parser available for this website")

except AuthenticationError:
    print("❌ Invalid API key")

except Exception as e:
    print(f"❌ Error: {e}")
```

### With Retry Logic

```python
# Parse with automatic retries
result = client.parse_url(
    url="https://sometimes-slow-site.com/article",
    retry=3  # Will attempt up to 4 times (initial + 3 retries)
)

if result['status'] == 'success':
    print(f"✅ Success: {result['html_data'].get('title')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")
else:
    print(f"❌ Failed: {result['status_detail'].get('error')}")
```

### API Calls and Images

```python
# Parse a page that makes API calls and has images
result = client.parse_url(
    url="https://dynamic-site.com/article",
    download_images=True
)

if result['status'] == 'success':
    print(f"📰 Title: {result['html_data'].get('title')}")
    print(f"🌐 API calls captured: {len(result['api_calls'])}")
    print(f"🖼️  Images downloaded: {len(result['downloaded_images'])}")

    # Show captured API calls
    for api_call in result['api_calls']:
        print(f"  📡 {api_call['method']} {api_call['url']}")
```

### Check Parser Availability

```python
# Check if parser exists before parsing
parser_info = client.check_parser("https://target-site.com/article")

if parser_info.get('found'):
    print(f"✅ Parser available: {parser_info['parser']['site_name']}")
    result = client.parse_url("https://target-site.com/article")
    if result['status'] == 'success':
        print(f"📰 {result['html_data'].get('title')}")
else:
    print("❌ No parser found for this URL")
```

## 🚨 Exception Types

```python
from spider_mcp_client import (
    SpiderMCPError,        # Base exception
    AuthenticationError,   # Invalid API key
    ParserNotFoundError,   # No parser for URL
    RateLimitError,        # Rate limit exceeded
    ServerError,           # Server error (5xx)
    TimeoutError,          # Request timeout
    ConnectionError        # Connection failed
)
```
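
The list above implies a hierarchy: the specific errors derive from a base `SpiderMCPError`, so one `except` clause can handle them all. A self-contained sketch of that idea (an assumption about the package's internals, not its actual source):

```python
class SpiderMCPError(Exception):
    """Base exception for all client errors."""

class AuthenticationError(SpiderMCPError):
    """Invalid or missing API key."""

class ParserNotFoundError(SpiderMCPError):
    """No parser registered for the requested URL."""

class RateLimitError(SpiderMCPError):
    """Rate limit exceeded."""

# Catching the base class covers every specific error type:
try:
    raise ParserNotFoundError("no parser for example.com")
except SpiderMCPError as e:
    print(f"Request failed: {e}")
```
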

## 🔑 Getting Your API Key

1. **Start Spider MCP server:**

   ```bash
   # On your Spider MCP server
   ./restart.sh
   ```

2. **Visit admin interface:**

   ```
   http://localhost:8003/admin/users
   ```

3. **Create/view user and copy API key**

## 🌐 Server Requirements

This client requires a running **Spider MCP server**. The server provides:

- ✅ **Custom parsers** for each website
- ✅ **Undetected ChromeDriver** for Cloudflare bypass
- ✅ **Professional anti-detection** capabilities
- ✅ **Image processing** and localization
- ✅ **Session management** and isolation

## 📚 Advanced Usage

### Session Isolation

```python
# Use session names for browser isolation
client = SpiderMCPClient(api_key="your-api-key")

# Each session gets its own browser context
result1 = client.parse_url(
    "https://site.com/page1",
    session_name="session-1"
)

result2 = client.parse_url(
    "https://site.com/page2",
    session_name="session-2"
)
```

### Configuration

```python
# Production configuration
client = SpiderMCPClient(
    api_key="your-api-key",
    base_url="https://your-spider-mcp-server.com",
    timeout=60,           # Longer timeout for complex pages
    max_retries=5,        # More retries for reliability
    rate_limit_delay=2.0  # Slower rate for respectful scraping
)
```
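
A production configuration like this is often paired with a simple batch loop that keeps going when a single URL fails. A minimal, self-contained sketch (`parse_fn` stands in for `client.parse_url`; the names are illustrative, not part of the client API):

```python
import time


def batch_parse(parse_fn, urls, delay=2.0):
    """Parse each URL in turn, pausing between requests.

    One failed URL is recorded as an error result instead of
    aborting the whole batch.
    """
    results = {}
    for i, url in enumerate(urls):
        try:
            results[url] = parse_fn(url)
        except Exception as e:
            results[url] = {"status": "error", "error": str(e)}
        if i < len(urls) - 1:
            time.sleep(delay)  # respectful pause between requests
    return results
```
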

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **PyPI Package:** https://pypi.org/project/spider-mcp-client/
- **GitHub Repository:** https://github.com/spider-mcp/spider-mcp-client
- **Documentation:** https://spider-mcp.readthedocs.io/
- **Spider MCP Server:** https://github.com/spider-mcp/spider-mcp

---

**Made with ❤️ by the Spider MCP Team**

            
