scrapeless

Name: scrapeless
Version: 1.2.1
Home page: https://github.com/scrapeless-ai/sdk-python
Summary: scrapeless python sdk
Upload time: 2025-07-23 07:26:59
Author: scrapelessteam
Requires Python: >=3.8
License: MIT
Keywords: scrapeless, browserless, scraping browser, scraping, web unlocker, captcha solver, rotate proxy
Requirements: pyppeteer, setuptools, python-dotenv, requests, playwright
# Scrapeless Python SDK

The official Python SDK for [Scrapeless AI](https://scrapeless.com) - End-to-End Data Infrastructure for AI Developers & Enterprises.

## 📑 Table of Contents

- [🌟 Features](#-features)
- [📦 Installation](#-installation)
- [🚀 Quick Start](#-quick-start)
- [📖 Usage Examples](#-usage-examples)
- [🔧 API Reference](#-api-reference)
- [📚 Examples](#-examples)
- [📄 License](#-license)
- [📞 Support](#-support)
- [🏢 About Scrapeless](#-about-scrapeless)

## 🌟 Features

- **Browser**: Advanced browser session management supporting the Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
- **Universal Scraping API**: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
- **Crawl**: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
- **Scraping API**: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
- **Deep SerpApi**: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
- **Proxies**: Geo-targeted proxy network covering 195+ countries. Optimize requests for better success rates and regional data access.
- **Actor**: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
- **Storage Solutions**: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.

## 📦 Installation

Install the SDK using pip:

```bash
pip install scrapeless
```
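The SDK declares `playwright` and `pyppeteer` among its dependencies. If you plan to drive a local Playwright browser (as opposed to a remote Scrapeless session), note that Playwright's browser binaries are a separate download, per standard Playwright usage:

```bash
# Only needed for local Playwright browsers; remote Scrapeless
# sessions connect over WebSocket and need no local binaries.
playwright install chromium
```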

## 🚀 Quick Start

### Prerequisite

[Log in](https://app.scrapeless.com) to the Scrapeless Dashboard and get your API key.

### Basic Setup

```python
from scrapeless import Scrapeless

client = Scrapeless({
    'api_key': 'your-api-key'  # Get your API key from https://scrapeless.com
})
```

### Environment Variables

You can also configure the SDK using environment variables:

```bash
# Required
SCRAPELESS_API_KEY=your-api-key

# Optional - Custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
```
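Since `python-dotenv` is already among the SDK's dependencies, a convenient pattern is to keep these variables in a local `.env` file and load them before constructing the client. A minimal sketch, assuming the client falls back to `SCRAPELESS_API_KEY` when no key is passed (as this section implies):

```python
from dotenv import load_dotenv
from scrapeless import Scrapeless

load_dotenv()          # reads SCRAPELESS_API_KEY (and friends) from .env
client = Scrapeless()  # no explicit key: assumed to come from the environment
```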

## 📖 Usage Examples

### Browser

Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:

```python
import asyncio

import pyppeteer

from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser

client = Scrapeless()


async def example():
    # Create a browser session
    config = ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US',
        session_recording=True
    )
    # create() returns a typed session object; __dict__ exposes its fields
    session = client.browser.create(config).__dict__
    browser_ws_endpoint = session['browser_ws_endpoint']
    print('Browser WebSocket endpoint created:', browser_ws_endpoint)

    # Connect to browser using pyppeteer
    browser = await pyppeteer.connect({'browserWSEndpoint': browser_ws_endpoint})
    # Open new page and navigate to website
    page = await browser.newPage()
    await page.goto('https://www.scrapeless.com')
    await browser.disconnect()


asyncio.run(example())
```
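Playwright is also supported (see the [examples](./examples) directory). The sketch below runs the same flow over Playwright; the assumption here is that the session's WebSocket endpoint is CDP-compatible and can be passed to `connect_over_cdp`, as is typical for remote Chromium endpoints:

```python
import asyncio

from playwright.async_api import async_playwright

from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser

client = Scrapeless()


async def playwright_example():
    # Create a browser session, as in the Pyppeteer example above
    session = client.browser.create(ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US'
    )).__dict__

    async with async_playwright() as p:
        # Assumption: the endpoint is a CDP-compatible WebSocket URL
        browser = await p.chromium.connect_over_cdp(session['browser_ws_endpoint'])
        page = await browser.new_page()
        await page.goto('https://www.scrapeless.com')
        print(await page.title())
        await browser.close()


asyncio.run(playwright_example())
```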

### Crawl

Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links:

```python
from scrapeless import Scrapeless

client = Scrapeless()

result = client.scraping_crawl.scrape_url("https://example.com")
print(result)
```

### Scraping API

Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:

```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()
request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.scraping.scrape(request=request)
print(result)
```

### Deep SerpApi

Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:

```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()
request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.deepserp.scrape(request=request)
print(result)
```

### Actor

Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:

```python
from scrapeless import Scrapeless
from scrapeless.types import IRunActorData, IActorRunOptions

client = Scrapeless()
data = IRunActorData(
    input={'url': 'https://example.com'},
    run_options=IActorRunOptions(
        CPU=2,
        memory=2048,
        timeout=600,
    )
)

run = client.actor.run(
    actor_id='your_actor_id',
    data=data
)
print('Actor run result:', run)
```

### Error Handling

The SDK raises `ScrapelessError` for API-related errors:

```python
from scrapeless import Scrapeless, ScrapelessError

client = Scrapeless()
try:
    result = client.scraping.scrape({'url': 'invalid-url'})
except ScrapelessError as error:
    print(f"Scrapeless API error: {error}")
    if hasattr(error, 'status_code'):
        print(f"Status code: {error.status_code}")
```
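A common pattern built on top of this is retrying transient failures with exponential backoff. A minimal sketch; the decision to retry only when no status code is present or the status is 5xx is an assumption, not documented SDK behavior:

```python
import time

from scrapeless import Scrapeless, ScrapelessError

client = Scrapeless()


def scrape_with_retry(payload, attempts=3, base_delay=1.0):
    """Retry a scrape call on transient (assumed 5xx) failures."""
    for attempt in range(attempts):
        try:
            return client.scraping.scrape(payload)
        except ScrapelessError as error:
            status = getattr(error, 'status_code', None)
            # Client errors (4xx) won't succeed on retry; re-raise immediately
            if status is not None and status < 500:
                raise
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...
```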

## 🔧 API Reference

### Client Configuration

```python
from scrapeless.types import ScrapelessConfig 

config = ScrapelessConfig(
    api_key='', # Your API key
    timeout=30000, # Request timeout in milliseconds (default: 30000)
    base_api_url='', # Base API URL
    actor_api_url='', # Actor service URL
    storage_api_url='', # Storage service URL
    browser_api_url='', # Browser service URL
    scraping_crawl_api_url='' # Crawl service URL
)
```
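For completeness, a usage sketch: the assumption here is that the constructor accepts a `ScrapelessConfig` instance in place of the plain dict shown in Basic Setup:

```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapelessConfig

# Assumption: Scrapeless() accepts a ScrapelessConfig as well as a dict
config = ScrapelessConfig(api_key='your-api-key', timeout=30000)
client = Scrapeless(config)
```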

### Available Services

The SDK provides the following services through the main client:

- `client.browser` - Browser automation with Playwright/Pyppeteer support, anti-detection tools (fingerprinting, CAPTCHA solving), and extensible workflows
- `client.universal` - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export
- `client.scraping_crawl` - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links)
- `client.scraping` - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews
- `client.deepserp` - Search engine results extraction
- `client.proxies` - Proxy management
- `client.actor` - Scalable workflow automation with built-in scheduling and resource management
- `client.storage` - Data storage solutions

## 📚 Examples

Check out the [`examples`](./examples) directory for comprehensive usage examples:

- [Browser](./examples/browser_example.py)
- [Playwright Integration](./examples/playwright_example.py)
- [Pyppeteer Integration](./examples/pyppeteer_example.py)
- [Scraping API](./examples/scraping_example.py)
- [Actor](./examples/actor_example.py)
- [Storage Usage](./examples/storage_example.py)
- [Proxies](./examples/proxies_example.py)
- [Deep SerpApi](./examples/deepserp_example.py)

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📞 Support

- 📖 **Documentation**: [https://docs.scrapeless.com](https://docs.scrapeless.com)
- 💬 **Community**: [Join our Discord](https://backend.scrapeless.com/app/api/v1/public/links/discord)
- 🐛 **Issues**: [GitHub Issues](https://github.com/scrapeless-ai/sdk-python/issues)
- 📧 **Email**: [support@scrapeless.com](mailto:support@scrapeless.com)

## 🏢 About Scrapeless

Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:

- High-performance web scraping infrastructure
- Global proxy network
- Browser automation capabilities
- Enterprise-grade reliability and support

Visit [scrapeless.com](https://scrapeless.com) to learn more and get started.

---

Made with ❤️ by the Scrapeless team

            
