# Scrapeless Python SDK
The official Python SDK for [Scrapeless AI](https://scrapeless.com) - End-to-End Data Infrastructure for AI Developers & Enterprises.
## 📑 Table of Contents
- [🌟 Features](#-features)
- [📦 Installation](#-installation)
- [🚀 Quick Start](#-quick-start)
- [📖 Usage Examples](#-usage-examples)
- [🔧 API Reference](#-api-reference)
- [📚 Examples](#-examples)
- [📄 License](#-license)
- [📞 Support](#-support)
- [🏢 About Scrapeless](#-about-scrapeless)
## 🌟 Features
- **Browser**: Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
- **Universal Scraping API**: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
- **Crawl**: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
- **Scraping API**: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
- **Deep SerpApi**: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
- **Proxies**: Geo-targeted proxy network with 195+ countries. Optimize requests for better success rates and regional data access.
- **Actor**: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
- **Storage Solutions**: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.
## 📦 Installation
Install the SDK using pip:
```bash
pip install scrapeless
```
## 🚀 Quick Start
### Prerequisite
[Log in](https://app.scrapeless.com) to the Scrapeless Dashboard and get your API key.
### Basic Setup
```python
from scrapeless import Scrapeless
client = Scrapeless({
    'api_key': 'your-api-key'  # Get your API key from https://scrapeless.com
})
```
### Environment Variables
You can also configure the SDK using environment variables:
```bash
# Required
SCRAPELESS_API_KEY=your-api-key
# Optional - Custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
```
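With these variables set, the client can be constructed with no arguments, as the later examples do:

```python
from scrapeless import Scrapeless

# No explicit config needed: the SDK reads SCRAPELESS_API_KEY (and any
# optional *_API_URL overrides) from the environment.
client = Scrapeless()
```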
## 📖 Usage Examples
### Browser
Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:
```python
import asyncio

import pyppeteer

from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser

client = Scrapeless()

async def example():
    # Create a browser session
    config = ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US',
        session_recording=True
    )
    # create() returns a typed session object; __dict__ exposes its fields
    session = client.browser.create(config).__dict__
    browser_ws_endpoint = session['browser_ws_endpoint']
    print('Browser WebSocket endpoint created:', browser_ws_endpoint)

    # Connect to the session using pyppeteer
    browser = await pyppeteer.connect({'browserWSEndpoint': browser_ws_endpoint})

    # Open a new page and navigate to a website
    page = await browser.newPage()
    await page.goto('https://www.scrapeless.com')

asyncio.get_event_loop().run_until_complete(example())
```
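The same session can also be driven with Playwright, since it exposes a standard CDP WebSocket endpoint. A minimal sketch using Playwright's `connect_over_cdp` (the session-creation calls mirror the Pyppeteer example above; see the [Playwright Integration](./examples/playwright_example.py) example for the SDK's own version):

```python
import asyncio

from playwright.async_api import async_playwright

from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser

client = Scrapeless()

async def playwright_example():
    session = client.browser.create(ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US'
    )).__dict__

    async with async_playwright() as p:
        # Attach to the remote Scrapeless browser over CDP
        browser = await p.chromium.connect_over_cdp(session['browser_ws_endpoint'])
        page = await browser.new_page()
        await page.goto('https://www.scrapeless.com')
        print(await page.title())
        await browser.close()

asyncio.run(playwright_example())
```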
### Crawl
Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links:
```python
from scrapeless import Scrapeless
client = Scrapeless()
result = client.scraping_crawl.scrape_url("https://example.com")
print(result)
```
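Whole-domain traversal goes through the same service. A hedged sketch: the `crawl_url` method name and its options dict are assumptions modeled on `scrape_url` above, so check the SDK source or the crawl docs for the exact signature:

```python
from scrapeless import Scrapeless

client = Scrapeless()

# Hypothetical: 'crawl_url' and the 'limit' option are illustrative
# assumptions; only 'scrape_url' is documented in this README.
result = client.scraping_crawl.crawl_url('https://example.com', {'limit': 5})
print(result)
```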
### Scraping API
Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest
client = Scrapeless()
request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.scraping.scrape(request=request)
print(result)
```
### Deep SerpApi
Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest
client = Scrapeless()
request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.deepserp.scrape(request=request)
print(result)
```
### Actor
Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:
```python
from scrapeless import Scrapeless
from scrapeless.types import IRunActorData, IActorRunOptions
client = Scrapeless()
data = IRunActorData(
    input={'url': 'https://example.com'},
    run_options=IActorRunOptions(
        CPU=2,
        memory=2048,
        timeout=600,
    )
)

run = client.actor.run(
    actor_id='your_actor_id',
    data=data
)
print('Actor run result:', run)
```
### Error Handling
The SDK raises `ScrapelessError` for API-related errors:
```python
from scrapeless import Scrapeless, ScrapelessError
client = Scrapeless()
try:
    result = client.scraping.scrape({'url': 'invalid-url'})
except ScrapelessError as error:
    print(f"Scrapeless API error: {error}")
    if hasattr(error, 'status_code'):
        print(f"Status code: {error.status_code}")
```
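Because the error may carry a `status_code`, transient server-side failures can be retried while client errors fail fast. A minimal backoff sketch built only on the calls shown above:

```python
import time

from scrapeless import Scrapeless, ScrapelessError
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()

def scrape_with_retry(request: ScrapingTaskRequest, retries: int = 3):
    for attempt in range(retries):
        try:
            return client.scraping.scrape(request=request)
        except ScrapelessError as error:
            status = getattr(error, 'status_code', None)
            # A 4xx status means the request itself is wrong; retrying won't help
            if status is not None and status < 500:
                raise
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...
```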
## 🔧 API Reference
### Client Configuration
```python
from scrapeless.types import ScrapelessConfig
config = ScrapelessConfig(
    api_key='',                # Your API key
    timeout=30000,             # Request timeout in milliseconds (default: 30000)
    base_api_url='',           # Base API URL
    actor_api_url='',          # Actor service URL
    storage_api_url='',        # Storage service URL
    browser_api_url='',        # Browser service URL
    scraping_crawl_api_url=''  # Crawl service URL
)
```
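The populated config can then be handed to the client constructor; an assumption here is that `Scrapeless` accepts a `ScrapelessConfig` instance in addition to the plain dict shown in Quick Start:

```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapelessConfig

# Assumed: the constructor takes a ScrapelessConfig as well as a dict
config = ScrapelessConfig(api_key='your-api-key', timeout=30000)
client = Scrapeless(config)
```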
### Available Services
The SDK provides the following services through the main client:
- `client.browser` - Browser automation with Playwright/Pyppeteer support, anti-detection tools (fingerprint spoofing, CAPTCHA solving), and extensible workflows.
- `client.universal` - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export.
- `client.scraping_crawl` - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links).
- `client.scraping` - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews.
- `client.deepserp` - Search engine results extraction.
- `client.proxies` - Geo-targeted proxy management.
- `client.actor` - Scalable workflow automation with built-in scheduling and resource management.
- `client.storage` - Scalable data storage solutions.
## 📚 Examples
Check out the [`examples`](./examples) directory for comprehensive usage examples:
- [Browser](./examples/browser_example.py)
- [Playwright Integration](./examples/playwright_example.py)
- [Pyppeteer Integration](./examples/pyppeteer_example.py)
- [Scraping API](./examples/scraping_example.py)
- [Actor](./examples/actor_example.py)
- [Storage Usage](./examples/storage_example.py)
- [Proxies](./examples/proxies_example.py)
- [Deep SerpApi](./examples/deepserp_example.py)
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 📞 Support
- 📖 **Documentation**: [https://docs.scrapeless.com](https://docs.scrapeless.com)
- 💬 **Community**: [Join our Discord](https://backend.scrapeless.com/app/api/v1/public/links/discord)
- 🐛 **Issues**: [GitHub Issues](https://github.com/scrapeless-ai/sdk-python/issues)
- 📧 **Email**: [support@scrapeless.com](mailto:support@scrapeless.com)
## 🏢 About Scrapeless
Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:
- High-performance web scraping infrastructure
- Global proxy network
- Browser automation capabilities
- Enterprise-grade reliability and support
Visit [scrapeless.com](https://scrapeless.com) to learn more and get started.
---
Made with ❤️ by the Scrapeless team