brightdata-sdk

Name: brightdata-sdk
Version: 1.0.2
Home page: https://github.com/brightdata/brightdata-sdk-python
Summary: Python SDK for Bright Data Web Scraping and SERP APIs
Upload time: 2025-08-18 10:20:22
Author: Bright Data
Requires Python: >=3.7
License: MIT
Keywords: brightdata, web scraping, proxy, serp, search, data extraction

<img width="1300" height="200" alt="sdk-banner(1)" src="https://github.com/user-attachments/assets/c4a7857e-10dd-420b-947a-ed2ea5825cb8" />

<h3 align="center">A Python SDK for Bright Data's data extraction and web unlocking tools, providing easy-to-use, scalable methods for web scraping, web searches, and more.</h3>


For a quick start, try running the example files in this repository under the "Codespaces" section.

## Features

- **Web Scraping**: Scrape websites using Bright Data Web Unlocker API with proxy support
- **Search Engine Results**: Perform web searches using Bright Data SERP API
- **Multiple Search Engines**: Support for Google, Bing, and Yandex
- **Parallel Processing**: Concurrent processing for multiple URLs or queries
- **Robust Error Handling**: Comprehensive error handling with retry logic
- **Zone Management**: Automatic zone creation and management
- **Multiple Output Formats**: JSON, raw HTML, markdown, and more

## Installation
To install the package, open your terminal:
> [!NOTE]
> If you are using macOS, create and activate a virtual environment for your project first.

```bash
pip install brightdata-sdk
```

## Quick Start

### 1. Initialize the Client
> [!IMPORTANT]
> Go to your [**account settings**](https://brightdata.com/cp/setting/users) to verify that your API key has **"admin permissions"**.

```python
from brightdata import bdclient

client = bdclient(api_token="your_api_token_here") # can also be defined as BRIGHTDATA_API_TOKEN in your .env file
```

Alternatively, you can use custom zone names:
```python
client = bdclient(
    api_token="your_token",
    auto_create_zones=False,          # Disable automatic zone creation (enabled by default)
    web_unlocker_zone="custom_zone",  # Custom zone name for web scraping
    serp_zone="custom_serp_zone"      # Custom zone name for search requests
)
```
> [!TIP]
> Hover over `bdclient` (or any function in the package) with your cursor to see all of its available parameters.


### 2. Scrape Websites

```python
# Single URL
result = client.scrape("https://example.com")

# Multiple URLs (parallel processing)
urls = ["https://example1.com", "https://example2.com", "https://example3.com"]
results = client.scrape(urls)

# Custom options
result = client.scrape(
    "https://example.com",
    format="raw",
    country="gb",
    data_format="screenshot"
)
```

### 3. Search Engine Results

```python
# Single search query
result = client.search("pizza restaurants")

# Multiple queries (parallel processing)
queries = ["pizza", "restaurants", "delivery"]
results = client.search(queries)

# Different search engines
result = client.search("pizza", search_engine="google") # search_engine can also be set to "yandex" or "bing"

# Custom options
results = client.search(
    ["pizza", "sushi"],
    country="gb",
    format="raw"
)
```

### 4. Download Content

```python
# Download scraped content
data = client.scrape("https://example.com")
client.download_content(data, "results.json", "json") # A filename is auto-generated if none is specified
```

## Configuration

### Environment Variables

Create a `.env` file in your project root:

```env
BRIGHTDATA_API_TOKEN=your_bright_data_api_token
WEB_UNLOCKER_ZONE=your_web_unlocker_zone  # Optional
SERP_ZONE=your_serp_zone                  # Optional
```
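Per the Quick Start comment, the client falls back to `BRIGHTDATA_API_TOKEN` from the environment when no `api_token` is passed. A minimal sketch of that resolution logic for illustration (`resolve_token` is a hypothetical helper, not part of the SDK):

```python
import os

def resolve_token(explicit=None):
    """Return an explicitly passed token, else fall back to the environment."""
    token = explicit or os.getenv("BRIGHTDATA_API_TOKEN")
    if not token:
        raise ValueError("No API token: pass api_token or set BRIGHTDATA_API_TOKEN")
    return token
```

An explicit argument always wins over the environment variable, which keeps local overrides simple while letting deployments rely on the `.env` file.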

### Manage Zones

```python
# List all active zones
zones = client.list_zones()
print(f"Found {len(zones)} zones")
```

## API Reference

### bdclient Class

```python
bdclient(
    api_token: str = None,
    auto_create_zones: bool = True,
    web_unlocker_zone: str = None,
    serp_zone: str = None,
)
```

### Key Methods

#### scrape(...)
Scrapes a single URL or list of URLs using the Web Unlocker.
- `url`: Single URL string or list of URLs
- `zone`: Zone identifier (auto-configured if None)
- `format`: "json" or "raw"
- `method`: HTTP method
- `country`: Two-letter country code
- `data_format`: "markdown", "screenshot", etc.
- `async_request`: Enable async processing
- `max_workers`: Max parallel workers (default: 10)
- `timeout`: Request timeout in seconds (default: 30)
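When a list of URLs is passed, the SDK fans requests out across parallel workers (see `max_workers` above). A sketch of that pattern using the standard library, with a stub `fetch_one` standing in for the real Web Unlocker request (the function names are illustrative, not SDK internals):

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_MAX_WORKERS = 10  # mirrors the default noted above

def fetch_one(url):
    # Stub standing in for a real Web Unlocker request
    return {"url": url, "status": "ok"}

def scrape_many(urls, max_workers=DEFAULT_MAX_WORKERS):
    # Fan the URLs out across a thread pool; pool.map preserves input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))
```

Thread pools suit this workload because the per-URL work is I/O-bound, so many requests can be in flight at once without extra processes.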

#### search(...)
Searches using the SERP API. Accepts the same arguments as scrape(), plus:
- `query`: Search query string or list of queries
- `search_engine`: "google", "bing", or "yandex"
- Other parameters same as scrape()

#### download_content(...)

Saves content to a local file.
- `content`: Content to save
- `filename`: Output filename (auto-generated if None)
- `format`: File format ("json", "csv", "txt", etc.)
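The auto-generated-filename behavior can be pictured like this; `save_content` is a hypothetical sketch for illustration, and the timestamped naming scheme is an assumption, not the SDK's actual implementation:

```python
import json
from datetime import datetime

def save_content(content, filename=None, fmt="json"):
    # Hypothetical sketch: generate a timestamped filename when none is given
    if filename is None:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = "brightdata_results_{}.{}".format(stamp, fmt)
    with open(filename, "w", encoding="utf-8") as f:
        if fmt == "json":
            json.dump(content, f, ensure_ascii=False, indent=2)
        else:
            f.write(str(content))
    return filename
```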

#### list_zones()

Lists all active zones in your Bright Data account.

## Error Handling

The SDK includes built-in input validation and retry logic:
```python
try:
    result = client.scrape("https://example.com")
except ValueError as e:
    print(f"Invalid input: {e}")
except Exception as e:
    print(f"API error: {e}")
```

## Production Features

- **Retry Logic**: Automatic retries with exponential backoff for network failures
- **Input Validation**: Validates URLs, zone names, and parameters
- **Connection Pooling**: Efficient HTTP connection management
- **Logging**: Comprehensive logging for debugging and monitoring
- **Zone Auto-Creation**: Automatically creates required zones if they don't exist

## Configuration Constants

| Constant               | Default | Description                     |
| ---------------------- | ------- | ------------------------------- |
| `DEFAULT_MAX_WORKERS`  | `10`    | Max parallel tasks              |
| `DEFAULT_TIMEOUT`      | `30`    | Request timeout (in seconds)    |
| `CONNECTION_POOL_SIZE` | `20`    | Max concurrent HTTP connections |
| `MAX_RETRIES`          | `3`     | Retry attempts on failure       |
| `RETRY_BACKOFF_FACTOR` | `1.5`   | Exponential backoff multiplier  |
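The retry constants above can be combined into a simple backoff loop. This is a sketch of one common scheme, sleeping `RETRY_BACKOFF_FACTOR ** attempt` seconds between tries; the exact delay formula is an assumption, not necessarily what the SDK uses internally (the `sleep` parameter is there only to make the sketch testable):

```python
import time

MAX_RETRIES = 3
RETRY_BACKOFF_FACTOR = 1.5

def with_retries(func, *args, sleep=time.sleep, **kwargs):
    # Try the call up to MAX_RETRIES + 1 times, backing off exponentially
    for attempt in range(MAX_RETRIES + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == MAX_RETRIES:
                raise  # retries exhausted; surface the last error
            sleep(RETRY_BACKOFF_FACTOR ** attempt)
```

With these defaults the delays are 1.0s, 1.5s, and 2.25s, which keeps transient network hiccups from failing a whole batch.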

## Getting Your API Token

1. Sign up at [brightdata.com](https://brightdata.com/), and navigate to your dashboard
2. Create or access your API credentials
3. Copy your API token and paste it in your .env or code file

## Development

For development installation, open your terminal:

```bash
git clone https://github.com/brightdata/bright-data-sdk-python.git

# On macOS, create and activate a virtual environment for your project first.
cd bright-data-sdk-python
pip install .
```

## License

This project is licensed under the MIT License.

## Support

For any issues, contact [Bright Data support](https://brightdata.com/contact), or open an issue in this repository.

            
