# brightdata-sdk

| Field | Value |
|-------|-------|
| Name | brightdata-sdk |
| Version | 1.1.3 |
| Home page | https://github.com/brightdata/brightdata-sdk-python |
| Summary | Python SDK for Bright Data Web Scraping and SERP APIs |
| Upload time | 2025-09-07 18:21:14 |
| Author | Bright Data |
| Requires Python | >=3.7 |
| License | MIT |
| Keywords | brightdata, web scraping, proxy, serp, search, data extraction |
            
<img width="1300" height="200" alt="sdk-banner(1)" src="https://github.com/user-attachments/assets/c4a7857e-10dd-420b-947a-ed2ea5825cb8" />

<h3 align="center">Python SDK by Bright Data: easy-to-use, scalable methods for web search & scraping</h3>
<p></p>

## Installation
To install the package, run this in your terminal:

```bash
pip install brightdata-sdk
```
> On macOS, create and activate a virtual environment for your project first

## Quick Start

Create a [Bright Data](https://brightdata.com/) account and copy your API key

### Initialize the Client

```python
from brightdata import bdclient

client = bdclient(api_token="your_api_token_here") # can also be defined as BRIGHTDATA_API_TOKEN in your .env file
```

### Launch your first request
Add a SERP search call to your code:
```python
results = client.search("best selling shoes")

print(client.parse_content(results))
```

<img width="4774" height="2149" alt="final-banner" src="https://github.com/user-attachments/assets/1ef4f6ad-b5f2-469f-a260-36d1eeaf8dba" />

## Features

| Feature                   | Functions                   | Description                         |
|---------------------------|-----------------------------|-------------------------------------|
| **Scrape every website**  | `scrape`                    | Scrape any website using Bright Data's scraping and anti-bot-detection capabilities |
| **Web search**            | `search`                    | Search Google and other search engines by query (supports batch searches) |
| **Web crawling**          | `crawl`                     | Discover and scrape multiple pages from websites with advanced filtering and depth control |
| **AI-powered extraction** | `extract`                   | Extract specific information from websites using natural language queries and OpenAI |
| **Content parsing**       | `parse_content`             | Extract text, links, images and structured data from API responses (JSON or HTML) |
| **Browser automation**    | `connect_browser`           | Get a WebSocket endpoint for Playwright/Selenium integration with Bright Data's scraping browser |
| **Search ChatGPT**        | `search_chatGPT`            | Prompt ChatGPT and scrape its answers; supports multiple inputs and follow-up prompts |
| **Search LinkedIn**       | `search_linkedin.posts()`, `search_linkedin.jobs()`, `search_linkedin.profiles()` | Search LinkedIn by specific queries and receive structured data |
| **Scrape LinkedIn**       | `scrape_linkedin.posts()`, `scrape_linkedin.jobs()`, `scrape_linkedin.profiles()`, `scrape_linkedin.companies()` | Scrape LinkedIn and receive structured data |
| **Download functions**    | `download_snapshot`, `download_content` | Download content for both sync and async requests |
| **Client class**          | `bdclient`                  | Handles authentication, automatic zone creation and management, and options for robust error handling |
| **Parallel processing**   | **all functions**           | All functions use concurrent processing for multiple URLs or queries and support multiple output formats |

### Try using one of the functions

#### `search()`
```python
# Simple single query search
result = client.search("pizza restaurants")

# Multiple queries (parallel processing) with custom configuration
queries = ["pizza", "restaurants", "delivery"]
results = client.search(
    queries,
    search_engine="bing",
    country="gb",
    format="raw"
)
```
#### `scrape()`
```python
# Simple single URL scrape
result = client.scrape("https://example.com")

# Multiple URLs (parallel processing) with custom options
urls = ["https://example1.com", "https://example2.com", "https://example3.com"]
results = client.scrape(
    urls,
    format="raw",
    country="gb",
    data_format="screenshot"
)
```
#### `search_chatGPT()`
```python
result = client.search_chatGPT(
    prompt="what day is it today?"
    # prompt=["What are the top 3 programming languages in 2024?", "Best hotels in New York", "Explain quantum computing"],
    # additional_prompt=["Can you explain why?", "Are you sure?", ""]  
)

client.download_content(result)  # if the request times out, your snapshot_id is shown and you can download it later with download_snapshot()
```

#### `search_linkedin.`
Available functions:
client.**`search_linkedin.posts()`**, client.**`search_linkedin.jobs()`**, client.**`search_linkedin.profiles()`**
```python
# Search LinkedIn profiles by name
first_names = ["James", "Idan"]
last_names = ["Smith", "Vilenski"]

result = client.search_linkedin.profiles(first_names, last_names)  # can also be run asynchronously

print(result)  # prints the snapshot_id, which can be downloaded with the download_snapshot() function
```

#### `scrape_linkedin.`
Available functions:

client.**`scrape_linkedin.posts()`**, client.**`scrape_linkedin.jobs()`**, client.**`scrape_linkedin.profiles()`**, client.**`scrape_linkedin.companies()`**
```python
post_urls = [
    "https://www.linkedin.com/posts/orlenchner_scrapecon-activity-7180537307521769472-oSYN?trk=public_profile",
    "https://www.linkedin.com/pulse/getting-value-out-sunburst-guillaume-de-b%C3%A9naz%C3%A9?trk=public_profile_article_view"
]

results = client.scrape_linkedin.posts(post_urls)  # can also be run asynchronously

print(results)  # prints the snapshot_id, which can be downloaded with the download_snapshot() function
```

#### `crawl()`
```python
# Single URL crawl with filters
result = client.crawl(
    url="https://example.com/",
    depth=2,
    filter="/product/",           # Only crawl URLs containing "/product/"
    exclude_filter="/ads/",       # Exclude URLs containing "/ads/"
    custom_output_fields=["markdown", "url", "page_title"]
)
print(f"Crawl initiated. Snapshot ID: {result['snapshot_id']}")

# Download crawl results
data = client.download_snapshot(result['snapshot_id'])
```

#### `parse_content()`
```python
# Parse scraping results
scraped_data = client.scrape("https://example.com")
parsed = client.parse_content(
    scraped_data, 
    extract_text=True, 
    extract_links=True, 
    extract_images=True
)
print(f"Title: {parsed['title']}")
print(f"Text length: {len(parsed['text'])}")
print(f"Found {len(parsed['links'])} links")
```

#### `extract()`
```python
# Basic extraction (URL in query)
result = client.extract("Extract news headlines from CNN.com")
print(result)

# Using URL parameter with structured output
schema = {
    "type": "object",
    "properties": {
        "headlines": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["headlines"]
}

result = client.extract(
    query="Extract main headlines",
    url="https://cnn.com",
    output_scheme=schema
)
print(result)  # Returns structured JSON matching the schema
```

#### `connect_browser()`
```python
# For Playwright (default browser_type)
from playwright.sync_api import sync_playwright

client = bdclient(
    api_token="your_api_token",
    browser_username="username-zone-browser_zone1",
    browser_password="your_password"
)

with sync_playwright() as playwright:
    browser = playwright.chromium.connect_over_cdp(client.connect_browser())
    page = browser.new_page()
    page.goto("https://example.com")
    print(f"Title: {page.title()}")
    browser.close()
```

**`download_content`** (for sync requests)
```python
data = client.scrape("https://example.com")
client.download_content(data) 
```
**`download_snapshot`** (for async requests)
```python
# Run this in a separate file or later session, once the snapshot is ready
client.download_snapshot("")  # insert your snapshot_id
```

> [!TIP]
> Hover over `search` or any other function in the package to see all of its available parameters.

![Hover-Over1](https://github.com/user-attachments/assets/51324485-5769-48d5-8f13-0b534385142e)

## Function Parameters
<details>
    <summary>🔍 <strong>search(...)</strong></summary>
    
Searches using the SERP API. Accepts the same arguments as scrape(), plus:

```python
- `query`: Search query string or list of queries
- `search_engine`: "google", "bing", or "yandex"
- Other parameters same as scrape()
```
    
</details>
<details>
    <summary>🔗 <strong>scrape(...)</strong></summary>

Scrapes a single URL or list of URLs using the Web Unlocker.

```python
- `url`: Single URL string or list of URLs
- `zone`: Zone identifier (auto-configured if None)
- `format`: "json" or "raw"
- `method`: HTTP method
- `country`: Two-letter country code
- `data_format`: "markdown", "screenshot", etc.
- `async_request`: Enable async processing
- `max_workers`: Max parallel workers (default: 10)
- `timeout`: Request timeout in seconds (default: 30)
```
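
The `async_request` flag above pairs with `download_snapshot()` (shown below for async requests). A hedged sketch, assuming the async path returns a snapshot reference similar to `crawl()`; the exact return shape may differ between SDK versions:

```python
# Hypothetical sketch: asynchronous scrape.
result = client.scrape(
    "https://example.com",
    async_request=True,  # documented parameter: enable async processing
    timeout=60,          # per-request timeout in seconds (default 30)
)

# Assumption: the result carries a snapshot_id, as crawl() does
data = client.download_snapshot(result["snapshot_id"])
```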

</details>
<details>
    <summary>🕷️ <strong>crawl(...)</strong></summary>

Discover and scrape multiple pages from websites with advanced filtering.

```python
- `url`: Single URL string or list of URLs to crawl (required)
- `ignore_sitemap`: Ignore sitemap when crawling (optional)
- `depth`: Maximum crawl depth relative to entered URL (optional)
- `filter`: Regex to include only certain URLs (e.g. "/product/")
- `exclude_filter`: Regex to exclude certain URLs (e.g. "/ads/")
- `custom_output_fields`: List of output fields to include (optional)
- `include_errors`: Include errors in response (default: True)
```

</details>
<details>
    <summary>🔍 <strong>parse_content(...)</strong></summary>

Extract and parse useful information from API responses.

```python
- `data`: Response data from scrape(), search(), or crawl() methods
- `extract_text`: Extract clean text content (default: True)
- `extract_links`: Extract all links from content (default: False)
- `extract_images`: Extract image URLs from content (default: False)
```

</details>
<details>
    <summary>🤖 <strong>extract(...)</strong></summary>

Extract specific information from websites using AI-powered natural language processing with OpenAI.

```python
- `query`: Natural language query describing what to extract (required)
- `url`: Single URL or list of URLs to extract from (optional - if not provided, extracts URL from query)
- `output_scheme`: JSON Schema for OpenAI Structured Outputs (optional - enables reliable JSON responses)
- `llm_key`: OpenAI API key (optional - uses OPENAI_API_KEY env variable if not provided)

# Returns: ExtractResult object (string-like with metadata attributes)
# Available attributes: .url, .query, .source_title, .token_usage, .content_length
```
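
Building on the attributes listed above, a short sketch of inspecting an `ExtractResult` (the values naturally depend on the page that was extracted):

```python
result = client.extract("Extract news headlines from CNN.com")

# ExtractResult is string-like, so it can be printed or sliced directly
print(str(result)[:200])

# Metadata attributes documented above
print(result.url)             # source URL
print(result.source_title)    # title of the source page
print(result.token_usage)     # OpenAI token usage for the call
print(result.content_length)  # length of the fetched content
```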

</details>
<details>
    <summary>🌐 <strong>connect_browser(...)</strong></summary>

Get WebSocket endpoint for browser automation with Bright Data's scraping browser.

```python
# Required client parameters:
- `browser_username`: Username for browser API (format: "username-zone-{zone_name}")
- `browser_password`: Password for browser API authentication
- `browser_type`: "playwright", "puppeteer", or "selenium" (default: "playwright")

# Returns: WebSocket endpoint URL string
```

</details>
<details>
    <summary>💾 <strong>download_content(...)</strong></summary>

Save content to local file.

```python
- `content`: Content to save
- `filename`: Output filename (auto-generated if None)
- `format`: File format ("json", "csv", "txt", etc.)
```
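
For example, saving a scrape result under an explicit filename and format instead of the auto-generated defaults (the filename here is just a placeholder):

```python
data = client.scrape("https://example.com")
client.download_content(data, filename="example_scrape.json", format="json")
```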

</details>
<details>
    <summary>⚙️ <strong>Configuration Constants</strong></summary>

<p></p>

| Constant               | Default | Description                     |
| ---------------------- | ------- | ------------------------------- |
| `DEFAULT_MAX_WORKERS`  | `10`    | Max parallel tasks              |
| `DEFAULT_TIMEOUT`      | `30`    | Request timeout (in seconds)    |
| `CONNECTION_POOL_SIZE` | `20`    | Max concurrent HTTP connections |
| `MAX_RETRIES`          | `3`     | Retry attempts on failure       |
| `RETRY_BACKOFF_FACTOR` | `1.5`   | Exponential backoff multiplier  |

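These are SDK-wide defaults; the worker count and timeout can also be tuned per call through the matching `scrape()`/`search()` parameters, for example:

```python
# Override DEFAULT_MAX_WORKERS (10) and DEFAULT_TIMEOUT (30) for a single call
results = client.scrape(
    ["https://example1.com", "https://example2.com"],
    max_workers=4,  # fewer parallel workers than the default
    timeout=60,     # longer per-request timeout in seconds
)
```
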
</details>

## Advanced Configuration

<details>
    <summary>🔧 <strong>Environment Variables</strong></summary>

Create a `.env` file in your project root:

```env
BRIGHTDATA_API_TOKEN=your_bright_data_api_token
WEB_UNLOCKER_ZONE=your_web_unlocker_zone        # Optional
SERP_ZONE=your_serp_zone                        # Optional
BROWSER_ZONE=your_browser_zone                  # Optional
BRIGHTDATA_BROWSER_USERNAME=username-zone-name  # For browser automation
BRIGHTDATA_BROWSER_PASSWORD=your_browser_password  # For browser automation
OPENAI_API_KEY=your_openai_api_key              # For extract() function
```
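
With the file in place, the client can be created without passing the token explicitly. A minimal sketch, assuming you load the file with `python-dotenv` and that the SDK reads `BRIGHTDATA_API_TOKEN` from the environment as described in the Quick Start:

```python
from dotenv import load_dotenv  # pip install python-dotenv
from brightdata import bdclient

load_dotenv()        # load .env from the project root into the environment
client = bdclient()  # assumption: picks up BRIGHTDATA_API_TOKEN and optional zone variables
```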

</details>
<details>
    <summary>🌐 <strong>Manage Zones</strong></summary>

List all active zones

```python
# List all active zones
zones = client.list_zones()
print(f"Found {len(zones)} zones")
```

Configure a custom zone name

```python
client = bdclient(
    api_token="your_token",
    auto_create_zones=False,          # if True (the default), zones are created automatically
    web_unlocker_zone="custom_zone",
    serp_zone="custom_serp_zone"
)

```

</details>
<details>
    <summary>👥 <strong>Client Management</strong></summary>
    
bdclient Class - Complete parameter list
    
```python
bdclient(
    api_token: str = None,                    # Your Bright Data API token (required)
    auto_create_zones: bool = True,           # Auto-create zones if they don't exist
    web_unlocker_zone: str = None,            # Custom web unlocker zone name
    serp_zone: str = None,                    # Custom SERP zone name
    browser_zone: str = None,                 # Custom browser zone name
    browser_username: str = None,             # Browser API username (format: "username-zone-{zone_name}")
    browser_password: str = None,             # Browser API password
    browser_type: str = "playwright",         # Browser automation tool: "playwright", "puppeteer", "selenium"
    log_level: str = "INFO",                  # Logging level: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"
    structured_logging: bool = True,          # Use structured JSON logging
    verbose: bool = None                      # Enable verbose logging (overrides log_level if True)
)
```
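
A minimal instantiation that turns on debug logging, using only parameters from the list above:

```python
client = bdclient(
    api_token="your_api_token",
    log_level="DEBUG",         # more detailed logs than the default "INFO"
    structured_logging=False,  # plain-text logs instead of structured JSON
)
```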
    
</details>
<details>
    <summary>⚠️ <strong>Error Handling</strong></summary>
    
The SDK includes built-in input validation and retry logic.

If you run into zone-related problems, use the **list_zones()** function to check your active zones, and review your [**account settings**](https://brightdata.com/cp/setting/users) to verify that your API key has **admin permissions**.
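
A minimal defensive pattern along those lines; the SDK's specific exception classes aren't documented here, so this sketch catches a generic `Exception`:

```python
try:
    result = client.scrape("https://example.com")
except Exception as exc:
    # The SDK's built-in retries have already been exhausted at this point
    print(f"Request failed: {exc}")

    # For zone-related failures, confirm that your zones exist and are active
    zones = client.list_zones()
    print(f"Found {len(zones)} active zones")
```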
    
</details>

## Support

For any issues, contact [Bright Data support](https://brightdata.com/contact), or open an issue in this repository.

            
