scrapingbee

Name: scrapingbee
Version: 2.0.1
Home page: https://github.com/scrapingbee/scrapingbee-python
Summary: ScrapingBee Python SDK
Upload time: 2023-10-17 15:36:16
Maintainer: Pierre de Wulf
Docs URL: None
Author: Ari Bajo Rouvinen
Requires Python: >=3.7
License: MIT
Requirements: No requirements were recorded.
# ScrapingBee Python SDK

[![lint-test-publish](https://github.com/scrapingbee/scrapingbee-python/workflows/lint-test-publish/badge.svg)](https://github.com/scrapingbee/scrapingbee-python/actions)
[![version](https://img.shields.io/pypi/v/scrapingbee.svg)](https://pypi.org/project/scrapingbee/)
[![python](https://img.shields.io/pypi/pyversions/scrapingbee.svg)](https://pypi.org/project/scrapingbee/)

[ScrapingBee](https://www.scrapingbee.com/) is a web scraping API that handles headless browsers and rotates proxies for you. The Python SDK makes it easier to interact with ScrapingBee's API.

## Installation

You can install the ScrapingBee Python SDK with pip.

```bash
pip install scrapingbee
```

## Usage

The ScrapingBee Python SDK is a wrapper around the [requests](https://docs.python-requests.org/en/master/) library. ScrapingBee supports GET and POST requests.

Sign up for ScrapingBee to [get your API key](https://app.scrapingbee.com/account/register) and some free credits to get started.
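
The examples below hardcode the API key for brevity. In practice you may prefer to read it from the environment; a minimal sketch, assuming the key is stored in an environment variable (the name `SCRAPINGBEE_API_KEY` is illustrative, not one required by the SDK):

```python
import os

from scrapingbee import ScrapingBeeClient

# Read the API key from the environment instead of hardcoding it.
# SCRAPINGBEE_API_KEY is an illustrative variable name, not one mandated by the SDK.
client = ScrapingBeeClient(api_key=os.environ['SCRAPINGBEE_API_KEY'])
```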

### Making a GET request

```python
>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/', 
    params={
        # Block ads on the page you want to scrape
        'block_ads': False,
        # Block images and CSS on the page you want to scrape
        'block_resources': True,
        # Premium proxy geolocation (country code)
        'country_code': '',
        # Control the device the request will be sent from
        'device': 'desktop',
        # Use some data extraction rules
        'extract_rules': {'title': 'h1'},
        # Wrap the response in JSON
        'json_response': False,
        # Interact with the webpage you want to scrape
        'js_scenario': {
            "instructions": [
                {"wait_for": "#slow_button"},
                {"click": "#slow_button"},
                {"scroll_x": 1000},
                {"wait": 1000},
                {"scroll_x": 1000},
                {"wait": 1000},
            ]
        },
        # Use premium proxies to bypass difficult-to-scrape websites (10-25 credits/request)
        'premium_proxy': False,
        # Execute JavaScript code with a headless browser (5 credits/request)
        'render_js': True,
        # Return the original HTML before the JavaScript rendering
        'return_page_source': False,
        # Return a page screenshot as a PNG image
        'screenshot': False,
        # Take a full-page screenshot without the window limitation
        'screenshot_full_page': False,
        # Transparently return the same HTTP status code as the requested page
        'transparent_status_code': False,
        # Wait, in milliseconds, before returning the response
        'wait': 0,
        # Wait for a CSS selector before returning the response, e.g. ".title"
        'wait_for': '',
        # Set the browser window width in pixels
        'window_width': 1920,
        # Set the browser window height in pixels
        'window_height': 1080
    },
    headers={
        # Forward custom headers to the target website
        "key": "value"
    },
    cookies={
        # Forward custom cookies to the target website
        "name": "value"
    }
)
>>> response.text
'<!DOCTYPE html><html lang="en"><head>...'
```

ScrapingBee accepts various parameters to render JavaScript, run a custom JavaScript scenario, use a premium proxy from a specific geolocation, and more.
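
For example, the `extract_rules` parameter used above asks the API to pull the page title out of the HTML. Here is a minimal sketch of reading the extracted data back, assuming the API responds with a JSON body when extraction rules are set:

```python
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

# Ask ScrapingBee to extract the first <h1> of the page as "title".
response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={'extract_rules': {'title': 'h1'}},
)

# The body is JSON describing the extracted fields, so the standard
# requests API can parse it (assumption: extraction rules switch the
# response format to JSON).
data = response.json()
print(data.get('title'))
```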

You can find all the supported parameters on [ScrapingBee's documentation](https://www.scrapingbee.com/documentation/).

You can send custom cookies and headers just as you would with the requests library.
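
### Making a POST request

The SDK also forwards POST requests. A minimal sketch, assuming the client exposes a `post` method that mirrors `requests.post` and forwards the form data to the target URL:

```python
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

# Forward form data to the target URL through the ScrapingBee API.
# httpbin.org/post simply echoes back what it receives.
response = client.post(
    'https://httpbin.org/post',
    data={'username': 'johndoe', 'password': 'secret'},
)

print(response.status_code)
print(response.text)
```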

## Screenshot

Here is a short example showing how to retrieve and save a screenshot of the ScrapingBee blog at a mobile resolution.

```python
>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/', 
    params={
        # Take a screenshot
        'screenshot': True,
        # Specify that we need the full height
        'screenshot_full_page': True,
        # Specify a mobile width in pixels
        'window_width': 375
    }
)

>>> if response.ok:
        with open("./scrapingbee_mobile.png", "wb") as f:
            f.write(response.content)
```

## Using ScrapingBee with Scrapy

Scrapy is the most popular Python web scraping framework. You can easily [integrate ScrapingBee's API with the Scrapy middleware](https://github.com/ScrapingBee/scrapy-scrapingbee).
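
A minimal configuration sketch, assuming the middleware package installs as `scrapy-scrapingbee` and exposes a `ScrapingBeeMiddleware` class plus a `SCRAPINGBEE_API_KEY` setting (see the linked repository for the exact names and middleware priority):

```python
# settings.py -- names assumed from the scrapy-scrapingbee project, verify against its README
SCRAPINGBEE_API_KEY = 'REPLACE-WITH-YOUR-API-KEY'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_scrapingbee.ScrapingBeeMiddleware': 725,
}
```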


## Retries

The client includes a retry mechanism for 5XX responses; use the `retries` argument to control how many times a request is retried.

```python
>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/', 
    params={
        'render_js': True,
    },
    retries=5
)
```
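
If the target keeps returning 5XX responses, the usual `requests`-style checks apply to the object you get back; a minimal sketch, assuming the client returns the final response after exhausting the retries (rather than raising) and that network-level failures surface as `requests` exceptions:

```python
import requests

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

try:
    response = client.get(
        'https://www.scrapingbee.com/blog/',
        params={'render_js': True},
        retries=5,
    )
except requests.RequestException as exc:
    # Network-level failures propagate as requests exceptions.
    print(f'Request failed: {exc}')
else:
    if not response.ok:
        # Assumption: a 5XX that persists through all retries is still returned as a response.
        print(f'ScrapingBee returned {response.status_code}')
    else:
        print(response.text[:100])
```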



            
