zenrows

| Field | Value |
| --- | --- |
| Name | zenrows |
| Version | 1.3.2 |
| Home page | https://github.com/ZenRows/zenrows-python-sdk |
| Summary | Python client for ZenRows API |
| Upload time | 2023-11-13 15:18:25 |
| Docs URL | None |
| Author | Ander Rodriguez |
| Requires Python | >=3.6 |
| License | MIT |
| Keywords | zenrows, scraper, scraping |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# ZenRows Python SDK
SDK to access the [ZenRows](https://www.zenrows.com/) API directly from Python. ZenRows handles proxy rotation, headless browsers, and CAPTCHAs for you.

## Installation
Install the SDK with pip.

```bash
pip install zenrows
```

## Usage
Start using the API by [creating your API Key](https://www.zenrows.com/register?p=free).

The SDK uses [requests](https://docs.python-requests.org/) for HTTP requests. The client's response will be a requests `Response`.

It also uses urllib3's [Retry](https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html) to automatically retry failed requests (status codes 429, 500, 502, 503, and 504). Retries are disabled by default; to enable them, pass the number of retries to the client, as shown below. An exponential back-off delay is applied between failed requests.

```python
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR-API-KEY", retries=1)
url = "https://www.zenrows.com/"

response = client.get(url, params={
    # Our algorithm automatically extracts content from any website
    "autoparse": False,

    # CSS Selectors for data extraction (e.g. {"links":"a @href"} to get href attributes from links)
    "css_extractor": "",

    # Enable JavaScript with a headless browser (5 credits)
    "js_render": False,

    # Use residential proxies (10 credits)
    "premium_proxy": False,

    # Make your request from a given country. Requires premium_proxy
    "proxy_country": "us",

    # Wait for a given CSS Selector to load in the DOM. Requires js_render
    "wait_for": ".content",

    # Wait a fixed amount of time in milliseconds. Requires js_render
    "wait": 2500,

    # Block specific resources from loading, check docs for the full list. Requires js_render
    "block_resources": "image,media,font",

    # Change the browser's window width and height. Requires js_render
    "window_width": 1920,
    "window_height": 1080,

    # Will automatically use either desktop or mobile user agents in the headers
    "device": "desktop",

    # Will return the status code returned by the website
    "original_status": False,
}, headers={
    # The HTTP header name is spelled "Referer"
    "Referer": "https://www.google.com",
    "User-Agent": "MyCustomUserAgent",
})

print(response.text)
```

Both `params` and `headers` are optional; the list above is only a reference. For more info, check out [the documentation page](https://www.zenrows.com/documentation).
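The `css_extractor` comment in the snippet describes its value as a mapping from output names to CSS selectors. Here is a minimal sketch of building that parameter, assuming the API expects the mapping serialized as a JSON string (confirm this on the documentation page). The `title` entry is a hypothetical addition for illustration.

```python
import json

# Mapping of output field name -> CSS selector.
# "a @href" comes from the comment in the snippet; "title" is a hypothetical extra field.
selectors = {
    "links": "a @href",
    "title": "h1",
}

params = {
    # Assumption: the API takes the selector mapping as a JSON-encoded string.
    "css_extractor": json.dumps(selectors),
}

print(params["css_extractor"])
```

The resulting dictionary can then be passed as `client.get(url, params=params)`.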

Headers sent to the target URL overwrite our defaults. Use them with care, and contact us if you run into any problems.
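To make the retry behavior described earlier more concrete, here is a standalone sketch of retrying on the listed status codes with an exponential back-off. It is not the SDK's implementation: `call_with_retries`, the `send` callable, and the `backoff_factor` value are all hypothetical, and the real delay schedule is whatever urllib3's `Retry` applies.

```python
import time

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}  # status codes listed in this README


def call_with_retries(send, retries, backoff_factor=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    send is a hypothetical zero-argument callable returning an HTTP status code.
    backoff_factor is an assumed value, not one taken from the SDK.
    """
    attempt = 0
    while True:
        status = send()
        if status not in RETRYABLE_STATUSES or attempt >= retries:
            return status
        # Exponential back-off: the delay doubles after every failed attempt.
        sleep(backoff_factor * (2 ** attempt))
        attempt += 1


# Simulated server: fails twice, then succeeds.
statuses = iter([503, 429, 200])
delays = []  # record the delays instead of actually sleeping
result = call_with_retries(lambda: next(statuses), retries=3, sleep=delays.append)
print(result, delays)
```

With `retries=0` the first response is returned as-is, which mirrors retries being off unless you ask for them.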

### POST Requests

The SDK also supports POST requests through the `client.post` method. It accepts an additional `data` parameter holding the data to send, for example the fields of a form.

```python
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR-API-KEY", retries=1)
url = "https://httpbin.org/anything"

response = client.post(url, data={
    "key1": "value1",
    "key2": "value2",
})

print(response.text)
```

### Concurrency

To limit concurrency, the SDK uses [asyncio](https://docs.python.org/3/library/asyncio.html) and sends at most `concurrency` requests simultaneously. The maximum concurrency is determined by your plan, so take a look at the [pricing](https://www.zenrows.com/pricing) and set it accordingly. Keep in mind that each client instance has its own limit: two different scripts will not share it, so 429 (Too Many Requests) errors might arise.

The main difference from the sequential snippet above is the use of `client.get_async` instead of `client.get`. Everything else works exactly the same, and the `get` function will remain supported. The async variant is necessary to parallelize calls and to allow async/await syntax. Remember to run the script with `asyncio.run`; otherwise the coroutine never executes and Python emits a `coroutine 'main' was never awaited` warning.

We use `asyncio.gather` in the example below. It waits for all the calls to finish and stores the results in a `responses` list. The whole list of URLs will run, even if some fail. Each response then exposes the status, request, response content, and other values as usual.

```python
from zenrows import ZenRowsClient
import asyncio

client = ZenRowsClient("YOUR-API-KEY", concurrency=5, retries=1)

async def main():
    urls = [
        "https://www.zenrows.com/",
        # ...
    ]
    responses = await asyncio.gather(*[client.get_async(url) for url in urls])

    for response in responses:
        print(response.text)

asyncio.run(main())
```
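For intuition about how a per-client concurrency limit can be enforced with asyncio, here is a minimal sketch using `asyncio.Semaphore`. It is an illustration under assumptions, not the SDK's actual code: `fake_fetch` and `run_all` are hypothetical helpers, and no network request is made.

```python
import asyncio


async def fake_fetch(url, semaphore, stats):
    """Hypothetical stand-in for a network call; it only sleeps briefly."""
    async with semaphore:  # at most `concurrency` coroutines are inside at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)
        stats["active"] -= 1
        if "bad" in url:
            raise ValueError(f"failed: {url}")
        return f"ok: {url}"


async def run_all(urls, concurrency):
    semaphore = asyncio.Semaphore(concurrency)
    stats = {"active": 0, "peak": 0}
    # return_exceptions=True keeps one failure from hiding the other results.
    results = await asyncio.gather(
        *[fake_fetch(url, semaphore, stats) for url in urls],
        return_exceptions=True,
    )
    return results, stats["peak"]


urls = [f"https://example.com/{i}" for i in range(10)] + ["https://example.com/bad"]
results, peak = asyncio.run(run_all(urls, concurrency=5))
print(peak, sum(isinstance(r, Exception) for r in results))
```

Because the semaphore lives in one process, a second script gets its own independent limit, which is why two scripts running side by side can exceed the plan's concurrency and receive 429 responses.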

## Contributing
Pull requests are welcome. For significant changes, please open an issue first to discuss what you would like to change.

## License
[MIT](./LICENSE)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ZenRows/zenrows-python-sdk",
    "name": "zenrows",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "zenrows scraper scraping",
    "author": "Ander Rodriguez",
    "author_email": "ander@zenrows.com",
    "download_url": "https://files.pythonhosted.org/packages/92/c1/46bf621f872513236d32310662a5e40b0382845ed1d89412a6bf5e40e9eb/zenrows-1.3.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python client for ZenRows API",
    "version": "1.3.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/ZenRows/zenrows-python-sdk/issues",
        "Documentation": "https://www.zenrows.com/documentation",
        "Homepage": "https://github.com/ZenRows/zenrows-python-sdk"
    },
    "split_keywords": [
        "zenrows",
        "scraper",
        "scraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3c0a744bd008947c48d64aaab53799bbae6c6f884142faf197e27641b4bb69c9",
                "md5": "d7797a73691054614252eae18e0d7135",
                "sha256": "41f3c7403872bf69f6d46926496a018b8c0dd34fd811d250a269f10917598d88"
            },
            "downloads": -1,
            "filename": "zenrows-1.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d7797a73691054614252eae18e0d7135",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 5574,
            "upload_time": "2023-11-13T15:18:23",
            "upload_time_iso_8601": "2023-11-13T15:18:23.659861Z",
            "url": "https://files.pythonhosted.org/packages/3c/0a/744bd008947c48d64aaab53799bbae6c6f884142faf197e27641b4bb69c9/zenrows-1.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "92c146bf621f872513236d32310662a5e40b0382845ed1d89412a6bf5e40e9eb",
                "md5": "c7614c9c7130af9ef0f4e6a452a6719c",
                "sha256": "1465bba1c53b42a0ca7dd05239a1f27d4cca5379a5543cf8a483ec89906e5f8c"
            },
            "downloads": -1,
            "filename": "zenrows-1.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "c7614c9c7130af9ef0f4e6a452a6719c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 8266,
            "upload_time": "2023-11-13T15:18:25",
            "upload_time_iso_8601": "2023-11-13T15:18:25.146069Z",
            "url": "https://files.pythonhosted.org/packages/92/c1/46bf621f872513236d32310662a5e40b0382845ed1d89412a6bf5e40e9eb/zenrows-1.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-13 15:18:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ZenRows",
    "github_project": "zenrows-python-sdk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "zenrows"
}
        