scrapingant-client

- **Version:** 2.0.0
- **Summary:** Official Python client for the ScrapingAnt API.
- **Home page:** https://github.com/ScrapingAnt/scrapingant-client-python
- **Author:** andrii.kovalenko
- **License:** Apache-2.0
- **Requires Python:** ~=3.7
- **Keywords:** scrapingant, api, scraper, scraping
- **Uploaded:** 2023-04-13 08:18:16
# ScrapingAnt API client for Python

[![PyPI version](https://badge.fury.io/py/scrapingant-client.svg)](https://badge.fury.io/py/scrapingant-client)

`scrapingant-client` is the official library for accessing the [ScrapingAnt API](https://docs.scrapingant.com) from your Python
applications. It provides convenience features such as automatic parameter encoding. Requires
Python 3.7+.

<!-- toc -->

- [Quick Start](#quick-start)
- [Install](#install)
- [API token](#api-token)
- [API Reference](#api-reference)
- [Exceptions](#exceptions)
- [Examples](#examples)
- [Useful links](#useful-links)

<!-- tocstop -->

## Quick Start

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
# Scrape the example.com site
result = client.general_request('https://example.com')
print(result.content)
```

## Install

```shell
pip install scrapingant-client
```

If you need async support:

```shell
pip install "scrapingant-client[async]"
```

## API token

To get an API token, register at the [ScrapingAnt Service](https://app.scrapingant.com).
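
A common pattern (not required by the library) is to keep the token out of source control by reading it from an environment variable. The variable name below is arbitrary and used only for this example:

```python
import os

# Read the API token from an environment variable instead of hard-coding it.
# "SCRAPINGANT_API_TOKEN" is a name chosen for this example, not one the
# library looks up on its own.
token = os.environ.get("SCRAPINGANT_API_TOKEN", "<YOUR-SCRAPINGANT-API-TOKEN>")
```

The resulting `token` string can then be passed to `ScrapingAntClient(token=token)`.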

## API Reference

All public classes, methods and their parameters can be inspected in this API reference.

#### ScrapingAntClient(token)

Main class of this library.

| Param | Type                |
|-------|---------------------|
| token | <code>string</code> |

* * *

#### ScrapingAntClient.general_request and ScrapingAntClient.general_request_async

https://docs.scrapingant.com/request-response-format#available-parameters

| Param             | Type                                                                                                                       | Default    |
|-------------------|----------------------------------------------------------------------------------------------------------------------------|------------|
| url               | <code>string</code>                                                                                                        |            |
| method            | <code>string</code>                                                                                                        | GET        |
| cookies           | <code>List[Cookie]</code>                                                                                                  | None       |
| headers           | <code>Dict[str, str]</code>                                                                                                |"" None       |
| js_snippet        | <code>string</code>                                                                                                        | None       |
| proxy_type        | <code>ProxyType</code>                                                                                                     | datacenter | 
| proxy_country     | <code>str</code>                                                                                                           | None       | 
| wait_for_selector | <code>str</code>                                                                                                           | None       |
| browser           | <code>boolean</code>                                                                                                       | True       |
| data              | same as [requests param 'data'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None       |
| json              | same as [requests param 'json'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None       |

**IMPORTANT NOTE:** <code>js_snippet</code> will be encoded to Base64 automatically by the ScrapingAnt client library.
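
For reference, the encoding the client applies is equivalent to standard Base64 over the UTF-8 bytes of the snippet. The sketch below illustrates that transformation with the standard library only; it is not the library's internal code, and you never need to perform it yourself:

```python
import base64

js_snippet = "document.title = 'Hello';"

# Equivalent of what the client does before sending the snippet to the API:
# encode to UTF-8 bytes, then Base64-encode.
encoded = base64.b64encode(js_snippet.encode("utf-8")).decode("ascii")
print(encoded)
```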

* * *

#### Cookie

Class representing a cookie. It currently supports only `name` and `value`.

| Param |  Type               | 
|-------|---------------------|
| name  | <code>string</code> | 
| value | <code>string</code> |

* * *

#### Response

Class defining the response from the API.

| Param       | Type                       |
|-------------|----------------------------|
| content     | <code>string</code>        |
| cookies     | <code>List[Cookie]</code>  |
| status_code | <code>int</code>           |
| text        | <code>string</code>        |

## Exceptions

`ScrapingantClientException` is the base exception class, used for all errors.

| Exception                            | Reason                                                                                                                       |
|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| ScrapingantInvalidTokenException     | The API token is wrong or you have exceeded the API calls request limit                                                      |
| ScrapingantInvalidInputException     | Invalid value provided. Please, look into error message for more info                                                        |
| ScrapingantInternalException         | Something went wrong with the server side code. Try again later or contact ScrapingAnt support                               |
| ScrapingantSiteNotReachableException | The requested URL is not reachable. Please, check it locally                                                                 |
| ScrapingantDetectedException         | The anti-bot detection system has detected the request. Please, retry or change the request settings.                        |
| ScrapingantTimeoutException          | Got timeout while communicating with Scrapingant servers. Check your network connection. Please try later or contact support |

* * *

## Examples

### Sending custom cookies

```python
from scrapingant_client import ScrapingAntClient
from scrapingant_client import Cookie

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

result = client.general_request(
    'https://httpbin.org/cookies',
    cookies=[
        Cookie(name='cookieName1', value='cookieVal1'),
        Cookie(name='cookieName2', value='cookieVal2'),
    ]
)
print(result.content)
# Response cookies are returned as a list of Cookie objects
# and can be reused in subsequent requests
response_cookies = result.cookies
```

### Executing custom JS snippet

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

customJsSnippet = """
var str = 'Hello, world!';
var htmlElement = document.getElementsByTagName('html')[0];
htmlElement.innerHTML = str;
"""
result = client.general_request(
    'https://example.com',
    js_snippet=customJsSnippet,
)
print(result.content)
```

### Exception handling and retries

```python
from scrapingant_client import ScrapingAntClient, ScrapingantClientException, ScrapingantInvalidInputException

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

RETRIES_COUNT = 3


def parse_html(html: str):
    ...  # Implement your data extraction here


parsed_data = None
for retry_number in range(RETRIES_COUNT):
    try:
        scrapingant_response = client.general_request(
            'https://example.com',
        )
    except ScrapingantInvalidInputException as e:
        print(f'Got invalid input exception: {repr(e)}')
        break  # We are not retrying if request params are not valid
    except ScrapingantClientException as e:
        print(f'Got ScrapingAnt exception {repr(e)}')
    except Exception as e:
        print(f'Got unexpected exception {repr(e)}')  # please report such exceptions by creating a new issue
    else:
        try:
            parsed_data = parse_html(scrapingant_response.content)
            break  # Data is parsed successfully, so we don't need to retry
        except Exception as e:
            print(f'Got exception while parsing data {repr(e)}')

if parsed_data is None:
    print(f'Failed to retrieve and parse data after {RETRIES_COUNT} tries')
    # You can sleep and retry later, stop the script, or investigate the cause
else:
    print(f'Successfully parsed data: {parsed_data}')
```
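
The loop above retries immediately. In practice you may want to wait between attempts; a minimal exponential-backoff helper, written with the standard library only (the base delay and cap here are arbitrary choices, not library defaults):

```python
def backoff_delay(retry_number: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Return the delay in seconds before the given retry (0-based).

    Doubles the base delay with each retry, capped at `cap` seconds.
    """
    return min(cap, base * (2 ** retry_number))

# Inside the retry loop, before the next attempt, you could call:
# time.sleep(backoff_delay(retry_number))
```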

### Sending custom headers

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

result = client.general_request(
    'https://httpbin.org/headers',
    headers={
        'test-header': 'test-value'
    }
)
print(result.content)

# HTTP Basic auth example
result = client.general_request(
    'https://jigsaw.w3.org/HTTP/Basic/',
    headers={'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
)
print(result.content)
```

### Simple async example

```python
import asyncio

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')


async def main():
    # Scrape the example.com site
    result = await client.general_request_async('https://example.com')
    print(result.content)


asyncio.run(main())
```

### Sending POST request

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

# Sending POST request with json data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    json={"test": "test"},
)
print(result.content)

# Sending POST request with bytes data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    data=b'test_bytes',
)
print(result.content)
```

## Useful links

- [ScrapingAnt API documentation](https://docs.scrapingant.com)
- [ScrapingAnt JS client](https://github.com/scrapingant/scrapingant-client-js)

            
