# ScrapingAnt API client for Python
[![PyPI version](https://badge.fury.io/py/scrapingant-client.svg)](https://badge.fury.io/py/scrapingant-client)
`scrapingant-client` is the official library for accessing the [ScrapingAnt API](https://docs.scrapingant.com) from your Python
applications. It provides useful features like automatic parameter encoding to improve the ScrapingAnt usage experience. Requires
Python 3.7+.
<!-- toc -->
- [Quick Start](#quick-start)
- [Install](#install)
- [API token](#api-token)
- [API Reference](#api-reference)
- [Exceptions](#exceptions)
- [Examples](#examples)
- [Useful links](#useful-links)
<!-- tocstop -->
## Quick Start
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
# Scrape the example.com site
result = client.general_request('https://example.com')
print(result.content)
```
## Install
```shell
pip install scrapingant-client
```
If you need async support:
```shell
pip install "scrapingant-client[async]"
```
## API token
To get an API token, register at the [ScrapingAnt Service](https://app.scrapingant.com).
## API Reference
All public classes, methods and their parameters can be inspected in this API reference.
#### ScrapingAntClient(token)
Main class of this library.
| Param | Type |
|-------|---------------------|
| token | <code>string</code> |
* * *
#### Common arguments
- ScrapingAntClient.general_request
- ScrapingAntClient.general_request_async
- ScrapingAntClient.markdown_request
- ScrapingAntClient.markdown_request_async
See the full list of [available parameters](https://docs.scrapingant.com/request-response-format#available-parameters) in the API documentation.
| Param | Type | Default |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|------------|
| url | <code>string</code> | |
| method | <code>string</code> | GET |
| cookies | <code>List[Cookie]</code> | None |
| headers             | <code>Dict[str, str]</code>                                                                                                | None       |
| js_snippet | <code>string</code> | None |
| proxy_type | <code>ProxyType</code> | datacenter |
| proxy_country | <code>str</code> | None |
| wait_for_selector | <code>str</code> | None |
| browser | <code>boolean</code> | True |
| return_page_source | <code>boolean</code> | False |
| data | same as [requests param 'data'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None |
| json | same as [requests param 'json'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None |
**IMPORTANT NOTE:** <code>js_snippet</code> will be encoded to Base64 automatically by the ScrapingAnt client library.
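For illustration, the encoding described in the note is plain Base64 of the UTF-8 snippet. A minimal stdlib sketch of the equivalent transformation (the client's exact internal call is an implementation detail):

```python
import base64

js_snippet = "document.title = 'Hello';"
# Base64-encode the snippet, as the client does automatically before sending it
encoded = base64.b64encode(js_snippet.encode("utf-8")).decode("ascii")
# Decoding restores the original snippet
assert base64.b64decode(encoded).decode("utf-8") == js_snippet
```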
* * *
#### Cookie
Class defining a cookie. Currently it supports only `name` and `value`.
| Param | Type |
|-------|---------------------|
| name | <code>string</code> |
| value | <code>string</code> |
* * *
#### Response
Class defining response from API.
| Param | Type |
|-------------|----------------------------|
| content | <code>string</code> |
| cookies | <code>List[Cookie]</code> |
| status_code | <code>int</code> |
| text | <code>string</code> |
## Exceptions
`ScrapingantClientException` is the base exception class, used for all library errors.
| Exception | Reason |
|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| ScrapingantInvalidTokenException     | The API token is wrong, or you have exceeded the API request limit                                                            |
| ScrapingantInvalidInputException     | An invalid value was provided. See the error message for details                                                              |
| ScrapingantInternalException         | Something went wrong on the server side. Try again later or contact ScrapingAnt support                                       |
| ScrapingantSiteNotReachableException | The requested URL is not reachable. Please check it locally                                                                   |
| ScrapingantDetectedException         | The anti-bot detection system has detected the request. Retry or change the request settings                                  |
| ScrapingantTimeoutException          | Timed out while communicating with ScrapingAnt servers. Check your network connection, try again later, or contact support    |
* * *
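All of the exceptions above inherit from `ScrapingantClientException`, so one common pattern is to wrap calls in a generic retry helper with exponential backoff. A minimal, stdlib-only sketch (the `with_retries` helper is illustrative and not part of the client API):

```python
import time


def with_retries(fn, retries=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt))


# Demo with a function that fails twice, then succeeds
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky, retries=3, base_delay=0.0)
```

In real usage, `fn` would be a lambda or partial wrapping `client.general_request(...)`, and you would catch `ScrapingantInvalidInputException` separately since invalid parameters will not succeed on retry.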
## Examples
### Sending custom cookies
```python3
from scrapingant_client import ScrapingAntClient
from scrapingant_client import Cookie
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
result = client.general_request(
    'https://httpbin.org/cookies',
    cookies=[
        Cookie(name='cookieName1', value='cookieVal1'),
        Cookie(name='cookieName2', value='cookieVal2'),
    ]
)
print(result.content)
# Response cookies are returned as a list of Cookie objects
# They can be reused in subsequent requests
response_cookies = result.cookies
```
### Executing custom JS snippet
```python
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
customJsSnippet = """
var str = 'Hello, world!';
var htmlElement = document.getElementsByTagName('html')[0];
htmlElement.innerHTML = str;
"""
result = client.general_request(
    'https://example.com',
    js_snippet=customJsSnippet,
)
print(result.content)
```
### Exception handling and retries
```python
from scrapingant_client import ScrapingAntClient, ScrapingantClientException, ScrapingantInvalidInputException
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
RETRIES_COUNT = 3
def parse_html(html: str):
    ...  # Implement your data extraction here


parsed_data = None
for retry_number in range(RETRIES_COUNT):
    try:
        scrapingant_response = client.general_request(
            'https://example.com',
        )
    except ScrapingantInvalidInputException as e:
        print(f'Got invalid input exception: {repr(e)}')
        break  # We are not retrying if the request params are not valid
    except ScrapingantClientException as e:
        print(f'Got ScrapingAnt exception {repr(e)}')
    except Exception as e:
        print(f'Got unexpected exception {repr(e)}')  # please report such exceptions by creating a new issue
    else:
        try:
            parsed_data = parse_html(scrapingant_response.content)
            break  # Data is parsed successfully, so we don't need to retry
        except Exception as e:
            print(f'Got exception while parsing data {repr(e)}')

if parsed_data is None:
    print(f'Failed to retrieve and parse data after {RETRIES_COUNT} tries')
    # Can sleep and retry later, stop the script, or investigate the reason
else:
    print(f'Successfully parsed data: {parsed_data}')
```
### Sending custom headers
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
result = client.general_request(
    'https://httpbin.org/headers',
    headers={
        'test-header': 'test-value'
    }
)
print(result.content)

# HTTP Basic auth example
result = client.general_request(
    'https://jigsaw.w3.org/HTTP/Basic/',
    headers={'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
)
print(result.content)
```
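The `Authorization` value in the Basic auth example above is just `guest:guest` Base64-encoded. A small stdlib sketch for building such a header from credentials (the `basic_auth_header` helper name is illustrative, not part of the client):

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic auth header from credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("guest", "guest"))
# {'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
```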
### Simple async example
```python3
import asyncio
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
async def main():
    # Scrape the example.com site
    result = await client.general_request_async('https://example.com')
    print(result.content)

asyncio.run(main())
```
### Sending POST request
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
# Sending POST request with json data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    json={"test": "test"},
)
print(result.content)

# Sending POST request with bytes data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    data=b'test_bytes',
)
print(result.content)
```
### Receiving markdown
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
# Request the page content rendered as markdown
result = client.markdown_request(
    url="https://example.com",
)
print(result.markdown)
```
## Useful links
- [ScrapingAnt API documentation](https://docs.scrapingant.com)
- [ScrapingAnt JS client](https://github.com/scrapingant/scrapingant-client-js)