# reqX
### The Efficient Web Scraping Library
A flexible interface for quickly making high volumes of asynchronous HTTP requests.
## Todo
- [ ] Spoof TLS/JA3/HTTP2 fingerprint option
- [ ] Data management system
## Examples
### Run batches of async requests
The following request attributes can be passed iterables; a request is built for each combination of values.
```python
{'headers', 'params', 'cookies', 'content', 'data', 'files', 'json'}
```
E.g. supplying these parameters results in 15 requests: `len(size) * len(page) = 3 * 5 = 15`
```python
'size': [37, 61, 13],
'page': range(2, 7),
```
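As a rough sketch of what this expansion looks like (an illustration only, not reqx's internals), scalar fields are repeated and iterable fields are combined via a Cartesian product:

```python
from itertools import product

# Illustration only: not reqx's actual implementation.
# Iterable-valued fields (lists/ranges) are expanded via a Cartesian
# product; scalar fields are repeated in every generated parameter set.
params = {'size': [37, 61, 13], 'page': range(2, 7), 'category': 'spirits'}

iterable_keys = [k for k, v in params.items() if isinstance(v, (list, range))]
fixed = {k: v for k, v in params.items() if k not in iterable_keys}

expanded = [
    {**fixed, **dict(zip(iterable_keys, combo))}
    for combo in product(*(params[k] for k in iterable_keys))
]
print(len(expanded))  # 15: one request per combination (3 sizes x 5 pages)
```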
A contrived example showing how unrelated/independent requests can be batched together, then collected and sent asynchronously.
```python
import asyncio
import reqx

urls = ['https://www.bcliquorstores.com/ajax/browse']
params = {'size': [37, 61, 13], 'page': range(2, 7), 'category': 'spirits', 'sort': 'featuredProducts:desc'}
headers = {'user-agent': '(iPhone; CPU iPhone OS 15_6 like Mac OS X)'}

urls2 = ['https://jsonplaceholder.typicode.com/posts']
payload = {'title': 'foo', 'body': 'bar', 'userId': range(1, 4)}
headers2 = {'content-type': 'application/json'}

urls3 = ['https://jsonplaceholder.typicode.com/posts/1']
payload2 = {'title': 'foo', 'body': 'bar', 'userId': range(7, 11)}
headers3 = {'content-type': 'application/json'}

res = asyncio.run(
    reqx.collect(
        [
            *reqx.send('GET', urls, params=params, headers=headers, debug=True, m=20, b=2, max_retries=8),
            *reqx.send('POST', urls2, json=payload, headers=headers2, debug=True, m=11, b=2, max_retries=3),
            *reqx.send('PUT', urls3, json=payload2, headers=headers3, debug=True, m=7, b=2, max_retries=5),
        ],
        http2=True,
        desc='Example requests'
    )
)
```
#### Explanation:
```python
asyncio.run(
    # collect these tasks (asyncio.gather)
    reqx.collect(
        # iterable of partials
        [
            # collection of partials
            *reqx.send(
                # httpx.Request options
                'GET',
                urls,
                params=params,
                headers=headers,
                debug=True,
                # exponential backoff options
                m=20,
                b=2,
                max_retries=8
            ),
            # more collections of partials
            ...,
            ...,
        ],
        # httpx.AsyncClient options
        http2=True,
        # add description to display a progress bar
        desc='Example requests'
    )
)
```
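The `m`, `b`, and `max_retries` options above are labeled as exponential backoff settings. Their exact semantics aren't documented here, but a common pattern consistent with these names is a retry delay that grows as `b**attempt` and is capped at `m` seconds. A minimal sketch of that idea (an assumption, not reqx's implementation):

```python
import asyncio
import random

async def retry_with_backoff(fn, m=20, b=2, max_retries=8):
    """Retry an async callable with capped exponential backoff (illustrative only)."""
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except Exception:
            if attempt == max_retries:
                raise
            # delay grows as b**attempt, capped at m seconds, with light jitter
            delay = min(m, b ** attempt) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)
```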
### Run a single batch of async requests
Convenience function to send a single batch of requests.
```python
import asyncio
import reqx

urls = ['https://www.bcliquorstores.com/ajax/browse']
params = {'size': [37, 61, 13], 'page': range(2, 7), 'category': 'spirits', 'sort': 'featuredProducts:desc'}
headers = {'user-agent': '(iPhone; CPU iPhone OS 15_6 like Mac OS X)'}

# same options available in `reqx.send()`
res = asyncio.run(reqx.r('GET', urls, params=params, headers=headers, http2=True, desc='GET'))
```
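Since the options map onto `httpx.Request`/`httpx.AsyncClient`, a reasonable assumption is that `res` contains `httpx.Response` objects (an assumption; check the return type in your version). Extracting JSON bodies would then look like:

```python
# Assumes `res` is a list of httpx.Response objects; skip failed entries defensively.
data = [r.json() for r in res if r is not None and r.status_code == 200]
```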
### Run a batch of downloads
Convenience function to download files.
```python
import asyncio
import reqx

urls = [...]

asyncio.run(reqx.download(urls, desc='Downloading Files'))
```
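The `urls` placeholder is left as-is above; in practice it would hold direct file URLs, for example (hypothetical values):

```python
# Hypothetical URLs for illustration; substitute your own direct file links.
urls = [
    'https://example.com/files/report.pdf',
    'https://example.com/files/data.csv',
]
```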