reqx 0.0.14

Name: reqx
Version: 0.0.14
Summary: The Efficient Web Scraping Library
Home page: https://github.com/trevorhobenshield/reqx
Author: Trevor Hobenshield
Requires Python: >=3.10.10
License: GPLv3
Keywords: http, networking, async, concurrency, efficient, webscraping, scraping, scraper, automation
Upload time: 2023-08-23 22:09:11
Requirements: none recorded
# reqX

### The Efficient Web Scraping Library

A flexible interface for quickly making high volumes of asynchronous HTTP requests.

## Todo

- [ ] Spoof TLS/JA3/HTTP2 fingerprint option
- [ ] Data management system

## Examples

### Run batches of async requests

These request attributes accept iterables; a separate request is built for each combination of values:

```python
{'headers', 'params', 'cookies', 'content', 'data', 'files', 'json'}
```

For example, supplying the parameters below results in 15 requests: `len(size) * len(page) = 3 * 5 = 15`.

```python
'size': [37, 61, 13],
'page': range(2, 7),
```
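
For intuition, the expansion is effectively a Cartesian product of the iterable values, with scalar values shared by every request. The sketch below reproduces that counting logic in plain Python; it is an illustration only, not reqx's actual implementation:

```python
from itertools import product

params = {'size': [37, 61, 13], 'page': range(2, 7),
          'category': 'spirits', 'sort': 'featuredProducts:desc'}

# iterable values are expanded per request; scalar values are shared by every request
iterables = {k: list(v) for k, v in params.items() if isinstance(v, (list, tuple, range))}
scalars = {k: v for k, v in params.items() if k not in iterables}

expanded = [dict(zip(iterables, combo), **scalars) for combo in product(*iterables.values())]
print(len(expanded))  # 15 -> one parameter set (and one request) per (size, page) pair
```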

A contrived example showing how to batch unrelated, independent requests together so they can be collected and sent asynchronously:

```python
import asyncio
import reqx

urls = ['https://www.bcliquorstores.com/ajax/browse']
params = {'size': [37, 61, 13], 'page': range(2, 7), 'category': 'spirits', 'sort': 'featuredProducts:desc'}
headers = {'user-agent': '(iPhone; CPU iPhone OS 15_6 like Mac OS X)'}

urls2 = ['https://jsonplaceholder.typicode.com/posts']
payload = {'title': 'foo', 'body': 'bar', 'userId': range(1, 4)}
headers2 = {'content-type': 'application/json'}

urls3 = ['https://jsonplaceholder.typicode.com/posts/1']
payload2 = {'title': 'foo', 'body': 'bar', 'userId': range(7, 11)}
headers3 = {'content-type': 'application/json'}

res = asyncio.run(
    reqx.collect(
        [
            *reqx.send('GET', urls, params=params, headers=headers, debug=True, m=20, b=2, max_retries=8),
            *reqx.send('POST', urls2, json=payload, headers=headers2, debug=True, m=11, b=2, max_retries=3),
            *reqx.send('PUT', urls3, json=payload2, headers=headers3, debug=True, m=7, b=2, max_retries=5),
        ],
        http2=True,
        desc='Example requests'
    )
)
```

#### Explanation:

```python
asyncio.run(
    # collect these tasks (asyncio.gather)
    reqx.collect(
        # iterable of partials
        [
            # collection of partials
            *reqx.send(
                # httpx.Request options
                'GET',
                urls,
                params=params,
                headers=headers,
                debug=True,
                # exponential backoff options
                m=20,
                b=2,
                max_retries=8
            ),
            # more collections of partials
            ...,
            ...,
        ],
        # httpx.AsyncClient options
        http2=True,
        # add description to display a progress bar
        desc='Example requests'
    )
)
```
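
The backoff parameters `m`, `b`, and `max_retries` are not documented in detail above. A common reading is that the retry delay grows as a power of `b` and is capped (or scaled) by `m`; the sketch below shows that general pattern under those assumed semantics, not reqx's exact retry logic:

```python
import asyncio
import random

async def retry_with_backoff(send_fn, *, m: float = 20, b: float = 2, max_retries: int = 8):
    # `send_fn` is a zero-argument coroutine function that performs one request.
    # Treating `m` as a delay cap and `b` as the exponential base is an assumption,
    # not reqx's documented behaviour.
    for attempt in range(max_retries):
        try:
            return await send_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # grow the delay exponentially, cap it at m, and add jitter
            delay = min(m, b ** attempt) * random.random()
            await asyncio.sleep(delay)
```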

### Run single batch of async requests

A convenience function that sends a single batch of requests.

```python
import asyncio
import reqx

urls = ['https://www.bcliquorstores.com/ajax/browse']
params = {'size': [37, 61, 13], 'page': range(2, 7), 'category': 'spirits', 'sort': 'featuredProducts:desc'}
headers = {'user-agent': '(iPhone; CPU iPhone OS 15_6 like Mac OS X)'}

# same options available in `reqx.send()`
res = asyncio.run(reqx.r('GET', urls, params=params, headers=headers, http2=True, desc='GET'))
```
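
The return type of `reqx.r()` is not documented above. Assuming it yields a list of `httpx.Response` objects (reqx wraps `httpx.Request`/`httpx.AsyncClient`, as noted in the explanation), the results could be filtered and decoded as follows; treat the response type as an assumption:

```python
# assumption: `res` is a list of httpx.Response objects
ok = [r for r in res if r.status_code == 200]
data = [r.json() for r in ok]  # this endpoint returns JSON
print(f'{len(ok)}/{len(res)} requests succeeded')
```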

### Run batch of downloads

A convenience function for downloading files.

```python
import asyncio
import reqx

urls = [...]

asyncio.run(reqx.download(urls, desc='Downloading Files'))
```



            
