HTTPX Wrapper with Rate Limiting and Caching Transports.

# Introduction
This project has two goals: convenience, and demonstrating how to assemble HTTPX Transports in different combinations.
HTTP request libraries, rate limiting, and caching are topics with deep rabbit holes, both in the technical implementation details and in the decisions an end user has to make. The convenience part of this package lies in abstracting away some of those decisions and taking an opinionated stance on how caching and rate limiting should be controlled.
This came about while implementing caching and rate limiting for [edgartools](https://edgartools.readthedocs.io/en/latest/): reducing network requests and improving overall performance led to a myriad of decisions. The SEC's EDGAR site enforces a strict limit of 10 requests per second while providing not-very-helpful caching headers, so overriding those headers with custom rules is necessary in certain cases.
# Caching
This project provides four `cache_mode` options:
- Disabled: rate limiting only, no caching.
- Hishel-File: cache with Hishel, using FileStorage.
- Hishel-S3: cache with Hishel, using S3Storage.
- FileCache: a simpler file cache backend that uses the file modified and created times and revalidates only with Last-Modified. Intended for sites that provide Last-Modified.
Cache rules are defined as a nested dictionary: each site regular expression maps to a dictionary of path regular expressions, and each path regular expression maps to a cache duration in seconds.
```py
{
    'site_regex': {
        'url_regex': duration,
        'url_regex2': duration,
        '.*': 3600,  # cache all paths for this site for an hour
    }
}
```
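To make the nesting concrete, here is a minimal sketch of how such a rule dictionary could be resolved for a URL. It is not the package's own lookup code: the `lookup_max_age` helper, the example patterns, and the choice of `re.fullmatch` with first-match-wins ordering are all assumptions made for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical rules: site regex -> {path regex -> max age in seconds}.
CACHE_RULES = {
    r".*\.sec\.gov": {
        r"/submissions/.*": 600,  # ten minutes
        r".*": 3600,              # every other path on this site: one hour
    }
}


def lookup_max_age(rules: dict, url: str) -> Optional[int]:
    """Return the duration of the first matching rule, or None if nothing matches."""
    parts = urlsplit(url)
    for site_regex, path_rules in rules.items():
        if not re.fullmatch(site_regex, parts.netloc):
            continue
        # Dicts preserve insertion order, so more specific paths should come first.
        for path_regex, duration in path_rules.items():
            if re.fullmatch(path_regex, parts.path):
                return duration
    return None


print(lookup_max_age(CACHE_RULES, "https://data.sec.gov/submissions/example.json"))
print(lookup_max_age(CACHE_RULES, "https://example.com/"))
```

Because the catch-all `'.*'` pattern matches every path, it only behaves as a fallback in this sketch if it is listed last.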
## Usage: Synchronous Requests
Note that the manager object is intended to be long lived; it does not need to be used as a context manager.
```py
from httpxthrottlecache import HttpxThrottleCache

url = "https://httpbingo.org/get"

with HttpxThrottleCache(
    cache_mode="Hishel-File",
    cache_dir="_cache",
    rate_limiter_enabled=True,
    request_per_sec_limit=10,
    user_agent="your user agent",
) as manager:
    # Single synchronous request
    with manager.http_client() as client:
        response = client.get(url)
        print(response.status_code)
```
## Usage: Synchronous Batch
```py
from httpxthrottlecache import HttpxThrottleCache

url = "https://httpbingo.org/get"

with HttpxThrottleCache(
    cache_mode="Hishel-File",
    cache_dir="_cache",
    rate_limiter_enabled=True,
    request_per_sec_limit=10,
    user_agent="your user agent",
) as manager:
    # Batch request: must run inside the `with` block so the manager is still open
    responses = manager.get_batch([url, url])
    print([r[0] for r in responses])
```
## Usage: Asynchronous
```py
import asyncio

from httpxthrottlecache import HttpxThrottleCache

url = "https://httpbingo.org/get"


async def main():
    with HttpxThrottleCache(
        cache_mode="Hishel-File",
        cache_dir="_cache",
        rate_limiter_enabled=True,
        request_per_sec_limit=10,
    ) as manager:
        # Async request
        async with manager.async_http_client() as client:
            tasks = [client.get(url) for _ in range(2)]
            responses = await asyncio.gather(*tasks)
            print(responses)


# `async with` and `await` are only valid inside a coroutine
asyncio.run(main())
```
## FileCache
The FileCache implementation ignores response caching headers. Instead, it treats data as "fresh" for a client-provided max age, which is set in a cache rule as described above.
Once the max age has expired, the FileCache transport revalidates the data using the Last-Modified date. TODO: Revalidate using ETag as well.
The FileCache implementation stores each file as raw bytes plus a .meta sidecar. The .meta file holds headers, such as Last-Modified, which are used for revalidation. The raw bytes are kept in their native format: binary files stay binary, compressed gzip streams are stored as compressed gzip data, and so on.
FileCache uses [FileLock](https://pypi.org/project/filelock/) to ensure there is only one writer per cached object. This means that (currently) multiple simultaneous cache misses will stack up waiting to write to the file. This locking is intended mainly to allow multiple processes to share the same cache.
FileCache initially stages data to a .tmp file and, upon completion, copies it to the final file.
No cache cleanup is done - that's your problem.
# Rate Limiting
Rate limiting is implemented via [pyrate_limiter](https://pyratelimiter.readthedocs.io/en/latest/). This is a leaky bucket implementation that allows a configurable number of requests per time interval.
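For readers unfamiliar with the term, a leaky bucket can be sketched in a few lines of standard-library Python. This is not pyrate_limiter's API or implementation; the `LeakyBucket` class and its parameters are hypothetical and exist only to illustrate the idea. Each request adds one unit to the bucket, the bucket drains at a fixed rate, and a request is refused when the bucket is full.

```python
import time


class LeakyBucket:
    """Leaky bucket used as a meter: requests fill it, time drains it."""

    def __init__(self, capacity: int, leak_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.leak_per_sec = leak_per_sec
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.level = 0.0
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a request may proceed now, False if the bucket is full."""
        now = self.clock()
        # Drain whatever leaked out since the previous call.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_per_sec)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

With `capacity=10` and `leak_per_sec=10`, a burst of ten requests passes immediately and further requests are admitted at roughly ten per second, which mirrors the `request_per_sec_limit=10` setting used in the usage examples.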
pyrate_limiter supports a variety of backends. The default backend is in-memory, and a single Limiter can be used for both sync and asyncio requests, across multiple threads. Alternative limiters can be used for multiprocess and distributed rate limiting; see the [examples](https://github.com/vutran1710/PyrateLimiter/tree/master/examples) for more.
Raw data
{
"_id": null,
"home_page": null,
"name": "httpxthrottlecache",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.09",
"maintainer_email": null,
"keywords": "edgar, hishel, httpx, pyrate_limiter, sec",
"author": null,
"author_email": "Paul T <paul@iqmo.com>",
"download_url": "https://files.pythonhosted.org/packages/6d/35/d665527399971ec8c2e277569c12994283281eda488bc44b22270bb9fa8f/httpxthrottlecache-0.1.7.tar.gz",
"platform": null,
"description": "HTTPX Wrapper with Rate Limiting and Caching Transports.\n\n[](https://pypi.python.org/pypi/httpxthrottlecache)\n\n\n[](https://github.com/paultiq/httpxthrottlecache/actions/workflows/build_deploy.yml)\n[](https://github.com/paultiq/httpxthrottlecache/actions/workflows/test.yml)\n[](https://github.com/paultiq/httpxthrottlecache/tree/python-coverage-comment-action-data)\n\n\n# Introduction\n\nThe goal of this project is a combination of convenience and as a demonstration of how to assemble HTTPX Transports in different combinations. \n\nHTTP request libraries, rate limiting and caching are topics with deep rabbit holes: the technical implementation details & the decisions an end user has to make. The convenience part of this package is abstracting away certain decisions and making certain opinionated decisions as to how caching & rate limiting should be controlled. \n\nThis came about while implementing caching & rate limiting for [edgartools](https://edgartools.readthedocs.io/en/latest/): reducing network requests and improving overall performance led to a myriad of decisions. The SEC's Edgar site has a strict \n10 request per second limit, while providing not-very-helpful caching headers. Overriding these caching headers with custom rules is necessary in certain cases. \n\n# Caching\n\nThis project provides four `cache_mode` options:\n- Disabled: Rate Limiting only\n- Hishel-File: Cache using Hishel using FileStorage\n- Hishel-S3: Cache using Hishel using S3Storage\n- FileCache: Use a simpler filecache backend that uses file modified and created time and only revalidates using last-modified. For sites where last-modified is provided. \n\nCache Rules are defined as a dictionary of site regular expressions to path regular expressions. 
\n```py\n{\n 'site_regex': {\n 'url_regex': duration,\n 'url_regex2': duration,\n '.*': 3600, # cache all paths for this site for an hour\n }\n}\n```\n\n\n## Usage: Synchronous Requests\n\nNote that the Manager object is intended to be long lived, doesn't need to be used as a context manager.\n\n```py\nfrom httpxthrottlecache import HttpxThrottleCache\n\nurl = \"https://httpbingo.org/get\"\n\nwith HttpxThrottleCache(cache_mode=\"Hishel-File\", \n cache_dir = \"_cache\", \n rate_limiter_enabled=True, \n request_per_sec_limit=10, \n user_agent=\"your user agent\") as manager:\n\n # Single synchronous request\n with manager.http_client() as client:\n response = client.get(url)\n print(response.status_code)\n```\n\n## Usage: Synchronous Batch\n```py\nfrom httpxthrottlecache import HttpxThrottleCache\n\nurl = \"https://httpbingo.org/get\"\n\nwith HttpxThrottleCache(cache_mode=\"Hishel-File\", \n cache_dir = \"_cache\", \n rate_limiter_enabled=True, \n request_per_sec_limit=10, \n user_agent=\"your user agent\") as manager:\n\n# Batch request\nresponses = manager.get_batch([url,url])\nprint([r[0] for r in responses])\n```\n\n## Usage: Asynchronous\n\n```py\nfrom httpxthrottlecache import HttpxThrottleCache\nimport asyncio \n\nurl = \"https://httpbingo.org/get\"\nwith HttpxThrottleCache(cache_mode=\"Hishel-File\", \n cache_dir = \"_cache\", \n rate_limiter_enabled=True, \n request_per_sec_limit=10) as manager:\n\n # Async request\n async with manager.async_http_client() as client:\n tasks = [client.get(url) for _ in range(2)]\n responses = await asyncio.gather(*tasks)\n print(responses)\n```\n\n## FileCache\n\nThe FileCache implementation ignores response caching headers. Instead, it treats data as \"fresh\" for a client-provided max age. The max age is defined in a cacherule, as defined above.\n\nOnce the max age is expired, the FileCache Transport will revalidate the data using the Last-Modified date. TODO: Revalidate using ETAG as well. 
\n\nThe FileCache implementation stores files as the raw bytes plus a .meta sidecar. The .meta provides headers, such as Last-Modified, which are used for revalidation. The raw bytes are in the native format - binary files are in their native format, compressed gzip streams are stored as compressed gzip data, etc. \n\nFileCache uses [FileLock](https://pypi.org/project/filelock/) to ensure only one writer to a cached object. This means that (currently) multiple simultaneous cache misses will stack up waiting to write to file. This locking is intended mainly to allow multiple processes to share the same cache. \n\nFileCache initially stages data to a .tmp file, then upon completion, copies to the final file. \n\nNo cache cleanup is done - that's your problem.\n\n# Rate Limiting\n\nRate limiting is implemented via [pyrate_limiter](https://pyratelimiter.readthedocs.io/en/latest/). This is a leaky bucket implementation that allows a configurable number of requests per time interval. \n\npyrate_limiter supports a variable of backends. The default backend is in-memory, and a single Limiter can be used for both sync and asyncio requests, across multiple threads. Alternative limiters can be used for multiprocess and distributed rate limiting, see [examples](https://github.com/vutran1710/PyrateLimiter/tree/master/examples) for more. ",
"bugtrack_url": null,
"license": null,
"summary": "Rate Limiting and Caching HTTPX Client",
"version": "0.1.7",
"project_urls": {
"Homepage": "https://github.com/paultiq/httpxthrottlecache",
"Issues": "https://github.com/paultiq/httpxthrottlecache/issues",
"Repository": "https://github.com/paultiq/httpxthrottlecache"
},
"split_keywords": [
"edgar",
" hishel",
" httpx",
" pyrate_limiter",
" sec"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "6c21cbdd56769c56a33aa1f317b425c760206ece6019a81980fd121064d23af4",
"md5": "1b334c6888acef2af759d649c96e7eba",
"sha256": "e9bb6d959f5b200db16edbf7d15f223f211ac0839016ce5a026663e062e129e1"
},
"downloads": -1,
"filename": "httpxthrottlecache-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1b334c6888acef2af759d649c96e7eba",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.09",
"size": 15298,
"upload_time": "2025-08-13T12:32:14",
"upload_time_iso_8601": "2025-08-13T12:32:14.942699Z",
"url": "https://files.pythonhosted.org/packages/6c/21/cbdd56769c56a33aa1f317b425c760206ece6019a81980fd121064d23af4/httpxthrottlecache-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6d35d665527399971ec8c2e277569c12994283281eda488bc44b22270bb9fa8f",
"md5": "e1169b0dadb31eaf043fe5a1d55a05c1",
"sha256": "b501cbb4d7ef719e93935902cb502290f4769c8739a2903dbc5eb6de7c5d11ee"
},
"downloads": -1,
"filename": "httpxthrottlecache-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "e1169b0dadb31eaf043fe5a1d55a05c1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.09",
"size": 16159,
"upload_time": "2025-08-13T12:32:16",
"upload_time_iso_8601": "2025-08-13T12:32:16.147279Z",
"url": "https://files.pythonhosted.org/packages/6d/35/d665527399971ec8c2e277569c12994283281eda488bc44b22270bb9fa8f/httpxthrottlecache-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-13 12:32:16",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "paultiq",
"github_project": "httpxthrottlecache",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "httpxthrottlecache"
}