cached-historical-data-fetcher

- Name: cached-historical-data-fetcher
- Version: 0.2.24
- Homepage: https://github.com/34j/cached-historical-data-fetcher
- Summary: Python utility for fetching any historical data using caching.
- Upload time: 2024-11-15 15:35:05
- Author: 34j
- Requires Python: <3.13,>=3.9
- License: MIT
# Cached Historical Data Fetcher

<p align="center">
  <a href="https://github.com/34j/cached-historical-data-fetcher/actions/workflows/ci.yml?query=branch%3Amain">
    <img src="https://img.shields.io/github/actions/workflow/status/34j/cached-historical-data-fetcher/ci.yml?branch=main&label=CI&logo=github&style=flat-square" alt="CI Status" >
  </a>
  <a href="https://cached-historical-data-fetcher.readthedocs.io">
    <img src="https://img.shields.io/readthedocs/cached-historical-data-fetcher.svg?logo=read-the-docs&logoColor=fff&style=flat-square" alt="Documentation Status">
  </a>
  <a href="https://codecov.io/gh/34j/cached-historical-data-fetcher">
    <img src="https://img.shields.io/codecov/c/github/34j/cached-historical-data-fetcher.svg?logo=codecov&logoColor=fff&style=flat-square" alt="Test coverage percentage">
  </a>
</p>
<p align="center">
  <a href="https://python-poetry.org/">
    <img src="https://img.shields.io/badge/packaging-poetry-299bd7?style=flat-square&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAASCAYAAABrXO8xAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAJJSURBVHgBfZLPa1NBEMe/s7tNXoxW1KJQKaUHkXhQvHgW6UHQQ09CBS/6V3hKc/AP8CqCrUcpmop3Cx48eDB4yEECjVQrlZb80CRN8t6OM/teagVxYZi38+Yz853dJbzoMV3MM8cJUcLMSUKIE8AzQ2PieZzFxEJOHMOgMQQ+dUgSAckNXhapU/NMhDSWLs1B24A8sO1xrN4NECkcAC9ASkiIJc6k5TRiUDPhnyMMdhKc+Zx19l6SgyeW76BEONY9exVQMzKExGKwwPsCzza7KGSSWRWEQhyEaDXp6ZHEr416ygbiKYOd7TEWvvcQIeusHYMJGhTwF9y7sGnSwaWyFAiyoxzqW0PM/RjghPxF2pWReAowTEXnDh0xgcLs8l2YQmOrj3N7ByiqEoH0cARs4u78WgAVkoEDIDoOi3AkcLOHU60RIg5wC4ZuTC7FaHKQm8Hq1fQuSOBvX/sodmNJSB5geaF5CPIkUeecdMxieoRO5jz9bheL6/tXjrwCyX/UYBUcjCaWHljx1xiX6z9xEjkYAzbGVnB8pvLmyXm9ep+W8CmsSHQQY77Zx1zboxAV0w7ybMhQmfqdmmw3nEp1I0Z+FGO6M8LZdoyZnuzzBdjISicKRnpxzI9fPb+0oYXsNdyi+d3h9bm9MWYHFtPeIZfLwzmFDKy1ai3p+PDls1Llz4yyFpferxjnyjJDSEy9CaCx5m2cJPerq6Xm34eTrZt3PqxYO1XOwDYZrFlH1fWnpU38Y9HRze3lj0vOujZcXKuuXm3jP+s3KbZVra7y2EAAAAAASUVORK5CYII=" alt="Poetry">
  </a>
  <a href="https://github.com/ambv/black">
    <img src="https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square" alt="black">
  </a>
  <a href="https://github.com/pre-commit/pre-commit">
    <img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit">
  </a>
</p>
<p align="center">
  <a href="https://pypi.org/project/cached-historical-data-fetcher/">
    <img src="https://img.shields.io/pypi/v/cached-historical-data-fetcher.svg?logo=python&logoColor=fff&style=flat-square" alt="PyPI Version">
  </a>
  <img src="https://img.shields.io/pypi/pyversions/cached-historical-data-fetcher.svg?style=flat-square&logo=python&amp;logoColor=fff" alt="Supported Python versions">
  <img src="https://img.shields.io/pypi/l/cached-historical-data-fetcher.svg?style=flat-square" alt="License">
</p>

A Python utility for fetching historical data of any kind, with caching.
Suitable for acquiring data that is added frequently and incrementally, e.g. news, posts, or weather.

## Installation

Install this via pip (or your favourite package manager):

```shell
pip install cached-historical-data-fetcher
```

## Features

- Uses cache built on top of [`joblib`](https://github.com/joblib/joblib), [`lz4`](https://github.com/lz4/lz4) and [`aiofiles`](https://github.com/Tinche/aiofiles).
- Ready to use with [`asyncio`](https://docs.python.org/3/library/asyncio.html), [`aiohttp`](https://github.com/aio-libs/aiohttp) and [`aiohttp-client-cache`](https://github.com/requests-cache/aiohttp-client-cache). Uses `asyncio.gather` to fetch chunks in parallel. (For performance reasons, relying on `aiohttp-client-cache` alone is probably not a good idea when fetching a large number of chunks (web requests).)
- Based on [`pandas`](https://github.com/pandas-dev/pandas) and supports `MultiIndex`.
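
The gather-then-concatenate pattern mentioned above can be illustrated with a self-contained stdlib sketch. This is not the library's actual internals, just the general shape: each unfetched chunk is fetched concurrently with `asyncio.gather`, and the results are merged in order. `fetch_chunk` is a hypothetical placeholder for a real request.

```python
import asyncio

async def fetch_chunk(start: int) -> dict:
    # Placeholder for a real I/O call (e.g. an aiohttp request) for one chunk.
    await asyncio.sleep(0)  # simulate awaiting I/O
    return {start: start * 2}

async def fetch_all(starts: list[int]) -> dict:
    # Fetch every chunk concurrently, then merge the results in order.
    chunks = await asyncio.gather(*(fetch_chunk(s) for s in starts))
    merged: dict = {}
    for chunk in chunks:
        merged.update(chunk)
    return merged

result = asyncio.run(fetch_all([0, 1, 2]))
print(result)  # {0: 0, 1: 2, 2: 4}
```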

## Usage

### `HistoricalDataCache`, `HistoricalDataCacheWithChunk` and `HistoricalDataCacheWithFixedChunk`

Override the `get_one()` method to fetch data for one chunk. The `update()` method calls `get_one()` for each unfetched chunk, concatenates the results, and saves them to the cache.

```python
from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp
from typing import Any

# define cache class
class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds
    interval = Timedelta(days=1) # interval between chunks, can be any type
    start_index = Timestamp.utcnow().floor("10D") # start index, can be any type

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"day": [start.day]}, index=[start])

# get complete data
print(await MyCacheWithFixedChunk().update())
```

```shell
                           day
2023-09-30 00:00:00+00:00   30
2023-10-01 00:00:00+00:00    1
2023-10-02 00:00:00+00:00    2
```
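
The on-disk cache itself is built on `joblib`, `lz4` and `aiofiles`, as noted under Features. As a rough stdlib-only analogue of "serialize, compress, write to disk", the sketch below uses `pickle` and `lzma` instead; the names `save_cache` and `load_cache` are illustrative, not the library's API.

```python
import lzma
import pickle
import tempfile
from pathlib import Path

def save_cache(path: Path, data: object) -> None:
    # Serialize then compress, roughly analogous to joblib + lz4.
    path.write_bytes(lzma.compress(pickle.dumps(data)))

def load_cache(path: Path) -> object:
    # Decompress then deserialize.
    return pickle.loads(lzma.decompress(path.read_bytes()))

cache_file = Path(tempfile.mkdtemp()) / "mycache.pkl.xz"
save_cache(cache_file, {"day": [30, 1, 2]})
print(load_cache(cache_file))  # {'day': [30, 1, 2]}
```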

**See [example.ipynb](example.ipynb) for a real-world example.**

### `IdCacheWithFixedChunk`

Override the `get_one()` method to fetch data for one chunk, just as with `HistoricalDataCacheWithFixedChunk`.
After the `ids` have been updated via `set_ids()`, the `update()` method calls `get_one()` for every unfetched id, concatenates the results, and saves them to the cache.

```python
from cached_historical_data_fetcher import IdCacheWithFixedChunk
from pandas import DataFrame
from typing import Any

class MyIdCache(IdCacheWithFixedChunk[str, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds

    async def get_one(self, start: str, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"id+hello": [start + "+hello"]}, index=[start])

cache = MyIdCache() # create cache
cache.set_ids(["a"]) # set ids
cache.set_ids(["b"]) # set ids again, now `cache.ids` is ["a", "b"]
print(await cache.update(reload=True)) # discard previous cache and fetch again
cache.set_ids(["b", "c"]) # set ids again, now `cache.ids` is ["a", "b", "c"]
print(await cache.update()) # fetch only new data
```

```shell
       id+hello
    a   a+hello
    b   b+hello
       id+hello
    a   a+hello
    b   b+hello
    c   c+hello
```
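
Conceptually, the `set_ids()` / `update()` flow above amounts to keeping a growing set of ids and only fetching the ones not yet cached. The toy class below sketches that logic with a plain dict; it is illustrative only and shares no code with `IdCacheWithFixedChunk`.

```python
class ToyIdCache:
    """Illustrative sketch of the set_ids()/update() flow, not the real class."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.data: dict[str, str] = {}

    def set_ids(self, ids: list[str]) -> None:
        # Union with previously set ids, preserving insertion order.
        self.ids += [i for i in ids if i not in self.ids]

    def update(self) -> dict[str, str]:
        # Fetch only ids that are not already cached.
        for i in self.ids:
            if i not in self.data:
                self.data[i] = i + "+hello"
        return self.data

cache = ToyIdCache()
cache.set_ids(["a"])
cache.set_ids(["b"])       # cache.ids is now ["a", "b"]
cache.update()
cache.set_ids(["b", "c"])  # cache.ids is now ["a", "b", "c"]
print(cache.update())      # {'a': 'a+hello', 'b': 'b+hello', 'c': 'c+hello'}
```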

## Contributors ✨

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<!-- prettier-ignore-start -->
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- markdownlint-disable -->
<!-- markdownlint-enable -->
<!-- ALL-CONTRIBUTORS-LIST:END -->
<!-- prettier-ignore-end -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!
