cache-pandas

- Name: cache-pandas
- Version: 1.0.0
- Summary: Easily cache pandas DataFrames to file and in memory.
- Upload time: 2023-01-01 18:57:37
- Requires Python: >=3.7
- License: MIT
- Keywords: pandas, cache, csv
- Requirements: none recorded

# cache-pandas
Easily cache the outputs of functions that generate pandas DataFrames, either to file or in memory. This is useful in data science projects where a large DataFrame is generated but will not change for some time, so it can be cached to file; the next time the script runs, it uses the cached version instead of regenerating the DataFrame.


## Caching pandas DataFrames to a CSV file
`cache-pandas` includes the decorator `cache_to_csv`, which caches the DataFrame returned by a function to a CSV file. The next time the function is called, or the script is run again, it reads the cached file instead of calling the function again.

An optional expiration time can also be set. This is useful for, say, a web scraper whose output DataFrame may change once a day but stays the same within a day. If the decorated function is called after the cache has expired, the DataFrame is regenerated.

### Example

The following example caches the resulting DataFrame to `file.csv`. It regenerates the DataFrame and its cache if the function is called again at least 100 seconds after the cached file was last written.

```python
import pandas as pd
from cache_pandas import cache_to_csv

NUM_SAMPLES = 10  # Number of rows in the sample DataFrame.

@cache_to_csv("file.csv", refresh_time=100)
def sample_constant_function() -> pd.DataFrame:
    """Sample function that returns a constant DataFrame, for testing purposes."""
    data = {
        "ints": list(range(NUM_SAMPLES)),
        "strs": [str(i) for i in range(NUM_SAMPLES)],
        "floats": [float(i) for i in range(NUM_SAMPLES)],
    }

    return pd.DataFrame.from_dict(data)
```
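One thing to keep in mind with any CSV-based cache: a CSV file stores text, not dtypes, so pandas re-infers column types when it reads the file back. The snippet below uses plain `to_csv`/`read_csv` on the same data as the example (not `cache_to_csv` itself, whose read and write options are not documented here); the digit strings in `strs` come back as integers unless a `dtype` is passed to `read_csv`.

```python
import io

import pandas as pd

NUM_SAMPLES = 3  # Small example size.

original = pd.DataFrame.from_dict({
    "ints": list(range(NUM_SAMPLES)),
    "strs": [str(i) for i in range(NUM_SAMPLES)],
    "floats": [float(i) for i in range(NUM_SAMPLES)],
})

# Round-trip through CSV text, as a file-based cache would.
text = original.to_csv(index=False)
restored = pd.read_csv(io.StringIO(text))
print(restored.dtypes)  # "strs" is now inferred as an integer column

# Passing an explicit dtype keeps the strings as strings.
restored_fixed = pd.read_csv(io.StringIO(text), dtype={"strs": str})
print(restored_fixed["strs"].tolist())  # ['0', '1', '2']
```

If exact dtypes matter, it is worth checking the DataFrame returned on a cache hit against the one returned on a fresh call.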

### Args
```
filepath: Filepath to save the cached CSV.
refresh_time: Time in seconds. If the file has not been updated in longer than refresh_time, generate the file
    anew. If `None`, the file will never be regenerated if a cached version exists.
create_dirs: Whether to create any missing directories in the given filepath.
```
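To illustrate how these arguments interact, here is a minimal sketch of a file-backed cache decorator. It is not the library's implementation: the name `csv_cache_sketch`, the default values, writing with `index=False`, and ignoring the function's arguments (one file per decorated function) are all assumptions made for illustration.

```python
import functools
import os
import time
from typing import Callable, Optional

import pandas as pd


def csv_cache_sketch(
    filepath: str,
    refresh_time: Optional[float] = None,
    create_dirs: bool = True,
) -> Callable[[Callable[..., pd.DataFrame]], Callable[..., pd.DataFrame]]:
    """Hypothetical stand-in for cache_to_csv: cache a DataFrame-returning function in a CSV file."""

    def decorator(func: Callable[..., pd.DataFrame]) -> Callable[..., pd.DataFrame]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> pd.DataFrame:
            if os.path.exists(filepath):
                age = time.time() - os.path.getmtime(filepath)
                # refresh_time=None: an existing cached file never expires.
                if refresh_time is None or age <= refresh_time:
                    return pd.read_csv(filepath)
            # No file yet, or the file is older than refresh_time: regenerate.
            df = func(*args, **kwargs)
            directory = os.path.dirname(filepath)
            if create_dirs and directory:
                os.makedirs(directory, exist_ok=True)
            # index=False is an assumption of this sketch; the real library may store the index.
            df.to_csv(filepath, index=False)
            return df

        return wrapper

    return decorator
```

Staleness here is judged by the file's modification time, so deleting or touching the CSV by hand also controls when the function is rerun.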


## Caching pandas DataFrames in memory
`cache-pandas` includes the decorator `timed_lru_cache`, which caches the DataFrame returned by a function in memory, using `functools.lru_cache`.

As with `cache_to_csv`, an optional expiration time can be set. If the decorated function is called after the cache has expired, the DataFrame is regenerated.
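The expiry behaviour can be pictured as a thin wrapper around `functools.lru_cache` that clears the whole cache once a time window has passed. This is a minimal sketch under assumptions: the name `timed_lru_cache_sketch`, the `maxsize=128` default (borrowed from `functools.lru_cache`), and measuring the window from decoration time are illustrative choices, not confirmed details of `timed_lru_cache`.

```python
import functools
import time
from typing import Optional

import pandas as pd


def timed_lru_cache_sketch(seconds: float, maxsize: Optional[int] = 128, typed: bool = False):
    """Hypothetical stand-in for timed_lru_cache: an lru_cache that is emptied every `seconds`."""

    def decorator(func):
        cached = functools.lru_cache(maxsize=maxsize, typed=typed)(func)
        expires_at = time.monotonic() + seconds

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal expires_at
            now = time.monotonic()
            if now >= expires_at:
                # Window elapsed: drop every cached result and start a new window.
                cached.cache_clear()
                expires_at = now + seconds
            return cached(*args, **kwargs)

        # Expose the usual lru_cache helpers on the wrapper.
        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper

    return decorator
```

Because `lru_cache` hands back the same object on every hit, a caller that mutates the returned DataFrame also mutates the cached one; returning `df.copy()` from the wrapper would avoid that at the cost of one copy per call.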

### Example

The following example caches the resulting DataFrame in memory. It regenerates the DataFrame and its cache if the function is called again at least 100 seconds after the result was cached.

```python
import pandas as pd
from cache_pandas import timed_lru_cache

NUM_SAMPLES = 10  # Number of rows in the sample DataFrame.

@timed_lru_cache(seconds=100, maxsize=None)
def sample_constant_function() -> pd.DataFrame:
    """Sample function that returns a constant DataFrame, for testing purposes."""
    data = {
        "ints": list(range(NUM_SAMPLES)),
        "strs": [str(i) for i in range(NUM_SAMPLES)],
        "floats": [float(i) for i in range(NUM_SAMPLES)],
    }

    return pd.DataFrame.from_dict(data)
```

### Args
```
seconds: Number of seconds to retain the cache.
maxsize: Maximum number of items to store in the cache. See `functools.lru_cache` for more details.
typed: Whether arguments of different types will be cached separately. See `functools.lru_cache` for more details.
```
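Since `maxsize` and `typed` are passed through to `functools.lru_cache`, their effect can be checked with the standard library alone. The snippet below uses `functools.lru_cache` directly (not `timed_lru_cache`) to show that `typed=True` stores `3` and `3.0` as separate entries, and that every argument must be hashable, which rules out passing a DataFrame as an argument to an LRU-cached function.

```python
import functools

import pandas as pd


@functools.lru_cache(maxsize=None, typed=True)
def square(x):
    return x * x


square(3)
square(3.0)  # typed=True: stored separately from square(3)
square(3)    # served from the cache
info = square.cache_info()
print(info.hits, info.misses)  # 1 2


@functools.lru_cache(maxsize=None)
def n_rows(df: pd.DataFrame) -> int:
    return len(df)


try:
    n_rows(pd.DataFrame({"a": [1, 2]}))
except TypeError as exc:
    # lru_cache hashes its arguments, and DataFrames are not hashable.
    print("TypeError:", exc)
```

With `typed=False` (the default), `square(3.0)` would have been a cache hit, since `3 == 3.0` and both hash the same.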

## Composing `cache_to_csv` and `timed_lru_cache`
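`cache_to_csv` and `timed_lru_cache` can also be composed. Usually `timed_lru_cache` should be applied last, i.e. as the outermost (topmost) decorator. Because `cache_to_csv` checks the file on every call, placing it outside the in-memory cache would reread the CSV each time; with `timed_lru_cache` outermost, repeated calls within its expiry return the in-memory result without touching the file. The two caching mechanisms can use different refresh times.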
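The ordering rule can be checked without touching the disk. The sketch below substitutes a hypothetical `fake_file_cache` decorator (it only records how often the "file" layer is reached) for `cache_to_csv`, and uses `functools.lru_cache` as the in-memory layer; it illustrates decorator stacking, not the library itself.

```python
import functools

file_checks = []  # One entry per call that reaches the stand-in file layer.


def fake_file_cache(func):
    """Hypothetical stand-in for cache_to_csv that only logs each file check."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        file_checks.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


# In-memory cache outermost (listed first): the file layer is reached once.
@functools.lru_cache(maxsize=None)
@fake_file_cache
def memory_outside():
    return 42


# File layer outermost: it is reached on every call.
@fake_file_cache
@functools.lru_cache(maxsize=None)
def file_outside():
    return 42


for _ in range(3):
    memory_outside()
    file_outside()

print(file_checks.count("memory_outside"), file_checks.count("file_outside"))  # 1 3
```

Decorators are applied bottom-up but called top-down, which is why the topmost one is "applied last" yet consulted first.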

### Example
```python
import pandas as pd
from cache_pandas import timed_lru_cache, cache_to_csv

NUM_SAMPLES = 10  # Number of rows in the sample DataFrame.

@timed_lru_cache(seconds=100, maxsize=None)
@cache_to_csv("file.csv", refresh_time=100)
def sample_constant_function() -> pd.DataFrame:
    """Sample function that returns a constant DataFrame, for testing purposes."""
    data = {
        "ints": list(range(NUM_SAMPLES)),
        "strs": [str(i) for i in range(NUM_SAMPLES)],
        "floats": [float(i) for i in range(NUM_SAMPLES)],
    }

    return pd.DataFrame.from_dict(data)
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "cache-pandas",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "pandas,cache,csv",
    "author": "",
    "author_email": "Andrew Sansom <andrew@euleriancircuit.com>",
    "download_url": "https://files.pythonhosted.org/packages/ae/65/726368ad7c7099c690c9765ef336f77f81a7e329411ecdbb1e0957555f7b/cache-pandas-1.0.0.tar.gz",
    "platform": null,
    "description": "# cache-pandas\nEasily cache outputs of functions that generate pandas DataFrames to file or in memory. Useful for data science projects where a large DataFrame is generated, but will not change for some time, so it could be cached to file. The next time the script runs, it will use the cached version. \n\n\n## Caching pandas dataframes to csv file\n`cache-pandas` includes the decorator `cache_to_csv`, which will cache the result of a function (returning a DataFrame) to a csv file. The next time the function or script is run, it will take that cached file, instead of calling the function again. \n\nAn optional expiration time can also be set. This might be useful for a webscraper where the output DataFrame may change once a day, but within the day, it will be the same. If the decorated function is called after the specified cache expiration, the DataFrame will be regenerated.\n\n### Example\n\nThe following example will cache the resulting DataFrame to `file.csv`. It will regenerate the DataFrame and its cache if the function is called a second time atleast 100 seconds after the first.\n\n```python\nfrom cache_pandas import cache_to_csv\n\n@cache_to_csv(\"file.csv\", refresh_time=100)\ndef sample_constant_function() -> pd.DataFrame:\n    \"\"\"Sample function that returns a constant DataFrame, for testing purpose.\"\"\"\n    data = {\n        \"ints\": list(range(NUM_SAMPLES)),\n        \"strs\": [str(i) for i in range(NUM_SAMPLES)],\n        \"floats\": [float(i) for i in range(NUM_SAMPLES)],\n    }\n\n    return pd.DataFrame.from_dict(data)\n```\n\n### Args\n```\nfilepath: Filepath to save the cached CSV.\nrefresh_time: Time seconds. If the file has not been updated in longer than refresh_time, generate the file\n    anew. 
If `None`, the file will never be regenerated if a cached version exists.\ncreate_dirs: Whether to create necessary directories containing the given filepath.\n```\n\n\n## Caching pandas dataframes to memory\n`cache-pandas` includes the decorator `timed_lru_cache`, which will cache the result of a function (returning a DataFrame) to a memory, using `functools.lru_cache`.\n\nAn optional expiration time can also be set. This might be useful for a webscraper where the output DataFrame may change once a day, but within the day, it will be the same. If the decorated function is called after the specified cache expiration, the DataFrame will be regenerated.\n\n### Example\n\nThe following example will cache the resulting DataFrame in memory. It will regenerate the DataFrame and its cache if the function is called a second time atleast 100 seconds after the first.\n\n```python\nfrom cache_pandas import timed_lru_cache\n\n@timed_lru_cache(seconds=100, maxsize=None)\ndef sample_constant_function() -> pd.DataFrame:\n    \"\"\"Sample function that returns a constant DataFrame, for testing purpose.\"\"\"\n    data = {\n        \"ints\": list(range(NUM_SAMPLES)),\n        \"strs\": [str(i) for i in range(NUM_SAMPLES)],\n        \"floats\": [float(i) for i in range(NUM_SAMPLES)],\n    }\n\n    return pd.DataFrame.from_dict(data)\n```\n\n### Args\n```\nseconds: Number of seconds to retain the cache.\nmaxsize: Maximum number of items to store in the cache. See `functools.lru_cache` for more details.\ntyped: Whether arguments of different types will be cached separately. See `functools.lru_cache` for more details.\n```\n\n## Composing `cache_to_csv` and `timed_lru_cache`\n`cache_to_csv` and `timed_lru_cache` can even be composed together. Usually the correct way to do this is to wrap `timed_lru_cache` last, because `cache_to_csv` will always check the file before calling the function. 
The refresh time can even differ between the two caching mechanisms.\n\n### Example\n```python\nfrom cache_pandas import timed_lru_cache, cache_to_csv\n\n@timed_lru_cache(seconds=100, maxsize=None)\n@cache_to_csv(\"file.csv\", refresh_time=100)\ndef sample_constant_function() -> pd.DataFrame:\n    \"\"\"Sample function that returns a constant DataFrame, for testing purpose.\"\"\"\n    data = {\n        \"ints\": list(range(NUM_SAMPLES)),\n        \"strs\": [str(i) for i in range(NUM_SAMPLES)],\n        \"floats\": [float(i) for i in range(NUM_SAMPLES)],\n    }\n\n    return pd.DataFrame.from_dict(data)\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Easily cache pandas DataFrames to file and in memory.",
    "version": "1.0.0",
    "split_keywords": [
        "pandas",
        "cache",
        "csv"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "34fda8166bbd149accd9c13400dacf2a",
                "sha256": "534cf9570553409dcdeee762c81fbbea62d9c4c9eb032d1a94cc2ee4a2c7592a"
            },
            "downloads": -1,
            "filename": "cache_pandas-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "34fda8166bbd149accd9c13400dacf2a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 5679,
            "upload_time": "2023-01-01T18:57:36",
            "upload_time_iso_8601": "2023-01-01T18:57:36.094926Z",
            "url": "https://files.pythonhosted.org/packages/86/d3/2c7325929eb9598976ea359aa0162b19911825ba798fc8f80efe19ac97a7/cache_pandas-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "1c5ee16f39ca91a2dc43b9b23fb111f3",
                "sha256": "ff397c5696caa736750bccc1bd53a2545a30ea7aef1b639b1ac8e5efbec8ac1e"
            },
            "downloads": -1,
            "filename": "cache-pandas-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "1c5ee16f39ca91a2dc43b9b23fb111f3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 7470,
            "upload_time": "2023-01-01T18:57:37",
            "upload_time_iso_8601": "2023-01-01T18:57:37.331442Z",
            "url": "https://files.pythonhosted.org/packages/ae/65/726368ad7c7099c690c9765ef336f77f81a7e329411ecdbb1e0957555f7b/cache-pandas-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-01 18:57:37",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "cache-pandas"
}
        