federleicht


- **Name**: federleicht
- **Version**: 0.1.0
- **Summary**: lightweight function decorators to cache your `pandas.DataFrame` as feather.
- **Upload time**: 2024-12-01 17:00:19
- **Author**: Christoph Dörrer
- **Requires Python**: <4.0,>=3.9
- **License**: MIT
- **Keywords**: cache, feather, pandas, pyarrow

# federleicht

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/federleicht)](https://pypi.org/project/federleicht/)
[![PyPI - Version](https://img.shields.io/pypi/v/federleicht)](https://pypi.org/project/federleicht/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/federleicht)](https://pypi.org/project/federleicht/)
[![PyPI - License](https://img.shields.io/pypi/l/federleicht)](https://raw.githubusercontent.com/d-chris/federleicht/main/LICENSE)
[![GitHub - Pytest](https://img.shields.io/github/actions/workflow/status/d-chris/federleicht/pytest.yml?logo=github&label=pytest)](https://github.com/d-chris/federleicht/actions/workflows/pytest.yml)
[![GitHub - Page](https://img.shields.io/website?url=https%3A%2F%2Fd-chris.github.io%2Ffederleicht&up_message=pdoc&logo=github&label=documentation)](https://d-chris.github.io/federleicht)
[![GitHub - Release](https://img.shields.io/github/v/tag/d-chris/federleicht?logo=github&label=github)](https://github.com/d-chris/federleicht)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://raw.githubusercontent.com/d-chris/federleicht/main/.pre-commit-config.yaml)
[![codecov](https://codecov.io/gh/d-chris/federleicht/graph/badge.svg?token=9FYKODTD9D)](https://codecov.io/gh/d-chris/federleicht)

---

`federleicht` is a Python package providing a cache decorator for `pandas.DataFrame`, utilizing the lightweight and efficient `pyarrow` feather file format.

`federleicht.cache_dataframe` is designed to decorate functions that return `pandas.DataFrame` objects. The decorator saves the DataFrame to a feather file on the first call and loads it automatically on subsequent calls if the file exists.
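
The save-on-first-call, load-on-later-calls behavior can be sketched with a small file-backed decorator. This is an illustrative sketch only, not the implementation of `federleicht`: `cache_to_file` and `squares` are hypothetical names, and JSON stands in for feather so that the example runs with the standard library alone. With pandas and pyarrow installed, `DataFrame.to_feather` and `pandas.read_feather` would take the place of the JSON calls.

```python
import functools
import json
import tempfile
from pathlib import Path


def cache_to_file(path: Path):
    """Cache a function's JSON-serializable result in `path` (hypothetical helper)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if path.exists():
                # Later calls: load the stored result instead of recomputing.
                return json.loads(path.read_text())
            # First call: compute the result, then persist it for the next call.
            result = func(*args, **kwargs)
            path.write_text(json.dumps(result))
            return result

        return wrapper

    return decorator


calls = []  # records how often the expensive function really runs
cache_file = Path(tempfile.mkdtemp()) / "squares.json"


@cache_to_file(cache_file)
def squares(n):
    calls.append(n)
    return [i * i for i in range(n)]


first = squares(5)   # computes the result and writes the cache file
second = squares(5)  # served from the cache file
```

Unlike `cache_dataframe`, this sketch ignores the function arguments, so every call returns whatever is stored in the file. The [Cache Expiry](#cache-expiry) section describes the conditions under which `federleicht` discards a cached result.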

## Key Features

- Feather Integration: Save and load `pandas.DataFrame` effortlessly using the Feather format, known for its speed and simplicity.
- Decorator Simplicity: Add caching functionality to your functions with a single decorator line.
- Efficient Caching: Avoid redundant computations by reusing cached results.

## Cache Expiry

To implement cache expiry, `federleicht` requires all arguments of the decorated function to be serializable. The cache will expire under the following conditions:

- Argument Sensitivity: The cache expires if the arguments (`args` / `kwargs`) of the decorated function change.
- When an `os.PathLike` object is passed as an argument, the cache expires if the size and / or modification time of the file changes.
- Code Change Detection: The cache expires if the implementation of the decorated function changes during development.
- Time-based Expiry: The cache expires when it is older than a given `timedelta`.

In addition to the immutable built-in data types, the following argument types are supported:

- `os.PathLike`
- `pandas.DataFrame`
- `pandas.Series`
- `numpy.ndarray`
- `datetime.datetime`
- `types.FunctionType`
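
The expiry conditions above can be pictured as a cache key that changes whenever one of the inputs changes, plus an age check on the cache file. The following is a minimal sketch under stated assumptions, not the code used by `federleicht`: `cache_key`, `_token`, `is_expired` and `load` are hypothetical names, the function's bytecode stands in for "the implementation", and only simple argument types and paths are handled.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def _token(value) -> str:
    """Turn one argument into a string that changes when the argument changes."""
    if isinstance(value, os.PathLike):
        # For paths, track file size and modification time, not only the name.
        stat = os.stat(value)
        return f"path:{os.fspath(value)}:{stat.st_size}:{stat.st_mtime_ns}"
    return f"{type(value).__name__}:{value!r}"


def cache_key(func, *args, **kwargs) -> str:
    """Hypothetical key: md5 over the function's bytecode and all arguments."""
    h = hashlib.md5()
    # Assumption: bytecode approximates the implementation; constants and
    # nested functions are ignored in this sketch.
    h.update(func.__code__.co_code)
    for value in args:
        h.update(_token(value).encode())
    for name in sorted(kwargs):
        h.update(f"{name}={_token(kwargs[name])}".encode())
    return h.hexdigest()


def is_expired(cache_file: Path, max_age: timedelta) -> bool:
    """Time-based expiry: compare the age of the cache file with `max_age`."""
    modified = datetime.fromtimestamp(cache_file.stat().st_mtime)
    return datetime.now() - modified > max_age


def load(file, nrows=None):
    return file, nrows


data_file = Path(tempfile.mkdtemp()) / "data.csv"
data_file.write_text("a,b\n1,2\n")

key_before = cache_key(load, data_file, nrows=10)
key_other_args = cache_key(load, data_file, nrows=20)  # different kwargs

data_file.write_text("a,b\n1,2\n3,4\n")  # the file grows, so its size changes
key_after = cache_key(load, data_file, nrows=10)
```

In this sketch a changed key simply means that no matching cache file would be found, so the decorated function would run again.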

## Installation

Install federleicht from PyPI:

```cmd
pip install federleicht
```

By default, `md5` is used to hash the arguments. For faster hashing, you can install `xxhash` as an optional dependency:

```cmd
pip install federleicht[xxhash]
```
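
An optional dependency like this is commonly handled with an import fallback. The sketch below shows that pattern; it is an assumption about how such a switch could look, not a description of how `federleicht` selects its hash function, and `new_hasher` and `digest` are hypothetical names.

```python
import hashlib

try:
    import xxhash  # optional, only present if the extra was installed
except ImportError:
    xxhash = None


def new_hasher():
    """Return a hash object: xxhash if it is available, otherwise md5."""
    if xxhash is not None:
        return xxhash.xxh64()
    return hashlib.md5()


def digest(data: bytes) -> str:
    """Hash `data` with whichever backend is available."""
    hasher = new_hasher()
    hasher.update(data)
    return hasher.hexdigest()


print(digest(b"federleicht"))
```

Both backends expose `update()` and `hexdigest()`, so the calling code does not need to know which one is in use. The digests differ between backends, so cache keys created with one are not comparable with keys created by the other.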

## Usage

Here's a quick example:

```python
import pandas as pd
from federleicht import cache_dataframe

@cache_dataframe
def generate_large_dataframe():
    # Simulate a heavy computation
    return pd.DataFrame({"col1": range(10000), "col2": range(10000)})

df = generate_large_dataframe()
```

## Benchmark

[![Static Badge](https://img.shields.io/badge/kaggle-alessandrolobello-lightblue?logo=kaggle&logoColor=lightblue)](https://www.kaggle.com/datasets/alessandrolobello/the-ultimate-earthquake-dataset-from-1990-2023)

- **file**: Eartquakes-1990-2023.csv
- **size**: 494.8 MB
- **lines**: 3,445,752

The following functions are used to benchmark the performance of the `cache_dataframe` decorator.

```python
import pandas as pd


def read_data(file: str, **kwargs) -> pd.DataFrame:
    """
    Read the earthquake dataset from a CSV file to benchmark the cache.

    Perform some data type conversions and return the DataFrame.
    """
    df = pd.read_csv(
        file,
        header=0,
        dtype={
            "status": "category",
            "tsunami": "boolean",
            "data_type": "category",
            "state": "category",
        },
        **kwargs,
    )

    df["time"] = pd.to_datetime(df["time"], unit="ms")
    df["date"] = pd.to_datetime(df["date"], format="mixed")

    return df
```

The returned `pandas.DataFrame` is cached, without its `attrs` dictionary, in the `.pandas_cache` directory, and the cache only expires if the file changes. For more details, see the [Cache Expiry](#cache-expiry) section.

```python
import pathlib


@cache_dataframe
def read_cache(file: pathlib.Path, **kwargs) -> pd.DataFrame:
    return read_data(file, **kwargs)
```
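
Wall-clock timings for functions such as `read_data` and `read_cache` can be collected with a small helper built on `time.perf_counter`. This is a generic sketch, not the author's benchmark script; `timed` and `slow_sum` are hypothetical names, and `slow_sum` only stands in for an expensive call.

```python
import time


def timed(func, *args, **kwargs):
    """Call `func` and return a tuple of (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_sum(n):
    # Placeholder for an expensive computation such as parsing a large CSV.
    return sum(range(n))


result, seconds = timed(slow_sum, 1_000_000)
print(f"slow_sum took {seconds:.3f} s")
```

Timing the first call of a cached function measures building the cache, while timing a second call with the same arguments measures reading it.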

### Benchmark Results

Results strongly depend on the system configuration and the file system. The following results were obtained on:

- **OS**: Windows
- **OS Version**: 10.0.19044
- **Python**: 3.11.9
- **CPU**: AMD64 Family 23 Model 104 Stepping 1, AuthenticAMD

|   nrows | read_data [s] | build_cache [s] | read_cache [s] |
| ------: | ------------: | --------------: | -------------: |
|   10000 |         0.060 |           0.076 |          0.037 |
|   32170 |         0.172 |           0.193 |          0.033 |
|  103493 |         0.536 |           0.569 |          0.067 |
|  332943 |         1.658 |           1.791 |          0.143 |
| 1071093 |         5.383 |           5.465 |          0.366 |
| 3445752 |        16.750 |          17.720 |          1.141 |

![Benchmark plot](https://raw.githubusercontent.com/d-chris/federleicht/refs/heads/main/benchmark.webp)
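
To make the table easier to read, the speedup of a cached read over a plain CSV read can be computed as `read_data / read_cache` for each row. The snippet below only does arithmetic on the values copied from the table above; the variable names are chosen for illustration.

```python
# (nrows, read_data [s], read_cache [s]) copied from the benchmark table
rows = [
    (10000, 0.060, 0.037),
    (32170, 0.172, 0.033),
    (103493, 0.536, 0.067),
    (332943, 1.658, 0.143),
    (1071093, 5.383, 0.366),
    (3445752, 16.750, 1.141),
]

# Speedup factor of the cached read for each row count.
speedups = {nrows: read_data / read_cache for nrows, read_data, read_cache in rows}

for nrows, factor in speedups.items():
    print(f"{nrows:>8} rows: {factor:.1f}x faster from cache")
```

For the full file of 3,445,752 rows this gives a factor of roughly 14.7, while for 10,000 rows the factor is only about 1.6, so on this system the benefit grows with the size of the data.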

## Dependencies

[![PyPI - pandas](https://img.shields.io/pypi/v/pandas?logo=pandas&logoColor=white&label=pandas)](https://pypi.org/project/pandas/)
[![PyPI - pyarrow](https://img.shields.io/pypi/v/pyarrow?logo=pypi&logoColor=white&label=pyarrow)](https://pypi.org/project/pyarrow/)
[![PyPI - xxhash](https://img.shields.io/pypi/v/xxhash?logo=pypi&logoColor=white&label=xxhash)](https://pypi.org/project/xxhash/)

---

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "federleicht",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "cache, feather, pandas, pyarrow",
    "author": "Christoph D\u00f6rrer",
    "author_email": "d-chris@web.de",
    "download_url": "https://files.pythonhosted.org/packages/69/70/57b564168043706288d1d4d77201b427cde369dd171613eaef01c250a988/federleicht-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "lightweigth function decorators to cache your `pandas.DataFrame` as feather.",
    "version": "0.1.0",
    "project_urls": {
        "Documentation": "https://d-chris.github.io/federleicht",
        "Repository": "https://github.com/d-chris/federleicht"
    },
    "split_keywords": [
        "cache",
        " feather",
        " pandas",
        " pyarrow"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bf5523bb0bc9f70aed130442b29c68eed24fc9bb51f2ee9e3f84c92157ffbfbb",
                "md5": "1b9526441622a3a78f765baba1278b67",
                "sha256": "ce8bb9c83444d104e3f9e2ee04010ffb87885efbb8cbc7be249efd0e405e3d1c"
            },
            "downloads": -1,
            "filename": "federleicht-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1b9526441622a3a78f765baba1278b67",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 11288,
            "upload_time": "2024-12-01T17:00:18",
            "upload_time_iso_8601": "2024-12-01T17:00:18.349398Z",
            "url": "https://files.pythonhosted.org/packages/bf/55/23bb0bc9f70aed130442b29c68eed24fc9bb51f2ee9e3f84c92157ffbfbb/federleicht-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "697057b564168043706288d1d4d77201b427cde369dd171613eaef01c250a988",
                "md5": "837f1ad8a539edd7ed415939b3da2cda",
                "sha256": "ebb4ede1f9f13a210a85711f20df7001e0f0d3531bf4ff84c71c34851cb68995"
            },
            "downloads": -1,
            "filename": "federleicht-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "837f1ad8a539edd7ed415939b3da2cda",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 11059,
            "upload_time": "2024-12-01T17:00:19",
            "upload_time_iso_8601": "2024-12-01T17:00:19.974339Z",
            "url": "https://files.pythonhosted.org/packages/69/70/57b564168043706288d1d4d77201b427cde369dd171613eaef01c250a988/federleicht-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-01 17:00:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "d-chris",
    "github_project": "federleicht",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "federleicht"
}
        