data-cache


Namedata-cache JSON
Version 0.1.6 PyPI version JSON
download
home_pagehttps://github.com/statnett/data_cache
SummaryData Cache
upload_time2021-06-02 20:57:35
maintainer
docs_urlNone
authorStatnett Datascience
requires_python>=3.6.1,<4.0.0
licenseMIT
keywords cache pandas numpy decorator
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            [![PyPI version](https://img.shields.io/pypi/v/data-cache)](https://pypi.org/project/data-cache/)
[![Python Versions](https://img.shields.io/pypi/pyversions/data-cache)](https://pypi.org/project/data-cache/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![](https://github.com/statnett/data_cache/workflows/Tests/badge.svg)](https://github.com/statnett/data_cache/actions?query=workflow%3ATests)
[![codecov](https://codecov.io/gh/statnett/data_cache/branch/master/graph/badge.svg)](https://codecov.io/gh/statnett/data_cache)

# Data cache

Works by hashing the combinations of arguments of a function call with
the function name to create a unique id of a table retrieval.  If
the function call is new the original function will be called, and the
resulting tables(s) will be stored in a HDFStore indexed by the
hashed key.  Next time the function is called with the same args the
tables(s) will be retrieved from the store instead of executing the
function.

The hashing of the arguments is done by first applying str() on the
argument, and then taking th md5 hash of the combination of these args
together with the function name.  This means that if a argument for
some reason does not have a str representation the key generation will
fail.  To omit this issue one can specify which arguments the cache
should consider such that 'un-stringable' arguments are skipped.  This
functionality is also used for skipping arguments the should by design
not be considered for the key-generation like for example
database-clients.


#### Setting cache file location

The module automatically creates a `cache/data.h5` relative to
`__main__`, to change this set the environment variable
`CACHE_PATH` to be the desired directory of the `data.h5` file.

#### Disabling the cache with env-variable

To disable the cache set the environment variable
`DISABLE_CACHE` to `TRUE`.

### Usage

#### Decorating functions

```python
from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd

@pandas_cache
def simple_func():
    sleep(5)
    return pd.DataFrame([[1,2,3], [2,3,4]])


t0 = datetime.now()
print(simple_func())
print(datetime.now() - t0)

t0 = datetime.now()
print(simple_func())
print(datetime.now() - t0)
```
```commandline
   0  1  2
0  1  2  3
1  2  3  4
0:00:05.343027
   0  1  2
0  1  2  3
1  2  3  4
0:00:00.015987
```

#### Decorating class methods

The decorator ignores arguments named 'self' such that it will work across different instances of the same object.

```python
from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd


class PandasClass:
    def __init__(self):
        print(self)

    @pandas_cache
    def simple_func(self):
        sleep(5)
        return pd.DataFrame([[1,2,3], [2,3,4]])

c = PandasClass()
t0 = datetime.now()
print(c.simple_func())
print(datetime.now() - t0)

c = PandasClass()
t0 = datetime.now()
print(c.simple_func())
print(datetime.now() - t0)
```
```commandline
<__main__.PandasClass object at 0x003451F0>
   0  1  2
0  1  2  3
1  2  3  4
0:00:05.375342
<__main__.PandasClass object at 0x124814B0>
   0  1  2
0  1  2  3
1  2  3  4
0:00:00.014959
```

#### Selecting arguments

```python
from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd

@pandas_cache("a", "c")
def simple_func(a, b, c=True):
    sleep(5)
    return pd.DataFrame([[1,2,3], [2,3,4]])


t0 = datetime.now()
print(simple_func(a=1, b=2))
print(datetime.now() - t0)

# b is not considered
t0 = datetime.now()
print(simple_func(a=1, b=3))
print(datetime.now() - t0)
```
```commandline
   0  1  2
0  1  2  3
1  2  3  4
0:00:05.619620
   0  1  2
0  1  2  3
1  2  3  4
0:00:00.017980
```

#### Multi-DataFrame returns

```python
from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd


@pandas_cache("a", "c")
def simple_func(a, *args, **kwargs):
    sleep(5)
    return pd.DataFrame([[1,2,3], [2,3,4]]), pd.DataFrame([[1,2,3], [2,3,4]]) * 10


t0 = datetime.now()
print(simple_func(1, b=2, c=True))
print(datetime.now() - t0)

t0 = datetime.now()
print(simple_func(a=1, b=3, c=True))
print(datetime.now() - t0)
```
```commandline
(   0  1  2
0  1  2  3
1  2  3  4,     0   1   2
0  10  20  30
1  20  30  40)
0:00:05.368545
(   0  1  2
0  1  2  3
1  2  3  4,     0   1   2
0  10  20  30
1  20  30  40)
0:00:00.019578
```

#### Disabling cache for tests

Caching can be disabled using the environment variable DISABLE_CACHE to TRUE

```python
from mock import patch
def test_cached_function():
    with patch.dict("os.environ", {"DISABLE_CACHE": "TRUE"}, clear=True):
        assert cached_function() == target
```

#### Numpy caching

```python
from data_cache import numpy_cache
from time import sleep
from datetime import datetime
import numpy as np


@numpy_cache("a", "c")
def simple_func(a, *args, **kwargs):
    sleep(5)
    return np.array([[1, 2, 3], [2, 3, 4]]), np.array([[1, 2, 3], [2, 3, 4]]) * 10


t0 = datetime.now()
print(simple_func(1, b=2, c=True))
print(datetime.now() - t0)

t0 = datetime.now()
print(simple_func(a=1, b=3, c=True))
print(datetime.now() - t0)
```

```commandline
(array([[1, 2, 3],
       [2, 3, 4]]), array([[10, 20, 30],
       [20, 30, 40]]))
0:00:05.009084
(array([[1, 2, 3],
       [2, 3, 4]]), array([[10, 20, 30],
       [20, 30, 40]]))
0:00:00.002000
```

#### Metadata

Metadata is automatically stored with the data on the group node containing the
DataFrame/Array.

```python
from data_cache import numpy_cache, pandas_cache, read_metadata
import pandas as pd
import numpy as np
from datetime import datetime


@pandas_cache
def function1(a, *args, b=1, **kwargs):
    return pd.DataFrame()

@numpy_cache
def function2(a, *args, b=1, **kwargs):
    return np.array([])

function1(1, True, datetime.date(2019, 11, 11))
function2(2, False, b=2, c=1.1)
read_metadata("path_to_data.h5")
```
results:
```json
{
    "/a86f0a323bf20998b5deda81e9f90bb49/a5d320e5dcdc5d3f35a4ca366980b2dc1": {
        "a": "1",
        "arglist": "(True, datetime.date(2019, 11, 11))",
        "b": "1",
        "date_stored": "01/05/2020, 10:00:00",
        "function_name": "function1",
        "module_path": "path_to_module"
    },
    "/a56ad8af46bc5fd8b9320b00b12e6c115/a62734531fc99855292c9db04d5eba60a": {
        "a": "2",
        "arglist": "(False,)",
        "b": "2",
        "c": "1.1",
        "date_stored": "01/05/2020, 10:00:00",
        "function_name": "function2",
        "module_path":  "path_to_module"
    }
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/statnett/data_cache",
    "name": "data-cache",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6.1,<4.0.0",
    "maintainer_email": "",
    "keywords": "cache,pandas,numpy,decorator",
    "author": "Statnett Datascience",
    "author_email": "Datascience.Drift@statnett.no",
    "download_url": "https://files.pythonhosted.org/packages/10/af/8d2d7b2f8142f848fe310022c77c999a093b327369b408e855975e19d2ec/data_cache-0.1.6.tar.gz",
    "platform": "",
    "description": "[![PyPI version](https://img.shields.io/pypi/v/data-cache)](https://pypi.org/project/data-cache/)\n[![Python Versions](https://img.shields.io/pypi/pyversions/data-cache)](https://pypi.org/project/data-cache/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![](https://github.com/statnett/data_cache/workflows/Tests/badge.svg)](https://github.com/statnett/data_cache/actions?query=workflow%3ATests)\n[![codecov](https://codecov.io/gh/statnett/data_cache/branch/master/graph/badge.svg)](https://codecov.io/gh/statnett/data_cache)\n\n# Data cache\n\nWorks by hashing the combinations of arguments of a function call with\nthe function name to create a unique id of a table retrieval.  If\nthe function call is new the original function will be called, and the\nresulting tables(s) will be stored in a HDFStore indexed by the\nhashed key.  Next time the function is called with the same args the\ntables(s) will be retrieved from the store instead of executing the\nfunction.\n\nThe hashing of the arguments is done by first applying str() on the\nargument, and then taking th md5 hash of the combination of these args\ntogether with the function name.  This means that if a argument for\nsome reason does not have a str representation the key generation will\nfail.  To omit this issue one can specify which arguments the cache\nshould consider such that 'un-stringable' arguments are skipped.  This\nfunctionality is also used for skipping arguments the should by design\nnot be considered for the key-generation like for example\ndatabase-clients.\n\n\n#### Setting cache file location\n\nThe module automatically creates a `cache/data.h5` relative to\n`__main__`, to change this set the environment variable\n`CACHE_PATH` to be the desired directory of the `data.h5` file.\n\n#### Disabling the cache with env-variable\n\nTo disable the cache set the environment variable\n`DISABLE_CACHE` to `TRUE`.\n\n### Usage\n\n#### Decorating functions\n\n```python\nfrom data_cache import pandas_cache\nfrom time import sleep\nfrom datetime import datetime\nimport pandas as pd\n\n@pandas_cache\ndef simple_func():\n    sleep(5)\n    return pd.DataFrame([[1,2,3], [2,3,4]])\n\n\nt0 = datetime.now()\nprint(simple_func())\nprint(datetime.now() - t0)\n\nt0 = datetime.now()\nprint(simple_func())\nprint(datetime.now() - t0)\n```\n```commandline\n   0  1  2\n0  1  2  3\n1  2  3  4\n0:00:05.343027\n   0  1  2\n0  1  2  3\n1  2  3  4\n0:00:00.015987\n```\n\n#### Decorating class methods\n\nThe decorator ignores arguments named 'self' such that it will work across different instances of the same object.\n\n```python\nfrom data_cache import pandas_cache\nfrom time import sleep\nfrom datetime import datetime\nimport pandas as pd\n\n\nclass PandasClass:\n    def __init__(self):\n        print(self)\n\n    @pandas_cache\n    def simple_func(self):\n        sleep(5)\n        return pd.DataFrame([[1,2,3], [2,3,4]])\n\nc = PandasClass()\nt0 = datetime.now()\nprint(c.simple_func())\nprint(datetime.now() - t0)\n\nc = PandasClass()\nt0 = datetime.now()\nprint(c.simple_func())\nprint(datetime.now() - t0)\n```\n```commandline\n<__main__.PandasClass object at 0x003451F0>\n   0  1  2\n0  1  2  3\n1  2  3  4\n0:00:05.375342\n<__main__.PandasClass object at 0x124814B0>\n   0  1  2\n0  1  2  3\n1  2  3  4\n0:00:00.014959\n```\n\n#### Selecting arguments\n\n```python\nfrom data_cache import pandas_cache\nfrom time import sleep\nfrom datetime import datetime\nimport pandas as pd\n\n@pandas_cache(\"a\", \"c\")\ndef simple_func(a, b, c=True):\n    sleep(5)\n    return pd.DataFrame([[1,2,3], [2,3,4]])\n\n\nt0 = datetime.now()\nprint(simple_func(a=1, b=2))\nprint(datetime.now() - t0)\n\n# b is not considered\nt0 = datetime.now()\nprint(simple_func(a=1, b=3))\nprint(datetime.now() - t0)\n```\n```commandline\n   0  1  2\n0  1  2  3\n1  2  3  4\n0:00:05.619620\n   0  1  2\n0  1  2  3\n1  2  3  4\n0:00:00.017980\n```\n\n#### Multi-DataFrame returns\n\n```python\nfrom data_cache import pandas_cache\nfrom time import sleep\nfrom datetime import datetime\nimport pandas as pd\n\n\n@pandas_cache(\"a\", \"c\")\ndef simple_func(a, *args, **kwargs):\n    sleep(5)\n    return pd.DataFrame([[1,2,3], [2,3,4]]), pd.DataFrame([[1,2,3], [2,3,4]]) * 10\n\n\nt0 = datetime.now()\nprint(simple_func(1, b=2, c=True))\nprint(datetime.now() - t0)\n\nt0 = datetime.now()\nprint(simple_func(a=1, b=3, c=True))\nprint(datetime.now() - t0)\n```\n```commandline\n(   0  1  2\n0  1  2  3\n1  2  3  4,     0   1   2\n0  10  20  30\n1  20  30  40)\n0:00:05.368545\n(   0  1  2\n0  1  2  3\n1  2  3  4,     0   1   2\n0  10  20  30\n1  20  30  40)\n0:00:00.019578\n```\n\n#### Disabling cache for tests\n\nCaching can be disabled using the environment variable DISABLE_CACHE to TRUE\n\n```python\nfrom mock import patch\ndef test_cached_function():\n    with patch.dict(\"os.environ\", {\"DISABLE_CACHE\": \"TRUE\"}, clear=True):\n        assert cached_function() == target\n```\n\n#### Numpy caching\n\n```python\nfrom data_cache import numpy_cache\nfrom time import sleep\nfrom datetime import datetime\nimport numpy as np\n\n\n@numpy_cache(\"a\", \"c\")\ndef simple_func(a, *args, **kwargs):\n    sleep(5)\n    return np.array([[1, 2, 3], [2, 3, 4]]), np.array([[1, 2, 3], [2, 3, 4]]) * 10\n\n\nt0 = datetime.now()\nprint(simple_func(1, b=2, c=True))\nprint(datetime.now() - t0)\n\nt0 = datetime.now()\nprint(simple_func(a=1, b=3, c=True))\nprint(datetime.now() - t0)\n```\n\n```commandline\n(array([[1, 2, 3],\n       [2, 3, 4]]), array([[10, 20, 30],\n       [20, 30, 40]]))\n0:00:05.009084\n(array([[1, 2, 3],\n       [2, 3, 4]]), array([[10, 20, 30],\n       [20, 30, 40]]))\n0:00:00.002000\n```\n\n#### Metadata\n\nMetadata is automatically stored with the data on the group node containing the\nDataFrame/Array.\n\n```python\nfrom data_cache import numpy_cache, pandas_cache, read_metadata\nimport pandas as pd\nimport numpy as np\nfrom datetime import datetime\n\n\n@pandas_cache\ndef function1(a, *args, b=1, **kwargs):\n    return pd.DataFrame()\n\n@numpy_cache\ndef function2(a, *args, b=1, **kwargs):\n    return np.array([])\n\nfunction1(1, True, datetime.date(2019, 11, 11))\nfunction2(2, False, b=2, c=1.1)\nread_metadata(\"path_to_data.h5\")\n```\nresults:\n```json\n{\n    \"/a86f0a323bf20998b5deda81e9f90bb49/a5d320e5dcdc5d3f35a4ca366980b2dc1\": {\n        \"a\": \"1\",\n        \"arglist\": \"(True, datetime.date(2019, 11, 11))\",\n        \"b\": \"1\",\n        \"date_stored\": \"01/05/2020, 10:00:00\",\n        \"function_name\": \"function1\",\n        \"module_path\": \"path_to_module\"\n    },\n    \"/a56ad8af46bc5fd8b9320b00b12e6c115/a62734531fc99855292c9db04d5eba60a\": {\n        \"a\": \"2\",\n        \"arglist\": \"(False,)\",\n        \"b\": \"2\",\n        \"c\": \"1.1\",\n        \"date_stored\": \"01/05/2020, 10:00:00\",\n        \"function_name\": \"function2\",\n        \"module_path\":  \"path_to_module\"\n    }\n}\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Data Cache",
    "version": "0.1.6",
    "project_urls": {
        "Homepage": "https://github.com/statnett/data_cache",
        "Repository": "https://github.com/statnett/data_cache"
    },
    "split_keywords": [
        "cache",
        "pandas",
        "numpy",
        "decorator"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9e53923886d94dbddb15cf917b59bfc64e89de09dcb103532c8affd67bed6555",
                "md5": "00688da0133ea94a187eb568ae999922",
                "sha256": "57d33ce393c7edac9962857172d26b5fc2f7fa79f8567d1e84ff675121534252"
            },
            "downloads": -1,
            "filename": "data_cache-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "00688da0133ea94a187eb568ae999922",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6.1,<4.0.0",
            "size": 6727,
            "upload_time": "2021-06-02T20:57:34",
            "upload_time_iso_8601": "2021-06-02T20:57:34.613583Z",
            "url": "https://files.pythonhosted.org/packages/9e/53/923886d94dbddb15cf917b59bfc64e89de09dcb103532c8affd67bed6555/data_cache-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "10af8d2d7b2f8142f848fe310022c77c999a093b327369b408e855975e19d2ec",
                "md5": "b0fad0d8d845d625acd9f9b7210d37e5",
                "sha256": "b3fc7ede7b90ec8d3036b37b3bfe1e779ca03ae7a3f33ef7015936421c69b65e"
            },
            "downloads": -1,
            "filename": "data_cache-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "b0fad0d8d845d625acd9f9b7210d37e5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6.1,<4.0.0",
            "size": 7752,
            "upload_time": "2021-06-02T20:57:35",
            "upload_time_iso_8601": "2021-06-02T20:57:35.873285Z",
            "url": "https://files.pythonhosted.org/packages/10/af/8d2d7b2f8142f848fe310022c77c999a093b327369b408e855975e19d2ec/data_cache-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-06-02 20:57:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "statnett",
    "github_project": "data_cache",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "data-cache"
}
        
Elapsed time: 0.41234s