# Cacheables
[build](https://github.com/thomelane/cacheables/actions/workflows/build.yml)
[coverage](https://codecov.io/gh/thomelane/cacheables)
Cacheables is a Python package that makes it easy to cache function outputs.
Cacheables is well suited to building efficient data workflows, because:
* functions will only recompute if their inputs have changed.
* everything is versioned: the functions, the inputs and the outputs.
* the cache is reused between different processes/executions (stored on [`DiskCache`](https://github.com/thomelane/cacheables/blob/21bf54fb67b7f9cb2699915da3969b36a2519d9c/cacheables/caches/disk.py#L13) by default).
* cached outputs are readable since you choose the file format ([`PickleSerializer`](https://github.com/thomelane/cacheables/blob/21bf54fb67b7f9cb2699915da3969b36a2519d9c/cacheables/serializers.py#L29C27-L29C27) is just a default).
## Install
```bash
pip install cacheables
```
## Basic Example
`@cacheable` is the decorator that makes a function cacheable.
```python
# basic_example.py
from cacheables import cacheable
from time import sleep


@cacheable
def foo(text: str) -> int:
    sleep(1)  # simulate a long running function
    return len(text)


if __name__ == "__main__":
    foo("hello")
    with foo.enable_cache():
        foo("world")

# python basic_example.py  # 2 seconds
# python basic_example.py  # 1 second (foo("world") used cache)
```
When the cache is enabled on a function, the following happens:
* an `input_key` will be calculated from the function arguments
* if the `input_key` exists in the cache
  * the output will be loaded from the cache
    * using `cache.read` and then `serializer.deserialize`
  * and the output will be returned
* if the `input_key` doesn't exist in the cache
  * the original function will execute to get an output
  * the output will be dumped into the cache
    * using `serializer.serialize` and then `cache.write`
  * and the output will be returned
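The lookup flow above can be sketched as a minimal wrapper. This is an illustration only: the dictionary `_cache`, the key derivation, and the pickle-based serialization are hypothetical stand-ins for the library's real `DiskCache` and serializer internals.

```python
import hashlib
import pickle

# in-memory stand-in for a cache backend such as DiskCache
_cache: dict = {}


def _make_input_key(args, kwargs) -> str:
    # derive a stable key from the function arguments
    raw = pickle.dumps((args, sorted(kwargs.items())))
    return hashlib.sha256(raw).hexdigest()


def cached_call(fn, *args, **kwargs):
    input_key = _make_input_key(args, kwargs)
    if input_key in _cache:
        # cache hit: read the stored bytes, then deserialize
        return pickle.loads(_cache[input_key])
    # cache miss: execute the original function
    output = fn(*args, **kwargs)
    # serialize, then write the bytes into the cache
    _cache[input_key] = pickle.dumps(output)
    return output
```

A second call with the same arguments returns the deserialized bytes instead of re-running the function.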
## Standard Example
Cacheables is well suited to building efficient data workflows.
As a simple example, let's assume we have a data workflow that processes a
string by removing the vowels, reversing the output, and then finally
concatenating that output with the original string. We'll assume that two of
these steps are computationally expensive (`remove_vowels` and `concatenate`),
so we decorate them with `@cacheable`.
After running the workflow twice (showing that the cached results are used), we
modify the workflow by removing the `reverse` step. Only `concatenate` is run on
the third workflow execution, which is much more efficient than running the
whole workflow (including `remove_vowels`) again.
```python
# standard_example.py
from cacheables import cacheable, enable_all_caches
from time import sleep


@cacheable
def remove_vowels(text: str) -> str:
    sleep(1)  # simulate a long running function
    return "".join([char for char in text if char not in "aeiou"])


def reverse(text: str) -> str:
    return text[::-1]


@cacheable
def concatenate(reversed_text: str, text: str) -> str:
    sleep(1)  # simulate a long running function
    return reversed_text + text


def run_workflow(text: str) -> str:
    t = remove_vowels(text)
    t = reverse(t)
    output = concatenate(t, text)
    return output


if __name__ == "__main__":
    enable_all_caches()

    run_workflow("cache this")  # 2 seconds
    run_workflow("cache this")  # 0 seconds

    def run_workflow(text: str) -> str:
        t = remove_vowels(text)
        # t = reverse(t)  # removed
        output = concatenate(t, text)
        return output

    run_workflow("cache this")  # 1 second

# python standard_example.py  # 3 seconds
# python standard_example.py  # 0 seconds (current and previous are still both cached)
```
## Advanced Example
Cacheables has many other features, a few of which are shown below.
```python
# advanced_example.py
from cacheables import cacheable, enable_all_caches, enable_logging
from cacheables.caches import DiskCache
from cacheables.serializers import JsonSerializer
from time import sleep


@cacheable(
    function_id="example",
    cache=DiskCache(base_path="~/.cache"),
    serializer=JsonSerializer(),
    exclude_args_fn=lambda e: e in ["verbose"]
)
def foo(text: str, verbose: bool = False) -> int:
    sleep(1)  # simulate a long running function
    return len(text)


if __name__ == "__main__":
    enable_all_caches()
    enable_logging()

    foo("cache this")  # 1 second
    foo("cache this", verbose=True)  # 0 seconds

    # manually write an output to the cache
    input_id = foo.get_input_id("and cache that")
    foo.dump_output(14, input_id)
    foo("and cache that")  # 0 seconds

    # manually read an output from the cache
    input_id = foo.get_input_id("cache this")
    foo.load_output(input_id)  # 0 seconds

    # show the output path in the cache
    foo.get_output_path(input_id)
    # ~/.cache/functions/example/inputs/cf5b2ab47064bd0e/aab3238922bcc25a.json

    # only use certain outputs from the cache, recompute others
    with foo.enable_cache(filter=lambda output: output <= 10):
        foo("cache this")  # 0 seconds
        foo("and cache that")  # 1 second

    # overwrite the cache
    with foo.enable_cache(read=False, write=True):
        foo("cache this")  # 1 second

# python advanced_example.py  # 3 seconds
# python advanced_example.py  # 2 seconds (first foo("cache this") used cache)
```
### PickleSerializer & DiskCache
When you use `@cacheable` without any argument, `PickleSerializer` and
`DiskCache` will be used by default.
After executing a function like `foo("hello")` with the cache enabled, you can
expect to see the following files on disk:
```
<cwd>/.cacheables
└── functions
    └── <function_id>
        └── inputs
            └── <input_id>
                ├── <output_id>.pickle
                └── metadata.json
```
* `function_id`
  * A `function_id` uniquely identifies a function. Unless specified via the `function_id` argument to `cacheable`, the `function_id` takes the form `module.submodule:foo`.
* `input_id`
  * An `input_id` uniquely identifies a set of inputs to a function. We assume that changes to the inputs of a function result in a change to the output of the function. Under the hood, each `input_id` is created by first hashing each individual input argument (and that per-argument hash is itself cached!) and then hashing all of the argument hashes into a single hash.
* `output_id`
  * An `output_id` uniquely identifies an output of a function. Like the `input_id`, it is a hash, in this case of the function's output.
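The two-stage hashing behind `input_id` can be sketched as follows. The function names, the `repr`-based hashing, and the 16-character truncation are illustrative assumptions, not the library's actual scheme; they only show the shape of "hash each argument, then hash the hashes".

```python
import hashlib


def hash_argument(value) -> str:
    # hash a single argument (the real library also caches these
    # per-argument hashes, which this sketch omits)
    return hashlib.sha256(repr(value).encode()).hexdigest()[:16]


def make_input_id(*args, **kwargs) -> str:
    # combine the per-argument hashes into a single input_id
    parts = [hash_argument(a) for a in args]
    parts += [f"{k}={hash_argument(v)}" for k, v in sorted(kwargs.items())]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]
```

The same arguments always produce the same `input_id`, while any changed argument produces a different one.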
## Other Documentation
See the [official documentation](https://thomelane.github.io/cacheables/) for more details.
Start by wrapping your function with the `@cacheable` decorator.
```python
from time import sleep

from cacheables import cacheable


@cacheable
def foo(text: str) -> int:
    sleep(10)  # simulate a long running function
    return len(text)
```
Customization is possible by passing in arguments to the decorator.
```python
from cacheables.caches import DiskCache
from cacheables.serializers import JsonSerializer


@cacheable(
    function_id="example",
    cache=DiskCache(base_path="~/.cache"),
    serializer=JsonSerializer(),
    exclude_args_fn=lambda e: e in ["verbose"]
)
def foo(text: str, verbose: bool = False) -> int:
    sleep(10)  # simulate a long running function
    return len(text)
```
See the `@cacheable` docstring for more details.
#### Caching
Use `foo.enable_cache()` to enable the cache on a single function or
`enable_all_caches` to enable the cache on all functions.
```python
@cacheable
def foobar(text: str) -> int:
    sleep(10)  # simulate another long running function
    return len(text)


foo.clear_cache()
foo("hello")  # returns after 10 seconds
foo("hello")  # returns after 10 seconds

foo.enable_cache()
foo("hello")  # returns after 10 seconds (writes to cache)
foo("hello")  # returns immediately (reads from cache)
foobar("hello")  # returns after 10 seconds
foobar("hello")  # returns after 10 seconds

enable_all_caches()
foobar("hello")  # returns after 10 seconds (writes to cache)
foobar("hello")  # returns immediately (reads from cache)
```
You can also use both of these as context managers, if you only want to enable
the cache temporarily within a certain scope.
```python
foo.clear_cache()
foobar.clear_cache()

foo("hello")  # returns after 10 seconds
foo("hello")  # returns after 10 seconds

with foo.enable_cache():
    foo("hello")  # returns after 10 seconds (writes to cache)
    foo("hello")  # returns immediately (reads from cache)
foo("hello")  # returns after 10 seconds

with foo.enable_cache(), foobar.enable_cache():
    foo("hello")  # returns immediately (reads from cache)
    foobar("hello")  # returns after 10 seconds (writes to cache)
    foobar("hello")  # returns immediately (reads from cache)
foo("hello")  # returns after 10 seconds
foobar("hello")  # returns after 10 seconds

with enable_all_caches():
    foo("hello")  # returns immediately (reads from cache)
    foobar("hello")  # returns immediately (reads from cache)
foo("hello")  # returns after 10 seconds
foobar("hello")  # returns after 10 seconds
```
#### Cache Setting
When a cacheable function is called after `enable_cache`, the cache will be
read from and written to. Sometimes you might need to leave the results in the
cache untouched, or even overwrite them. You can do this by specifying the
`read` and `write` arguments.
```python
foo.enable_cache(read=False, write=True)
foo("hello") # foo called, and result added to cache
foo("hello") # foo called, and result re-added to cache
```
You have three levels of cache settings:
* Function: controlled by `foo.enable_cache`/`foo.disable_cache`
* Global: controlled by `enable_all_caches`/`disable_all_caches`
* Environment: controlled by `CACHEABLES_ENABLED`/`CACHEABLES_DISABLED`
When nothing is explicitly enabled/disabled (i.e. default), the cache will be disabled so that the cacheable function runs without any caching. When *any* level is explicitly set to disabled, the cache will be disabled, regardless of the other level settings (even if they are explicitly set to enabled).
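The precedence rules above can be sketched as a small resolver. The function and its tri-state encoding (`True` explicitly enabled, `False` explicitly disabled, `None` unset) are hypothetical, chosen only to mirror the prose, and are not the library's actual implementation.

```python
from typing import Optional


def cache_active(function_level: Optional[bool],
                 global_level: Optional[bool],
                 env_level: Optional[bool]) -> bool:
    levels = (function_level, global_level, env_level)
    if any(level is False for level in levels):
        return False  # any explicit disable wins, even over explicit enables
    # the default (nothing set) is disabled; any explicit enable turns it on
    return any(level is True for level in levels)
```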
#### Output load
Often you just want to load a result from the cache without executing the
function. You can do this with the `load_output` method.
```python
input_id = foo.get_input_id("hello")
output = foo.load_output(input_id) # will error if result is not in cache
```
#### Output dump
Some more advanced use-cases might want to manually write results to the cache (e.g. batched processing). You can do this by using the `dump_output` method.
```python
input_id = foo.get_input_id("hello")
foo.dump_output(5, input_id)
```
## Development
```bash
poetry install
poetry run task test # pytest
poetry run task format # black
poetry run task lint # ruff
```
Use pre-commit to automatically format and lint before each commit.
```bash
pre-commit install
```