Name | pkld |
Version | 1.0.2 |
home_page | https://github.com/shobrook/pkld |
Summary | Persistent caching for Python functions |
upload_time | 2025-01-03 20:10:50 |
maintainer | None |
docs_url | None |
author | shobrook |
requires_python | >=3 |
license | MIT |
requirements |
No requirements were recorded.
|
# pkld
`pkld` (pronounced "pickled") caches function calls to your disk.
This saves you from repeating the same function calls every time you run your code. It's especially useful in data engineering or machine learning pipelines, where function calls are often expensive or time-consuming.
```python
from pkld import pkld

@pkld
def foo(input):
    # Slow or expensive operations...
    return stuff
```
**Features:**
- Uses [pickle](https://docs.python.org/3/library/pickle.html) to store function outputs locally
- Supports functions with mutable or un-hashable arguments (e.g. dicts, lists, numpy arrays)
- Can also be used as an **in-memory (i.e. transient) cache**
- Supports asynchronous functions
- Thread-safe
## Installation
```bash
pip install pkld
```
## Usage
To use, just add the `@pkld` decorator to your function:
```python
from pkld import pkld

@pkld
def foo():
    return stuff
```
Then if you run the program, the function will be executed:
```python
stuff = foo() # Takes a long time
```
And if you run it again:
```python
stuff = foo() # Fast af
```
The function will _not_ execute, and instead the output will be pulled from the cache.
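As a rough illustration of the idea (not `pkld`'s actual implementation), a disk cache of this kind can be sketched with the standard library alone. The `disk_cached` decorator, the `slow_square` function, and the key scheme below are all hypothetical. The sketch pickles the arguments to derive a file name, which is one way unhashable arguments such as lists could be supported.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cached(cache_dir):
    """Toy disk cache; an illustration only, NOT pkld's implementation."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pickle the arguments to get bytes, so unhashable values such
            # as lists and dicts can still be turned into a cache key.
            raw = pickle.dumps((args, sorted(kwargs.items())))
            key = hashlib.sha256(raw).hexdigest()[:8]
            path = cache_dir / f"{func.__name__}_{key}.pkl"
            if path.exists():
                with path.open("rb") as f:
                    return pickle.load(f)  # cache hit: func is skipped
            result = func(*args, **kwargs)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


calls = []  # records every real execution of the function body


@disk_cached(tempfile.mkdtemp())
def slow_square(values):
    calls.append(values)
    return [v * v for v in values]


first = slow_square([1, 2, 3])   # runs the body and writes a pickle file
second = slow_square([1, 2, 3])  # loads the pickle file instead
```

The second call finds the file written by the first and returns its contents, so `calls` records only one real execution.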
### Clearing the cache
Every pickled function has a `clear` method attached to it. You can use it to reset the cache:
```python
foo.clear()
```
### Disabling the cache
You can disable caching for a pickled function using the `disabled` parameter:
```python
@pkld(disabled=True)
def foo():
    return stuff
```
This executes the function as if it weren't decorated, which is useful if you've modified the function and don't want stale results returned from the cache.
### Changing cache location
By default, pickled function outputs are stored in the same directory as the files the functions are defined in. You'll find them in a folder called `.pkljar`.
```
codebase/
│
├── my_file.py # foo is defined in here
│
└── .pkljar/
    ├── foo_cd7648e2.pkl # foo w/ one set of args
    └── foo_95ad612b.pkl # foo w/ a different set of args
```
However, you can change this by setting the `cache_dir` parameter:
```python
@pkld(cache_dir="~/my_cache_dir")
def foo():
    return stuff
```
You can also specify a cache directory for _all_ pickled functions:
```python
from pkld import set_cache_dir
set_cache_dir("~/my_cache_dir")
```
### Using the memory cache
`pkld` caches results to disk by default. But you can also use it as an in-memory cache:
```python
@pkld(store="memory")
def foo():
    return stuff # Output will be loaded/stored in memory
```
This is preferred if you only care about memoizing operations _within_ a single run of your program, rather than _across_ runs.
You can also enable both in-memory and on-disk caching by setting `store="both"`. Loading from memory is faster than loading from disk, so using both gives you the speed of the in-memory cache and the persistence of the on-disk cache.
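To make the trade-off concrete, here is a minimal sketch of a memory-then-disk lookup. It assumes nothing about how `pkld` implements `store="both"` internally; the `TwoTierCache` class and its `get` method are hypothetical names used only for illustration.

```python
import pickle
import tempfile
from pathlib import Path


class TwoTierCache:
    """Toy memory-then-disk lookup; an illustration, not pkld's internals."""

    def __init__(self, cache_dir):
        self.memory = {}
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, key, compute):
        # 1. Fastest tier: a plain dict that lives only for this process.
        if key in self.memory:
            return self.memory[key], "memory"
        # 2. Slower tier: a pickle file that survives across runs.
        path = self.cache_dir / f"{key}.pkl"
        if path.exists():
            with path.open("rb") as f:
                value = pickle.load(f)
            self.memory[key] = value  # promote to the faster tier
            return value, "disk"
        # 3. Miss in both tiers: compute once and store in both.
        value = compute()
        self.memory[key] = value
        with path.open("wb") as f:
            pickle.dump(value, f)
        return value, "computed"


cache_dir = tempfile.mkdtemp()
cache = TwoTierCache(cache_dir)
first = cache.get("answer", lambda: 42)
second = cache.get("answer", lambda: 42)

# A fresh object with an empty memory tier stands in for a new program run.
restarted = TwoTierCache(cache_dir)
third = restarted.get("answer", lambda: 42)
```

Within one run the value comes from memory after the first call. After a simulated restart it is read from disk once and then promoted back into memory.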
## API
**pkld()**
- `cache_fp`
- `verbose`
## Limitations
TODO: Provide examples
Only certain functions can and should be pickled:
1. Functions should not have side-effects.
2. If function arguments are mutable, they should _not_ be mutated by the function.
3. Not all methods in classes should be cached.
4. Don't pickle functions that take less than a second. The disk I/O overhead will negate the benefits of caching. You _can_ use the in-memory cache, though.
5. Functions that return an unpickleable object, e.g. sockets or database connections, cannot be cached.
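The first, second, and fifth points can be illustrated with a small stand-in cache built on the standard library. The `cached_call` helper and the example functions are hypothetical and are not part of `pkld`. A `threading.Lock` is used as a stand-in for other unpicklable objects such as sockets.

```python
import pickle
import threading

_cache = {}


def cached_call(func, *args):
    """Toy memoizer keyed on the pickled arguments (not pkld's code)."""
    key = (func.__name__, pickle.dumps(args))
    if key not in _cache:
        _cache[key] = func(*args)
    return _cache[key]


# Point 1: side effects are skipped on a cache hit.
log = []


def record_and_double(x):
    log.append(x)  # side effect
    return 2 * x


cached_call(record_and_double, 5)
cached_call(record_and_double, 5)  # cache hit: nothing is appended to log


# Point 2: argument mutation is skipped on a cache hit.
def pop_last(items):
    return items.pop()  # mutates the caller's list


fresh = [1, 2, 3]
cached_call(pop_last, fresh)  # real call: fresh loses its last item
again = [1, 2, 3]
cached_call(pop_last, again)  # cache hit: again is left untouched


# Point 5: some return values cannot be pickled at all.
try:
    pickle.dumps(threading.Lock())
    picklable = True
except (TypeError, pickle.PicklingError):
    picklable = False
```

In each of the first two cases the cached call returns the right value but behaves differently from a real call, which is why such functions are poor candidates for caching.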
<!--6. Functions _must_ be pure and deterministic. Meaning they should produce the same output given the same input, and should not have side-effects.-->
## Authors
Created by [Paul Bogdan](https://github.com/paulcbogdan) and [Jonathan Shobrook](https://github.com/shobrook).