# PersistDict
Just a DIY version of [sqlitedict](https://github.com/piskvorky/sqlitedict): looks like a dict and acts like a dict but is persistent via an [LMDB database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database). Makes heavy use of [lmdb-dict](https://github.com/uchicago-dsi/lmdb-dict) behind the scenes.
## Why?
I ran into issues with langchain's caches when developing [wdoc](https://github.com/thiswillbeyourgithub/WDoc) (my RAG lib, optimized for my use) and after months of waiting I decided to fix it myself. And instead of trusting sqlitedict's implementation with langchain's concurrency, I made my own.
This makes it very easy to add persistent cache to anything.
Also it was easy to do thanks to my [BrownieCutter](https://pypi.org/project/BrownieCutter/).
I initially made an implementation that used sqlite (with support for encryption and compression, and with concurrency handled via a singleton), but then I stumbled upon [lmdb-dict](https://github.com/uchicago-dsi/lmdb-dict), which is very probably way better as it's done by pros. It's based on [LMDB](https://en.wikipedia.org/wiki/LMDB), which is more suitable than sqlite3 for what I was after with PersistDict. If you want to use the sqlite version, take a look at versions before `2.0.0`.
## Features:
- **threadsafe**: if several threads try to access the same db, it won't be a
problem, even if other threads are using another db at the same time. And if
several python scripts run at the same time and try to access the same db,
LMDB should make them wait appropriately.
- **atime and ctime**: each entry includes a creation time and a last access time.
- **expiration**: the db won't grow too large because old keys are automatically removed after a given number of days.
- **cached**: Uses a `LRUCache128` from [cachetools](https://github.com/tkem/cachetools/).
- **customizable serializer for keys and values**: This can enable encryption, compression, etc. By default, keys are compressed because [lmdb has a default maximum key length of 511 bytes](https://stackoverflow.com/questions/66456228/increase-max-key-size-lmdb-key-value-database-in-python).
- **only one dependency needed**: Only `lmdb-dict-full` is needed. If you have [beartype](https://github.com/beartype/beartype/) installed, it will be used; same with [loguru](https://loguru.readthedocs.io/).
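
To illustrate the serializer and key-length points above, here is a minimal standard-library sketch. It is not PersistDict's actual code: the function names `serialize_key`, `serialize_value` and `deserialize_value` are hypothetical, and falling back to a SHA-256 digest for long keys is just one assumed way of staying under LMDB's default 511-byte key limit.

```python
import hashlib
import pickle
import zlib

MAX_KEY_BYTES = 511  # LMDB's default maximum key size, in bytes


def serialize_key(key) -> bytes:
    """Turn a picklable key into bytes no longer than MAX_KEY_BYTES (hypothetical helper)."""
    raw = pickle.dumps(key)
    if len(raw) <= MAX_KEY_BYTES:
        return raw
    # Too long for LMDB: store a fixed-size 32-byte digest instead.
    return hashlib.sha256(raw).digest()


def serialize_value(value) -> bytes:
    """Pickle then compress a value (hypothetical helper)."""
    return zlib.compress(pickle.dumps(value))


def deserialize_value(blob: bytes):
    """Inverse of serialize_value."""
    return pickle.loads(zlib.decompress(blob))


long_key = "x" * 10_000
assert len(serialize_key(long_key)) <= MAX_KEY_BYTES
assert deserialize_value(serialize_value({"a": [1, 2, 3]})) == {"a": [1, 2, 3]}
```

A digest is one-way, so with this approach the original key cannot be read back from the database unless it is also stored alongside the value. Encryption could be slotted into the value functions in the same way as compression.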
## Usage:
* Download from PyPI with `pip install PersistDict`
* Or from git:
    * `git clone https://github.com/thiswillbeyourgithub/PersistDict`
    * `cd PersistDict`
    * `pip install -e .`
    * To run tests: `cd PersistDict ; python -m pytest test.py`
``` python
from pathlib import Path

from PersistDict import PersistDict

# where the database lives (placeholder path, pick your own)
a_path = Path("my_persistdict_db")

# create the object
d = PersistDict(
    database_path=a_path,
    # verbose=True,
    # expiration_days=30,
)
# then treat it like a dict:
d["a"] = 1
d["b"] = "b"
d["c"] = str

# You can even create it via __call__, like a dict:
# d = d(a=1, b="b", c=str)  # this actually calls __call__ but is only
# allowed once per PersistDict, just like a regular dict

# it's a child from dict
assert isinstance(d, dict)

# prints like a dict
print(d)
# {'a': 1, 'b': 'b', 'c': <class 'str'>}

# Supports the same methods
assert sorted(list(d.keys())) == ["a", "b", "c"], d
assert "b" in d
del d["b"]
assert list(d.keys()) == ["a", "c"], d
assert len(d) == 2, d
assert d.__repr__() == {"a": 1, "c": str}.__repr__()
assert d.__str__() == {"a": 1, "c": str}.__str__()

# supports all the same types as value as pickle (or more if you change
# the serializer)
d["d"] = None

# If you create another object pointing at the same db, they will share the
# same cache and won't corrupt the db:
d2 = PersistDict(
    database_path=a_path,
    verbose=True,
)
assert list(d.keys()) == list(d2.keys()), d2
```
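
Since a PersistDict behaves like a dict, it can back a simple persistent cache for any function. The decorator below is a minimal sketch of that idea and not part of PersistDict itself: `cached_in` and `slow_square` are hypothetical names, and a plain `dict` stands in for the store so that the example runs without a database.

```python
import functools
from collections.abc import MutableMapping
from typing import Callable


def cached_in(store: MutableMapping) -> Callable:
    """Cache a function's results in any dict-like store (hypothetical helper)."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a string key from the function name and its arguments.
            key = repr((func.__qualname__, args, sorted(kwargs.items())))
            if key in store:
                return store[key]
            result = func(*args, **kwargs)
            store[key] = result
            return result

        return wrapper

    return decorator


calls = []
store = {}  # assumption: a PersistDict instance could be swapped in here


@cached_in(store)
def slow_square(x):
    calls.append(x)  # records how often the real function runs
    return x * x


assert slow_square(4) == 16
assert slow_square(4) == 16  # second call is served from the store
assert calls == [4]
```

Using `repr` for the key keeps the sketch short, but it only works well for arguments whose `repr` is stable and unambiguous.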
Raw data
{
"_id": null,
"home_page": "https://github.com/thiswillbeyourgithub/PersistDict",
"name": "PersistDict",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "dict, persistence, persistent, storage, lmdb, db, compressed, compression, metadata, browniecutter",
"author": null,
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/68/ef/8c7a173e98608f52fcb862e7681de957988dabdbd578cd5eb6274c96c9bf/persistdict-0.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPLv3",
"summary": "Looks like a dict and acts like a dict but is persistent via an LMDB db",
"version": "0.2.2",
"project_urls": {
"Homepage": "https://github.com/thiswillbeyourgithub/PersistDict"
},
"split_keywords": [
"dict",
" persistence",
" persistent",
" storage",
" lmdb",
" db",
" compressed",
" compression",
" metadata",
" browniecutter"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3aa609ea659fbd663bd14b3c27f2ecb2b8a55147b495878913072ba08d934128",
"md5": "5664bcc3f6714172a83db43d91daadd3",
"sha256": "ae82da3d16892a3af932f12cf41a873bbc7bf810603d9787001e6ad911685ac1"
},
"downloads": -1,
"filename": "PersistDict-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5664bcc3f6714172a83db43d91daadd3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 20915,
"upload_time": "2024-12-05T15:28:46",
"upload_time_iso_8601": "2024-12-05T15:28:46.386735Z",
"url": "https://files.pythonhosted.org/packages/3a/a6/09ea659fbd663bd14b3c27f2ecb2b8a55147b495878913072ba08d934128/PersistDict-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "68ef8c7a173e98608f52fcb862e7681de957988dabdbd578cd5eb6274c96c9bf",
"md5": "94ffbae90b3420be274baa477f105142",
"sha256": "516962ccbaca9df96980611414ed8e910e455a87744614f7746bcccbf1c120dc"
},
"downloads": -1,
"filename": "persistdict-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "94ffbae90b3420be274baa477f105142",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 21836,
"upload_time": "2024-12-05T15:28:48",
"upload_time_iso_8601": "2024-12-05T15:28:48.920771Z",
"url": "https://files.pythonhosted.org/packages/68/ef/8c7a173e98608f52fcb862e7681de957988dabdbd578cd5eb6274c96c9bf/persistdict-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-05 15:28:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thiswillbeyourgithub",
"github_project": "PersistDict",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "persistdict"
}