# lmdb-object-store
A lightweight, thread-safe Python object store built on **LMDB**.
It provides a dict-like API (with write buffering and atomic multi-put), automatic map-size growth, and fast zero-copy reads, without requiring you to handle LMDB's lower-level details.
---
## Features
* **Dict-like interface**: `store[key] = obj`, `obj = store.get(key)`, `del store[key]`, `key in store`.
* **Atomic multi-put**: `put_many(items)` writes large amounts of data **atomically** in a single transaction.
* **Write buffering**: small writes are batched in memory; `flush()` persists them (auto-flush on size threshold).
* **Auto map-size growth**: retries on `MapFullError` by growing the LMDB map (2× or +64MiB), up to an optional cap.
* **Fast reads**: Uses LMDB zero-copy reads to minimize copies; `get_many()` efficiently combines buffered and DB reads.
* **Flexible keys**: `bytes`/`bytearray`/`memoryview` keys are always supported; `str` keys are allowed when the `key_encoding='utf-8'` option is set.
---
## Installation
```bash
pip install lmdb-object-store
```
### Requirements
* Python **3.10+**
* The [`lmdb`](https://pypi.org/project/lmdb/) Python package (wheels available for major platforms)
---
## Quick start
```python
from lmdb_object_store import LmdbObjectStore

# Create or open an LMDB-backed object store
with LmdbObjectStore(
    "path/to/db",
    batch_size=1000,             # flush buffer when it reaches this many entries
    autoflush_on_read=True,      # flush pending writes before reads
    key_encoding="utf-8",        # allow str keys (encoded with UTF-8)
    # Any lmdb.open(...) kwargs may be passed here, e.g.:
    map_size=128 * 1024 * 1024,  # 128 MiB initial map size
    subdir=True,                 # create directory layout
    readonly=False,
    # max_map_size is recognized (cap for auto-resize):
    max_map_size=4 * 1024 * 1024 * 1024,  # 4 GiB cap
) as store:
    # Put / get like a dict (values are pickled)
    store["user:42"] = {"name": "Ada", "plan": "pro"}
    print(store.get("user:42"))  # {'name': 'Ada', 'plan': 'pro'}

    # Existence checks
    if "user:42" in store:  # __contains__ has NO side effects (no flush)
        assert store.exists("user:42", flush=False) is True

    # Delete
    del store["user:42"]

    # Batch write, atomically (single transaction)
    items = {f"k{i}": {"i": i} for i in range(10_000)}
    store.put_many(items)

    # Fetch many at once
    found, not_found = store.get_many(["k1", "kX"], decode_keys=True)
    # found -> {'k1': {'i': 1}}, not_found -> ['kX']

# Clean close with strict error policy if needed:
store = LmdbObjectStore("path/to/db", key_encoding="utf-8")
try:
    # ... work with store ...
    store.close(strict=True)  # re-raise if final flush fails (after cleanup)
finally:
    # close() is idempotent
    try:
        store.close()
    except Exception:
        pass
```
---
## API Overview
### Constructor
```python
LmdbObjectStore(
    db_path: str,
    batch_size: int = 1000,
    *,
    autoflush_on_read: bool = True,
    key_encoding: str | None = None,
    key_errors: str = "strict",
    str_normalize: str | None = None,
    **lmdb_kwargs,
)
```
* **db\_path**: LMDB environment path (passed to `lmdb.open`).
* **batch\_size**: pending buffer size threshold to auto-flush.
* **autoflush\_on\_read**: if `True`, flushes buffer before reads (`get`, `get_many`, `exists(flush=None)`).
* **key\_encoding**: enable `str` keys (e.g. `"utf-8"`). If `None`, only bytes-like keys are allowed.
* **key\_errors**: error strategy for encoding `str` keys (`"strict"`, `"ignore"`, `"replace"`, ...).
* **str\_normalize**: Unicode normalization for `str` keys (e.g., `"NFC"`, `"NFKC"`).
* **lmdb\_kwargs**: forwarded to `lmdb.open(...)` (e.g., `map_size`, `subdir`, `readonly`, etc).
Special: **`max_map_size`** (cap for automatic map growth) is also recognized.
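The interplay of `key_encoding`, `key_errors`, and `str_normalize` can be pictured as a small coercion step applied to every key. The sketch below is an illustration of the documented options, not the library's actual internals; the function name `encode_key` is hypothetical.

```python
import unicodedata


def encode_key(key, key_encoding=None, key_errors="strict", str_normalize=None):
    """Sketch of key coercion: bytes-like keys pass through unchanged;
    str keys are optionally Unicode-normalized, then encoded.
    Anything else is rejected with a TypeError."""
    if isinstance(key, (bytes, bytearray, memoryview)):
        return bytes(key)
    if isinstance(key, str):
        if key_encoding is None:
            raise TypeError("str keys require key_encoding to be set")
        if str_normalize is not None:
            key = unicodedata.normalize(str_normalize, key)
        return key.encode(key_encoding, key_errors)
    raise TypeError(f"unsupported key type: {type(key)!r}")
```

Normalization matters because visually identical strings (e.g., composed vs. decomposed accents) would otherwise map to different byte keys.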
### Put / Get
```python
store.put(key, obj) # buffer write
obj = store.get(key, default=None) # read; from buffer if present, else DB
store.flush() # persist the write buffer
```
* Values are serialized with `pickle` (highest protocol).
* `get()` uses zero-copy buffers internally; unpickling happens once per value.
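The buffering behavior can be modeled as a dict that spills into the database once `batch_size` entries accumulate. This is a minimal in-memory sketch of that policy (a plain dict stands in for an LMDB write transaction; the class name is illustrative, not part of the library):

```python
import pickle


class BufferSketch:
    """Model of the write buffer: put() stores pickled values in a dict;
    reaching batch_size triggers flush(), which commits everything to a
    backing store in one step."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.buffer = {}   # pending writes
        self.db = {}       # stand-in for the LMDB environment

    def put(self, key, obj):
        self.buffer[key] = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        self.db.update(self.buffer)  # one "transaction"
        self.buffer.clear()

    def get(self, key, default=None):
        raw = self.buffer.get(key)
        if raw is None:
            raw = self.db.get(key)
        return pickle.loads(raw) if raw is not None else default
```

Note how `get()` consults the buffer first, so a value is visible immediately after `put()` even before any flush.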
### Atomic multi-put
```python
store.put_many(items: Mapping[Any, Any] | Iterable[tuple[Any, Any]])
```
* Writes **all** items in **one LMDB write transaction**.
* If a `MapFullError` occurs, the store will **grow the map** (2× or +64MiB) up to `max_map_size` and **retry from the beginning**.
* **Note**: `put_many()` first flushes any pending buffered writes; the atomic transaction only includes the `items` passed to this call.
### Get many
```python
found, not_found = store.get_many(
    keys: Sequence[Any],
    *,
    decode_keys: bool = False,
    decode_not_found: bool | None = None,  # None → follow decode_keys
)
```
* Returns a tuple:
* `found`: `{key: value}` for keys found (key type is `bytes` by default, or `str` if `decode_keys=True`).
* `not_found`: list of input keys not found (decoded to `str` if `decode_not_found=True`).
* Efficiently merges results from the write buffer and the DB; the `autoflush_on_read` setting determines whether pending writes are flushed first.
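The buffer/DB merge and the `decode_keys` behavior can be sketched as follows. This is an illustration of the documented contract with two plain dicts, not the library's implementation; keys are assumed to be UTF-8 bytes.

```python
def get_many_sketch(keys, buffer, db, decode_keys=False):
    """Sketch of get_many's merge: consult the write buffer first, then
    the DB; keys missing from both go to not_found. When decode_keys is
    True, output keys in `found` are decoded to str."""
    found, not_found = {}, []
    for k in keys:
        if k in buffer:
            value = buffer[k]
        elif k in db:
            value = db[k]
        else:
            not_found.append(k)
            continue
        found[k.decode("utf-8") if decode_keys else k] = value
    return found, not_found
```

The buffer takes precedence so that an unflushed `put()` shadows any older value already committed to the DB.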
### Existence & containment
```python
store.exists(key, *, flush: bool | None = None) -> bool
key in store # __contains__ → NO flush, purely checks current state
```
* `flush=None` (default) follows `autoflush_on_read`.
* `flush=False` guarantees **no** implicit flush (useful for side-effect-free checks).
### Deletion
```python
del store[key] # KeyError if not present
store.delete(key) # schedules deletion via buffer (dict-like semantics)
```
### Lifecycle
```python
with LmdbObjectStore(...) as store:
    ...
# or
store.close(strict: bool = False)
```
* `close(strict=True)` re-raises the last flush error **after** closing the environment; otherwise it logs and completes.
---
## Concurrency Model
* Internally uses `RLock` + `Condition` and a simple **reader count** to coordinate:
* Multiple concurrent readers are allowed.
* Writers hold the lock (buffer mutation + flush/commit).
* `close()` sets a “closing” flag and **waits** until the active reader count reaches zero.
* Designed for **thread-safety within a single process**. While LMDB itself supports multi-process access, this wrapper's locking is process-local; if you need multi-process writes, coordinate at a higher level.
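The reader-count coordination described above is a standard pattern; a minimal sketch of how it might look (class and method names are hypothetical, not the wrapper's actual API):

```python
import threading


class ReaderGate:
    """Sketch of the described coordination: readers increment a count
    under a Condition's lock; close() sets a closing flag and waits until
    the active reader count drops to zero."""

    def __init__(self):
        self._cond = threading.Condition(threading.RLock())
        self._readers = 0
        self._closing = False

    def start_read(self):
        with self._cond:
            if self._closing:
                raise RuntimeError("store is closing")
            self._readers += 1

    def end_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closing = True
            while self._readers > 0:
                self._cond.wait()  # releases the lock while waiting
```

`Condition.wait()` releases the lock, so in-flight readers can still call `end_read()` while `close()` is blocked.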
---
## Map Size & Auto-Resize
* Initial size is given by `map_size` (forwarded to `lmdb.open`).
* When a write/commit hits `MapFullError`:
* The store **grows** to `max(current*2, current+64MiB)`, capped at `max_map_size` if provided,
* and **retries** the operation.
* If the cap is reached and still insufficient, the error is propagated.
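The growth policy is simple arithmetic; a sketch of the documented rule (the function name is illustrative):

```python
MIB = 1024 * 1024


def next_map_size(current, max_map_size=None):
    """Next LMDB map size per the documented policy: grow to
    max(current * 2, current + 64 MiB), capped at max_map_size."""
    new = max(current * 2, current + 64 * MIB)
    if max_map_size is not None:
        new = min(new, max_map_size)
    return new
```

Doubling dominates for small maps, while the +64 MiB floor guarantees meaningful growth even when the map is tiny; the cap keeps a runaway writer from exhausting address space.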
---
## Performance Tips
* **Batch writes**: Keep `batch_size` large enough for your workload; call `flush()` at logical boundaries.
* **Use `put_many()`** for bulk inserts—single transaction with fewer fsyncs.
* **Avoid unnecessary decodes**: If you don't need `str` keys on output, leave `decode_*` parameters off.
* **Key type**: If possible, pass bytes keys directly (saves encoding overhead).
---
## Error Handling
* **Unpickling failures** are raised as a `RuntimeError` (with key context) when reading from buffer/DB.
* **Missing keys**: `__getitem__` and `del` raise `KeyError`; `get()` returns `default`.
* **Final flush failure on close**: re-raised if `strict=True`; otherwise logged.
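The unpickling-failure wrapping can be sketched as a thin helper around `pickle.loads` (the function name is hypothetical; it illustrates the documented behavior of attaching key context):

```python
import pickle


def loads_with_context(raw, key):
    """Sketch: wrap any unpickling failure in a RuntimeError that names
    the offending key, preserving the original error as __cause__."""
    try:
        return pickle.loads(raw)
    except Exception as exc:
        raise RuntimeError(f"failed to unpickle value for key {key!r}") from exc
```

Chaining with `from exc` keeps the original `UnpicklingError` visible in the traceback while surfacing which key produced the corrupt value.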
---
## Security Note
This library uses Python `pickle` for value serialization. **Never unpickle data from untrusted sources**.
---
## Configuration Reference
* `batch_size: int` – buffer size threshold to auto-flush (default: 1000).
* `autoflush_on_read: bool` – flush before reads (default: `True`).
* `key_encoding: Optional[str]` – enable `str` keys (e.g., `"utf-8"`). If `None`, only bytes-like keys are accepted.
* `key_errors: str` – encoding error handling (`"strict"`, `"ignore"`, `"replace"`).
* `str_normalize: Optional[str]` – Unicode normalization for `str` keys (`"NFC"`, `"NFKC"`, ...).
* `lmdb_kwargs` – forwarded to `lmdb.open(...)`:
* `map_size`, `subdir`, `readonly`, `lock`, ...
* `max_map_size` (recognized by this wrapper to cap auto-growth).