lmdb-object-store

Name: lmdb-object-store
Version: 0.1.0 (PyPI)
Summary: Lightweight LMDB-backed object store for Python
Homepage: https://github.com/yuka30240/lmdb-object-store
Upload time: 2025-08-19 02:04:23
Requires Python: >=3.10
License: MIT
Keywords: atomic, database, key-value, lmdb, object-store, thread-safe
# lmdb-object-store

Lightweight thread-safe Python object store on top of **LMDB**.
Provides a dict-like API (with buffering and atomic multi-put), automatic map-size growth, and fast zero-copy reads without having to handle LMDB's lower-level details.

---

## Features

* **Dict-like interface**: `store[key] = obj`, `obj = store.get(key)`, `del store[key]`, `key in store`.
* **Atomic multi-put**: `put_many(items)` writes large amounts of data **atomically** in a single transaction.
* **Write buffering**: small writes are batched in memory; `flush()` persists them (auto-flush on size threshold).
* **Auto map-size growth**: retries on `MapFullError` by growing the LMDB map (2× or +64MiB), up to an optional cap.
* **Fast reads**: Uses LMDB zero-copy reads to minimize copies; `get_many()` efficiently combines buffered and DB reads.
* **Flexible keys**: `bytes`/`bytearray`/`memoryview` keys are always supported; `str` keys are allowed when the `key_encoding="utf-8"` option is set.

---

## Installation

```bash
pip install lmdb-object-store
```

### Requirements

* Python **3.10+**
* The [`lmdb`](https://pypi.org/project/lmdb/) Python package (wheels available for major platforms)

---

## Quick start

```python
from lmdb_object_store import LmdbObjectStore

# Create or open an LMDB-backed object store
with LmdbObjectStore(
    "path/to/db",
    batch_size=1000,                # flush buffer when it reaches this many entries
    autoflush_on_read=True,         # flush pending writes before reads
    key_encoding="utf-8",           # allow str keys (encoded with UTF-8)
    # Any lmdb.open(...) kwargs may be passed here, e.g.:
    map_size=128 * 1024 * 1024,     # 128 MiB initial map size
    subdir=True,                    # create directory layout
    readonly=False,
    # max_map_size is recognized (cap for auto-resize):
    max_map_size=4 * 1024 * 1024 * 1024,  # 4 GiB cap
) as store:

    # Put / get like a dict (values are pickled)
    store["user:42"] = {"name": "Ada", "plan": "pro"}
    print(store.get("user:42"))  # {'name': 'Ada', 'plan': 'pro'}

    # Existence checks
    if "user:42" in store:        # __contains__ has NO side effects (no flush)
        assert store.exists("user:42", flush=False) is True

    # Delete
    del store["user:42"]

    # Batch write, atomically (single transaction)
    items = {f"k{i}": {"i": i} for i in range(10_000)}
    store.put_many(items)

    # Fetch many at once
    found, not_found = store.get_many(["k1", "kX"], decode_keys=True)
    # found -> {'k1': {'i': 1}}, not_found -> ['kX']

# Clean close with strict error policy if needed:
store = LmdbObjectStore("path/to/db", key_encoding="utf-8")
try:
    # ... work with store ...
    store.close(strict=True)  # re-raise if final flush fails (after cleanup)
finally:
    # idempotent
    try: store.close()
    except Exception: pass
```

---

## API Overview

### Constructor

```python
LmdbObjectStore(
    db_path: str,
    batch_size: int = 1000,
    *,
    autoflush_on_read: bool = True,
    key_encoding: str | None = None,
    key_errors: str = "strict",
    str_normalize: str | None = None,
    **lmdb_kwargs,
)
```

* **db\_path**: LMDB environment path (passed to `lmdb.open`).
* **batch\_size**: pending buffer size threshold to auto-flush.
* **autoflush\_on\_read**: if `True`, flushes buffer before reads (`get`, `get_many`, `exists(flush=None)`).
* **key\_encoding**: enable `str` keys (e.g. `"utf-8"`). If `None`, only bytes-like keys are allowed.
* **key\_errors**: error strategy for encoding `str` keys (`"strict"`, `"ignore"`, `"replace"`, ...).
* **str\_normalize**: Unicode normalization for `str` keys (e.g., `"NFC"`, `"NFKC"`).
* **lmdb\_kwargs**: forwarded to `lmdb.open(...)` (e.g., `map_size`, `subdir`, `readonly`, etc).
  Special: **`max_map_size`** (cap for automatic map growth) is also recognized.

### Put / Get

```python
store.put(key, obj)                          # buffer write
obj = store.get(key, default=None)           # read; from buffer if present, else DB
store.flush()                                # persist the write buffer
```

* Values are serialized with `pickle` (highest protocol).
* `get()` uses zero-copy buffers internally; unpickling happens once per value.
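The pickle round-trip behind `put()`/`get()` can be sketched in plain Python. This is an illustration of the documented behavior, not the library's internal code; the helper names `encode_value`/`decode_value` are hypothetical:

```python
import pickle

def encode_value(obj):
    # Serialize with the highest available pickle protocol,
    # as the store documents for each put().
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

def decode_value(buf):
    # pickle.loads accepts any bytes-like object, so a zero-copy
    # LMDB read buffer (e.g. a memoryview) can be unpickled
    # without an intermediate bytes copy.
    return pickle.loads(buf)

value = {"name": "Ada", "plan": "pro"}
raw = encode_value(value)
assert decode_value(memoryview(raw)) == value
```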

### Atomic multi-put

```python
store.put_many(items: Mapping[Any, Any] | Iterable[tuple[Any, Any]])
```

* Writes **all** items in **one LMDB write transaction**.
* If a `MapFullError` occurs, the store will **grow the map** (2× or +64MiB) up to `max_map_size` and **retry from the beginning**.
* **Note**: `put_many()` first flushes any pending buffered writes; the atomic transaction only includes the `items` passed to this call.

### Get many

```python
found, not_found = store.get_many(
    keys: Sequence[Any],
    *,
    decode_keys: bool = False,
    decode_not_found: bool | None = None,   # None → follow decode_keys
)
```

* Returns a tuple:

  * `found`: `{key: value}` for keys found (key type is `bytes` by default, or `str` if `decode_keys=True`).
  * `not_found`: list of input keys not found (decoded to `str` if `decode_not_found=True`).
* Efficiently merges results from the write buffer and DB; `autoflush_on_read` applies unless overridden via other APIs.

### Existence & containment

```python
store.exists(key, *, flush: bool | None = None) -> bool
key in store  # __contains__ → NO flush, purely checks current state
```

* `flush=None` (default) follows `autoflush_on_read`.
* `flush=False` guarantees **no** implicit flush (useful for side-effect-free checks).

### Deletion

```python
del store[key]       # KeyError if not present
store.delete(key)    # schedules deletion via buffer (dict-like semantics)
```

### Lifecycle

```python
with LmdbObjectStore(...) as store:
    ...
# or
store.close(strict: bool = False)
```

* `close(strict=True)` re-raises the last flush error **after** closing the environment; otherwise it logs and completes.

---

## Concurrency Model

* Internally uses `RLock` + `Condition` and a simple **reader count** to coordinate:

  * Multiple concurrent readers are allowed.
  * Writers hold the lock (buffer mutation + flush/commit).
  * `close()` sets a “closing” flag and **waits** until the active reader count reaches zero.
* Designed for **thread-safety within a single process**. While LMDB itself supports multi-process access, this wrapper's locking is process-local; if you need multi-process writes, coordinate at a higher level.
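The reader-count coordination described above follows a standard `Condition`-based pattern. A minimal stdlib sketch of that pattern (not the library's actual implementation; the `ReaderGate` name is hypothetical):

```python
import threading

class ReaderGate:
    """Sketch of a reader count guarded by a Condition over an RLock."""

    def __init__(self):
        self._cond = threading.Condition(threading.RLock())
        self._readers = 0
        self._closing = False

    def begin_read(self):
        with self._cond:
            if self._closing:
                raise RuntimeError("store is closing")
            self._readers += 1

    def end_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closing = True
            # Block until every in-flight reader has finished.
            while self._readers:
                self._cond.wait()

gate = ReaderGate()
gate.begin_read()
closer = threading.Thread(target=gate.close)
closer.start()          # close() must wait: one reader is active
gate.end_read()         # last reader leaves; close() can proceed
closer.join(timeout=5)
assert not closer.is_alive()
```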

---

## Map Size & Auto-Resize

* Initial size is given by `map_size` (forwarded to `lmdb.open`).
* When a write/commit hits `MapFullError`:

  * The store **grows** to `max(current*2, current+64MiB)`, capped at `max_map_size` if provided,
  * and **retries** the operation.
* If the cap is reached and still insufficient, the error is propagated.

---

## Performance Tips

* **Batch writes**: Keep `batch_size` large enough for your workload; call `flush()` at logical boundaries.
* **Use `put_many()`** for bulk inserts—single transaction with fewer fsyncs.
* **Avoid unnecessary decodes**: If you don't need `str` keys on output, leave `decode_*` parameters off.
* **Key type**: If possible, pass bytes keys directly (saves encoding overhead).

---

## Error Handling

* **Unpickling failures** are raised as a `RuntimeError` (with key context) when reading from buffer/DB.
* **Missing keys**: `__getitem__` and `del` raise `KeyError`; `get()` returns `default`.
* **Final flush failure on close**: re-raised if `strict=True`; otherwise logged.

---

## Security Note

This library uses Python `pickle` for value serialization. **Never unpickle data from untrusted sources**.

---

## Configuration Reference

* `batch_size: int` – buffer size threshold to auto-flush (default: 1000).
* `autoflush_on_read: bool` – flush before reads (default: `True`).
* `key_encoding: Optional[str]` – enable `str` keys (e.g., `"utf-8"`). If `None`, only bytes-like keys are accepted.
* `key_errors: str` – encoding error handling (`"strict"`, `"ignore"`, `"replace"`).
* `str_normalize: Optional[str]` – Unicode normalization for `str` keys (`"NFC"`, `"NFKC"`, ...).
* `lmdb_kwargs` – forwarded to `lmdb.open(...)`:

  * `map_size`, `subdir`, `readonly`, `lock`, ...
  * `max_map_size` (recognized by this wrapper to cap auto-growth).
            
