Booklet
=======

Introduction
------------
Booklet is a pure Python key-value file database. It allows multiple serializers for both the keys and values. Booklet implements the `MutableMapping <https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes>`_ API, which is the same interface as Python's dictionary, plus some `dbm <https://docs.python.org/3/library/dbm.html>`_ methods (e.g. sync and prune).
It is thread-safe (using thread locks on writes) and multiprocessing-safe (using file locks).

Deletes do not immediately remove data from the file. Similarly, reassigning a value to an existing key appends a new key/value set to the file rather than overwriting the old one. During normal usage the user will not notice a difference when requesting a key/value set, but the file size will grow. If size becomes an issue because of many deletes or reassignments, run the "prune" method to remove the old key/value sets.
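
For example, a minimal sketch of pruning a file after many reassignments (the file sizes are illustrative, and it's assumed here that prune can be called on a file opened in write mode):

.. code:: python

  import os
  import booklet

  with booklet.open('test.blt', 'n', value_serializer='pickle', key_serializer='str') as db:
    for i in range(1000):
      db['same_key'] = i  # each reassignment appends a new key/value set

  size_before = os.path.getsize('test.blt')

  with booklet.open('test.blt', 'w') as db:
    db.prune()  # remove the old key/value sets from the file

  size_after = os.path.getsize('test.blt')
  print(size_before, size_after)  # size_after should be noticeably smaller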

Installation
------------
Install via pip::

  pip install booklet

Or conda::

  conda install -c mullenkamp booklet


I'll probably put it on conda-forge once I feel like it's up to an appropriate standard...


Serialization
-------------
Both the keys and values stored in Booklet must be bytes when written to disk; this bytes-only behaviour is the default when "open" is called without serializer parameters. Booklet allows various serializers to be used to convert the input keys and values to bytes. There are many built-in serializers; check the booklet.available_serializers list for what's available. Some serializers require additional packages to be installed (e.g. orjson, zstd, etc.). If you want to serialize to JSON, it is highly recommended to use orjson or msgpack, as they are substantially faster than the standard-library json module. If built-in serializers are assigned at initial file creation, they are saved in the file for all future reading and writing (i.e. they don't need to be passed after the first time). Setting a serializer to None does no serializing, and the input must be bytes.

The user can also pass custom serializers to the key_serializer and value_serializer parameters. These must have "dumps" and "loads" static methods. This allows the user to chain a serializer and a compressor together if desired (see the OrjsonZstd example below). Custom serializers must be passed on every open, for both writing and reading, as they are not stored in the booklet file.

.. code:: python

  import booklet

  print(booklet.available_serializers)
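
As a minimal sketch of the bytes-only default described above (no serializers set), keys and values must already be bytes:

.. code:: python

  import booklet

  # No serializers passed, so keys and values must be bytes objects
  with booklet.open('raw.blt', 'n') as db:
    db[b'raw_key'] = b'raw_value'

  with booklet.open('raw.blt') as db:
    print(db[b'raw_key'])  # b'raw_value'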


Usage
-----
The docstrings have a lot of info about the classes and methods. Files should be opened with the booklet.open function; read its docstring for more details.

Write data using the context manager
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

  import booklet

  with booklet.open('test.blt', 'n', value_serializer='pickle', key_serializer='str') as db:
    db['test_key'] = ['one', 2, 'three', 4]


Read data using the context manager
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

  with booklet.open('test.blt', 'r') as db:
    test_data = db['test_key']

Notice that you don't need to pass serializer parameters when reading (or when writing again later) if built-in serializers were used; Booklet stores this info at initial file creation.

In most cases, the user should use Python's "with" context manager when reading and writing data. This ensures data is properly written and that the file locks are released. If the context manager is not used, then the user must be sure to run db.sync() (or db.close()) at the end of a series of writes to ensure the data has been fully written to disk. Only after the writes have been synced can additional reads occur. Make sure you close your file or you'll run into file deadlocks!

Write data without using the context manager
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

  import booklet

  db = booklet.open('test.blt', 'n', value_serializer='pickle', key_serializer='str')

  db['test_key'] = ['one', 2, 'three', 4]
  db['2nd_test_key'] = ['five', 6, 'seven', 8]

  db.sync()  # Normally not necessary if the user closes the file after writing
  db.close() # Will also run sync as part of the closing process


Read data without using the context manager
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

  db = booklet.open('test.blt') # 'r' is the default flag

  test_data1 = db['test_key']
  test_data2 = db['2nd_test_key']

  db.close()
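
Because Booklet implements the MutableMapping API, the usual dictionary-style operations should also be available. A short sketch (these calls are assumed from the MutableMapping interface rather than shown elsewhere in this README):

.. code:: python

  with booklet.open('test.blt', 'w') as db:
    print(len(db))             # number of stored keys
    print('test_key' in db)    # membership test

    for key, value in db.items():
      print(key, value)

    del db['2nd_test_key']     # marks the entry as deleted; prune reclaims the space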


Custom serializers
~~~~~~~~~~~~~~~~~~
.. code:: python

  import booklet
  import orjson

  class Orjson:
    @staticmethod
    def dumps(obj):
        return orjson.dumps(obj, option=orjson.OPT_NON_STR_KEYS | orjson.OPT_OMIT_MICROSECONDS | orjson.OPT_SERIALIZE_NUMPY)

    @staticmethod
    def loads(obj):
        return orjson.loads(obj)

  with booklet.open('test.blt', 'n', value_serializer=Orjson, key_serializer='str') as db:
    db['test_key'] = ['one', 2, 'three', 4]


The Orjson class above is actually already built into the package: you can pass the string 'orjson' to either serializer parameter to use it. It's shown here just as an example of a custom serializer.
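
For instance, the same result using the built-in name:

.. code:: python

  with booklet.open('test.blt', 'n', value_serializer='orjson', key_serializer='str') as db:
    db['test_key'] = ['one', 2, 'three', 4]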

Here's another example with compression.

.. code:: python

  import booklet
  import orjson
  import zstandard as zstd

  class OrjsonZstd:
    @staticmethod
    def dumps(obj):
        return zstd.compress(orjson.dumps(obj, option=orjson.OPT_NON_STR_KEYS | orjson.OPT_OMIT_MICROSECONDS | orjson.OPT_SERIALIZE_NUMPY))

    @staticmethod
    def loads(obj):
        return orjson.loads(zstd.decompress(obj))

  with booklet.open('test.blt', 'n', value_serializer=OrjsonZstd, key_serializer='str') as db:
    db['big_test'] = list(range(1000000))

  with booklet.open('test.blt', 'r', value_serializer=OrjsonZstd) as db:
    big_test_data = db['big_test']

If you use a custom serializer, you'll always need to pass it to booklet.open for all subsequent reading and writing.


The open flag follows the standard dbm options:

+---------+-------------------------------------------+
| Value   | Meaning                                   |
+=========+===========================================+
| ``'r'`` | Open existing database for reading only   |
|         | (default)                                 |
+---------+-------------------------------------------+
| ``'w'`` | Open existing database for reading and    |
|         | writing                                   |
+---------+-------------------------------------------+
| ``'c'`` | Open database for reading and writing,    |
|         | creating it if it doesn't exist           |
+---------+-------------------------------------------+
| ``'n'`` | Always create a new, empty database, open |
|         | for reading and writing                   |
+---------+-------------------------------------------+
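
For example, a minimal sketch of the ``'c'`` flag, which only creates the file if it doesn't already exist (assuming serializer parameters are accepted at creation time just as with ``'n'``):

.. code:: python

  import booklet

  # First run creates counter.blt; later runs re-open the existing file
  with booklet.open('counter.blt', 'c', value_serializer='pickle', key_serializer='str') as db:
    if 'count' not in db:
      db['count'] = 0
    db['count'] = db['count'] + 1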


TODO
-----
Starting in version 0.1.8, there is a prune method. It removes "deleted" keys and values from the file, but it currently leaves the old indices in the hash table. The old indices should generally not cause a performance issue (and definitely not a file size issue), but it would be nice to have them removed as part of the prune method one day.


Benchmarks
-----------
From my initial tests, the performance is comparable to other very fast key-value databases (e.g. gdbm, lmdb).
Proper benchmarks will be coming soon...

            
