persidict


* Name: persidict
* Version: 0.15.1
* Home page: https://github.com/pythagoras-dev/persidict
* Summary: Simple persistent key-value store for Python. Values are stored as files on a disk or as S3 objects on AWS cloud.
* Upload time: 2024-12-09 21:20:40
* Maintainer: None
* Docs URL: None
* Author: Vlad (Volodymyr) Pavlov
* Requires Python: >=3.10
* License: None
* Keywords: persistence, dicts, distributed, parallel
# persidict

Simple persistent dictionaries for Python.

## What Is It?

`persidict` offers a simple persistent key-value store for Python. 
It saves the content of the dictionary in a folder on a disk 
or in an S3 bucket on AWS. Each value is stored as a separate file / S3 object.
Only text strings or sequences of strings are allowed as keys.

Unlike other persistent dictionaries (e.g. Python's native `shelve`), 
`persidict` is designed for use in highly **distributed environments**, 
where multiple instances of a program run concurrently across many machines.

## Usage
Class `FileDirDict` is a persistent dictionary that stores its content 
in a folder on a disk.

    from persidict import FileDirDict    
    my_dictionary = FileDirDict(base_dir="my_folder")

Once created, it can be used as a regular Python dictionary 
that stores key-value pairs. A key must be a string or a sequence of strings; 
a value can be any (pickleable) Python object:

    my_dictionary["Eliza"] = "MIT Eliza was a mock psychotherapist."
    my_dictionary["Eliza","year"] = 1965
    my_dictionary["Eliza","authors"] = ["Joseph Weizenbaum"]
    
    my_dictionary["Shoebox"] = "IBM Shoebox performed arithmetic operations"
    my_dictionary["Shoebox"] += " on voice commands."
    my_dictionary["Shoebox", "year"] = 1961
    my_dictionary["Shoebox", "authors"] = ["W.C. Dersch", "E.A. Quade"]

    for k in my_dictionary:
        print(list(k), "==>",  my_dictionary[k])

    if "Eliza" not in my_dictionary:
        print("Something is wrong")

If you run the code above, it will produce output similar to the following 
(the order of the lines may differ, because insertion order is not preserved):

    ['Eliza'] ==> MIT Eliza was a mock psychotherapist.
    ['Shoebox'] ==> IBM Shoebox performed arithmetic operations on voice commands.
    ['Shoebox', 'authors'] ==> ['W.C. Dersch', 'E.A. Quade']
    ['Shoebox', 'year'] ==> 1961
    ['Eliza', 'authors'] ==> ['Joseph Weizenbaum']
    ['Eliza', 'year'] ==> 1965

The dictionary automatically creates a folder named "my_folder" 
on the local disk. Each key-value pair is stored as 
a separate file within this folder.

If the key is a string, it becomes the filename for the object. 
If the key is a sequence of strings, all strings except the last 
are used to create nested subfolders within the main folder. 
The final string in the sequence serves as the filename for the object, 
which is stored in the deepest subfolder.
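
The mapping described above can be sketched with `pathlib`. This is a simplified illustration, not `persidict`'s actual code: the helper name `key_to_path` is hypothetical, the `.pkl` extension is assumed, and the hash suffix controlled by `digest_len` is ignored.

```python
from pathlib import PurePosixPath


def key_to_path(base_dir, key, extension="pkl"):
    """Map a string or a sequence of strings to a file path under base_dir.

    Hypothetical helper for illustration only; the real library also
    appends a hash suffix to each string in the key.
    """
    # A plain string is treated as a one-element sequence.
    parts = (key,) if isinstance(key, str) else tuple(key)
    if not parts or not all(isinstance(p, str) and p for p in parts):
        raise KeyError("a key must be a non-empty sequence of non-empty strings")
    # All strings except the last become nested subfolders;
    # the last one becomes the file name.
    *folders, name = parts
    return PurePosixPath(base_dir, *folders, f"{name}.{extension}")


print(key_to_path("my_folder", "Eliza"))            # my_folder/Eliza.pkl
print(key_to_path("my_folder", ("Eliza", "year")))  # my_folder/Eliza/year.pkl
```

Because the file carries an extension, a value stored under `"Eliza"` and a subfolder named `Eliza` can coexist in the same directory in this sketch.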

Persistent dictionaries only accept sequences of strings as keys. 
Any pickleable Python object can be used as a value. 
Unlike regular Python dictionaries, insertion order is not preserved.

    del my_dictionary
    new_dict = FileDirDict(base_dir="my_folder")
    print("len(new_dict) ==", len(new_dict))

The code above creates a new object named `new_dict` and then 
prints its length: 

    len(new_dict) == 6

The length is 6, because the dictionary was already stored on a disk 
in the "my_folder" directory, which contained 6 pickle files.
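
To make the reload behaviour concrete, here is a minimal sketch of the same idea using only the standard library. `TinyFileDict` is a hypothetical toy class, not part of `persidict`; it assumes single-string keys and has no locking, key sanitising, or nested folders.

```python
import pickle
import tempfile
from collections.abc import MutableMapping
from pathlib import Path


class TinyFileDict(MutableMapping):
    """Toy dictionary that keeps each value in its own pickle file."""

    def __init__(self, base_dir):
        self._dir = Path(base_dir)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _file(self, key):
        return self._dir / f"{key}.pkl"

    def __setitem__(self, key, value):
        with open(self._file(key), "wb") as f:
            pickle.dump(value, f)

    def __getitem__(self, key):
        try:
            with open(self._file(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None

    def __delitem__(self, key):
        try:
            self._file(key).unlink()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        # Keys are recovered from file names, so iteration order
        # follows the directory listing, not insertion order.
        return (p.stem for p in self._dir.glob("*.pkl"))

    def __len__(self):
        return sum(1 for _ in self._dir.glob("*.pkl"))


folder = tempfile.mkdtemp()
first = TinyFileDict(folder)
first["Eliza"] = 1965
first["Shoebox"] = 1961
del first  # only the in-memory object goes away; the files stay

second = TinyFileDict(folder)
print(len(second), second["Eliza"])  # 2 1965
```

The second object holds no state of its own: its length and contents come entirely from the files found in the folder, which is why a freshly created dictionary can already be non-empty.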

Technically, `FileDirDict` saves its content in a folder on a local disk. 
But you can share this folder with other machines 
(for example, using Dropbox or NFS), and work with the same dictionary 
simultaneously from multiple computers (from multiple instances of your program). 
This approach would allow you to use a persistent dictionary in 
a system that is distributed over dozens of computers.

If you need to run your program on hundreds (or more) computers, 
class `S3Dict` is a better choice: it's a persistent dictionary that 
stores its content in an AWS S3 bucket.

    from persidict import S3Dict
    my_cloud_dictionary = S3Dict(bucket_name="my_bucket")

Once created, it can be used as a regular Python dictionary.

## Key Classes

* `SafeStrTuple` - an immutable sequence of URL/filename-safe non-empty strings.
* `PersiDict` - an abstract base class for persistent dictionaries. 
* `FileDirDict` - a persistent dictionary that stores its content 
in a folder on a disk.
* `S3Dict` - a persistent dictionary that stores its content 
in an AWS S3 bucket.

## Key Similarities With Python Built-in Dictionaries

`PersiDict` and its subclasses can be used as regular Python dictionaries. 

* You can use square brackets to get, set, or delete values. 
* You can iterate over keys, values, or items. 
* You can check if a key is in the dictionary. 
* You can check whether two dicts are equal
(meaning they contain the same key-value pairs).
* You can get the length of the dictionary.
* Methods `keys()`, `values()`, `items()`, `get()`, `clear()`,
`setdefault()`, `update()`, etc. work as expected.

## Key Differences From Python Built-in Dictionaries

`PersiDict` and its subclasses persist values between program executions 
and allow multiple concurrently running programs 
to work with the same dictionary simultaneously.

* Keys must be sequences of URL/filename-safe non-empty strings.
* Values must be pickleable Python objects.
* You can constrain values to be an instance of a specific class.
* Insertion order is not preserved.
* You cannot assign initial key-value pairs to a dictionary in its constructor.
* `PersiDict` API has additional methods `delete_if_exists()`, `timestamp()`,
`get_subdict()`, `subdicts()`, `random_keys()`, `newest_keys()`, 
`oldest_keys()`, `newest_values()`, `oldest_values()`, 
`get_params()`, `get_metaparams()`, and `get_default_metaparams()`,
which are not available in native Python dicts.
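
The value-type constraint from the list above can be pictured as an `isinstance` check on every write. The sketch below is an in-memory stand-in, not the library's code: `CheckedDict` is a hypothetical name, and raising `TypeError` is an assumption about how a rejected value might be reported.

```python
class CheckedDict(dict):
    """In-memory stand-in that rejects values of the wrong type on write.

    Only item assignment is checked here; dict.update() and
    setdefault() bypass __setitem__ in a plain dict subclass.
    """

    def __init__(self, base_class_for_values=None):
        super().__init__()
        self.base_class_for_values = base_class_for_values

    def __setitem__(self, key, value):
        expected = self.base_class_for_values
        # None means "no type checking": every value is accepted.
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"value must be an instance of {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        super().__setitem__(key, value)


years = CheckedDict(base_class_for_values=int)
years["Eliza"] = 1965          # accepted
try:
    years["Shoebox"] = "1961"  # rejected: a str is not an int
except TypeError as err:
    print(err)
```

Checking at write time keeps bad values out of storage, so every reader can rely on the type without re-validating.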

## Fine Tuning

`PersiDict` subclasses have a number of parameters that 
affect the behaviour of a dictionary. 

* `base_class_for_values` - a base class for values stored in a dictionary.  
If specified, it will be used to check types of values in the dictionary. 
If not specified (if set to `None`), no type checking will be performed 
and all types will be allowed.
* `file_type` - a string that specifies the type of files used to store objects.
The values "pkl" and "json" select the corresponding file format 
and allow storing arbitrary Python objects. 
For all other values of `file_type`, the file format is always plain
text, which only works with `str` objects; 
in that case `base_class_for_values` must be explicitly set to `str`.
* `immutable_items` - a boolean that specifies whether items in a dictionary 
are protected from modification and deletion. It enables various distributed 
cache optimizations for remote storage. `True` means an append-only dictionary. 
`False` means normal dict-like behaviour. The default value is `False`. 
* `digest_len` - the length of a hash signature suffix that `PersiDict` 
automatically adds to each string in a key while mapping the key to 
the address of a value in a persistent storage backend 
(e.g. a filename or an S3 object name). It is needed to ensure that 
persistent dictionaries work correctly on case-insensitive (even if case-preserving) 
filesystems, such as macOS HFS. The default value is 8. 
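
The purpose of `digest_len` can be illustrated with a short sketch. The scheme below is hypothetical: the function name `add_digest`, the use of SHA-256, and the `_` separator are assumptions made for this example, and the library's actual suffix format may differ.

```python
import hashlib


def add_digest(name, digest_len=8):
    """Append a short hash of `name` so that case variants stay distinct.

    Hypothetical scheme for illustration only.
    """
    # The hash is computed from the exact, case-sensitive string.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:digest_len]
    return f"{name}_{digest}"


a = add_digest("Eliza")
b = add_digest("eliza")
# On a case-insensitive filesystem "Eliza" and "eliza" would refer to
# the same file; with the suffixes the names differ even when case is ignored.
print(a, b)
print(a.lower() == b.lower())  # False
```

A longer suffix lowers the chance that two case variants of a key receive the same digest, at the cost of longer file names.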


## How To Get It?

The source code is hosted on GitHub at:
[https://github.com/pythagoras-dev/persidict](https://github.com/pythagoras-dev/persidict) 

The latest released version is available from the Python Package Index (PyPI) at:
[https://pypi.org/project/persidict](https://pypi.org/project/persidict)

    pip install persidict

## Dependencies

* [jsonpickle](https://jsonpickle.github.io)
* [joblib](https://joblib.readthedocs.io)
* [lz4](https://python-lz4.readthedocs.io)
* [pandas](https://pandas.pydata.org)
* [numpy](https://numpy.org)
* [boto3](https://boto3.readthedocs.io)
* [pytest](https://pytest.org)
* [moto](http://getmoto.org)

## Key Contacts

* [Vlad (Volodymyr) Pavlov](https://www.linkedin.com/in/vlpavlov/)

            
