deduplicationdict


| Field           | Value                                     |
|:----------------|:------------------------------------------|
| Name            | deduplicationdict                         |
| Version         | 1.0.4                                     |
| Summary         | A dictionary that de-duplicates values.   |
| Upload time     | 2023-07-03 03:30:12                       |
| Requires Python | >=3.7                                     |
| Keywords        | python, dict, deduplication, optimization |
| Requirements    | No requirements were recorded.            |

# DeDuplicationDict

[![PyPI version](https://badge.fury.io/py/deduplicationdict.svg)](https://badge.fury.io/py/deduplicationdict)
[![Python package](https://github.com/Vivswan/DeDuplicationDict/actions/workflows/unittest.yaml/badge.svg)](https://github.com/Vivswan/DeDuplicationDict/actions/workflows/unittest.yaml)
[![Documentation Status](https://readthedocs.org/projects/deduplicationdict/badge/?version=release)](https://deduplicationdict.readthedocs.io/en/release/?badge=release)
[![Python](https://img.shields.io/badge/python-3.7--3.12-blue)](https://badge.fury.io/py/deduplicationdict)
[![License: MPL 2.0](https://img.shields.io/badge/License-MPL_2.0-blue.svg)](https://opensource.org/licenses/MPL-2.0)

[![Github](https://img.shields.io/badge/GitHub-Vivswan%2FDeDuplicationDict-blue)](https://github.com/Vivswan/DeDuplicationDict)

A dictionary that de-duplicates values.

A dictionary-like class that deduplicates values: each distinct value is stored once in a separate
dictionary, and every key maps to the hash of its value. This is particularly useful for large
dictionaries with many repeated values, where storing each value only once can save a substantial
amount of memory.
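
To make the idea concrete, here is a minimal sketch of hash-based value deduplication for a flat dictionary. It is not the library's implementation: the helper names `short_hash` and `dedupe_values`, the use of SHA-256 over a JSON encoding, and the 8-character hash length are all assumptions made for illustration.

```python
import hashlib
import json


def short_hash(value, length=8):
    """Hypothetical helper: map a JSON-serializable value to a short hex id."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:length]


def dedupe_values(data):
    """Split a flat dict into key -> hash and hash -> value mappings."""
    key_dict, value_dict = {}, {}
    for key, value in data.items():
        hash_id = short_hash(value)
        key_dict[key] = hash_id
        # Each distinct value is stored once, however many keys refer to it.
        value_dict.setdefault(hash_id, value)
    return key_dict, value_dict


key_dict, value_dict = dedupe_values({"a": [5, 6, 7], "b": 2, "c": [5, 6, 7]})
print(len(key_dict), len(value_dict))  # 3 keys, 2 stored values
```

Keys `a` and `c` end up pointing at the same hash, so the list `[5, 6, 7]` is held only once.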

This class supports nested structures by automatically converting nested dictionaries into
`DeDuplicationDict` instances. It also provides methods for converting between regular dictionaries
and `DeDuplicationDict` instances, such as `from_dict` and `to_dict`.
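
The nesting behaviour can be pictured with a toy class. This is a sketch only: `ToyDedupDict` is a hypothetical stand-in rather than the library's class, and sharing a single value store across all nesting levels is an assumption of the sketch, not a documented guarantee.

```python
import hashlib


class ToyDedupDict:
    """Toy stand-in for illustration only; not the library's class."""

    def __init__(self, data=None, value_store=None):
        self.key_dict = {}
        # Assumption: every nesting level shares one value store.
        self.value_dict = {} if value_store is None else value_store
        for key, value in (data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            # Nested dictionaries become nested ToyDedupDict instances.
            self.key_dict[key] = ToyDedupDict(value, self.value_dict)
        else:
            hash_id = hashlib.md5(repr(value).encode("utf-8")).hexdigest()[:8]
            self.key_dict[key] = hash_id
            self.value_dict.setdefault(hash_id, value)

    def __getitem__(self, key):
        entry = self.key_dict[key]
        return entry if isinstance(entry, ToyDedupDict) else self.value_dict[entry]

    def to_dict(self):
        """Convert back to a plain, fully nested dictionary."""
        result = {}
        for key in self.key_dict:
            item = self[key]
            result[key] = item.to_dict() if isinstance(item, ToyDedupDict) else item
        return result


nested = ToyDedupDict({"x": {"p": [1, 2], "q": [1, 2]}, "y": [1, 2]})
print(nested.to_dict())
print(len(nested.value_dict))  # 1: three references share one stored list
```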

## Installation

```bash
pip install deduplicationdict
```

## Usage

```python
from deduplicationdict import DeDuplicationDict

# Create a new DeDuplicationDict instance
dedup_dict = DeDuplicationDict.from_dict({'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7]})
# or
dedup_dict = DeDuplicationDict(**{'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7]})

# Add two new keys that share the same value
dedup_dict['d'] = [1, 2, 3]
dedup_dict['e'] = [1, 2, 3]

# Print the dictionary
print(f"dedup_dict.to_dict(): {dedup_dict.to_dict()}")
# output: {'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7], 'd': [1, 2, 3], 'e': [1, 2, 3]}

# Print the internal key -> hash and hash -> value mappings
print(f"dedup_dict.key_dict: {dedup_dict.key_dict}")
# output: {'a': '7511debb', 'b': '7c7ad8f0', 'c': '7511debb', 'd': 'f9343d7d', 'e': 'f9343d7d'}
print(f"dedup_dict.value_dict: {dedup_dict.value_dict}")
# output: {'7511debb': [5, 6, 7], '7c7ad8f0': 2, 'f9343d7d': [1, 2, 3]}

# Print the JSON-serializable form used for saving and loading
print(f"to_json_save_dict: {dedup_dict.to_json_save_dict()}")
# output: {'key_dict': {'a': '7511debb', 'b': '7c7ad8f0', 'c': '7511debb', 'd': 'f9343d7d', 'e': 'f9343d7d'}, 'value_dict': {'7511debb': [5, 6, 7], '7c7ad8f0': 2, 'f9343d7d': [1, 2, 3]}}

assert dedup_dict["a"] == [5, 6, 7]
assert dedup_dict["b"] == 2
assert dedup_dict["c"] == [5, 6, 7]
assert dedup_dict["d"] == [1, 2, 3]
assert dedup_dict["e"] == [1, 2, 3]
assert DeDuplicationDict.from_json_save_dict(dedup_dict.to_json_save_dict()).to_dict() == dedup_dict.to_dict()
```
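
The `key_dict` / `value_dict` layout printed above is plain JSON-compatible data, so it can be written to disk and resolved back into a regular dictionary. The sketch below works on a flat subset of that output using only the standard library; `restore_flat` is a hypothetical helper, not part of the package, and it does not handle nested dictionaries.

```python
import json

# Flat subset of the layout shown in the output above.
save_dict = {
    "key_dict": {"a": "7511debb", "b": "7c7ad8f0", "c": "7511debb"},
    "value_dict": {"7511debb": [5, 6, 7], "7c7ad8f0": 2},
}


def restore_flat(saved):
    """Hypothetical helper: rebuild a plain dict from a flat save layout."""
    values = saved["value_dict"]
    return {key: values[hash_id] for key, hash_id in saved["key_dict"].items()}


text = json.dumps(save_dict)                # serialize to a JSON string
restored = restore_flat(json.loads(text))   # parse it and resolve each hash
print(restored)
# {'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7]}
```

In real code, `DeDuplicationDict.from_json_save_dict` is the intended way to load such a structure; the helper here only illustrates what the two mappings encode.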

Usage with [SqliteDict](https://github.com/RaRe-Technologies/sqlitedict):
[SqliteDeDuplicationDict.py](https://gist.github.com/Vivswan/6fca547b2927e0bf11743869058d4b10)

## Results from Testing

| Method              | JSON Memory (MB) | In-Memory (MB) |
|:--------------------|:----------------:|:--------------:|
| `dict`              |      14.089      |     27.542     |
| `DeDuplicationDict` |      1.7906      |     3.806      |
| _Memory Saving_     |    **7.868x**    |   **7.235x**   |

![dict vs DeDuplicationDict](https://github.com/Vivswan/DeDuplicationDict/raw/release/docs/_static/dict_vs_DeDuplicationDict.svg)

[//]: # (![dict vs DeDuplicationDict](docs/_static/dict_vs_DeDuplicationDict.svg))
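
A comparison of this kind can be approximated with the standard library alone. The sketch below is not the project's benchmark: the synthetic data (1,000 keys sharing 10 distinct values), the SHA-256 hashing, and the 8-character hash length are assumptions chosen for illustration, and the resulting ratio depends entirely on how repetitive the data is.

```python
import hashlib
import json

# Synthetic data: 1,000 keys that share only 10 distinct list values.
data = {f"key_{i}": list(range(i % 10, i % 10 + 50)) for i in range(1000)}

# Build key -> hash and hash -> value mappings, storing each value once.
key_dict, value_dict = {}, {}
for key, value in data.items():
    hash_id = hashlib.sha256(json.dumps(value).encode("utf-8")).hexdigest()[:8]
    key_dict[key] = hash_id
    value_dict.setdefault(hash_id, value)

# Compare the serialized sizes of the two representations.
plain_size = len(json.dumps(data))
dedup_size = len(json.dumps({"key_dict": key_dict, "value_dict": value_dict}))
print(f"plain: {plain_size} characters, deduplicated: {dedup_size} characters")
print(f"saving: {plain_size / dedup_size:.2f}x")
```

Data with few repeated values would show little or no saving, since every key still carries the overhead of a hash.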

## Documentation

The documentation for this project is hosted on [Read the Docs](https://deduplicationdict.readthedocs.io/en/release/).

## License

This project is licensed under the terms of the [Mozilla Public License 2.0](https://opensource.org/licenses/MPL-2.0).


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "deduplicationdict",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "Vivswan Shah <vivswanshah@pitt.edu>",
    "keywords": "python,dict,deduplication,optimization",
    "author": "",
    "author_email": "Vivswan Shah <vivswanshah@pitt.edu>",
    "download_url": "https://files.pythonhosted.org/packages/fc/1c/d3c7bba92dc5572cc1f50e0ed1a88f82b11b8ef1c318f66b5728b07b027e/deduplicationdict-1.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A dictionary that de-duplicates values.",
    "version": "1.0.4",
    "project_urls": {
        "Author": "https://vivswan.github.io/",
        "Bug Reports": "https://github.com/Vivswan/DeDuplicationDict/issues",
        "Documentation": "https://deduplicationdict.readthedocs.io/en/latest/",
        "Homepage": "https://github.com/Vivswan/DeDuplicationDict",
        "Say Thanks!": "https://vivswan.github.io/",
        "Source": "https://github.com/Vivswan/DeDuplicationDict"
    },
    "split_keywords": [
        "python",
        "dict",
        "deduplication",
        "optimization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b876835d737a1443b956f075d11f0aa1213c27ef634a3261913a71350791652f",
                "md5": "52b52f1710985ad70367ec6f5210f486",
                "sha256": "81ed7d41d18f78a241ef426d3b24c6a6dd00ac6ffa642cc0e8fd2eddb673e3b8"
            },
            "downloads": -1,
            "filename": "deduplicationdict-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "52b52f1710985ad70367ec6f5210f486",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 11690,
            "upload_time": "2023-07-03T03:30:10",
            "upload_time_iso_8601": "2023-07-03T03:30:10.711861Z",
            "url": "https://files.pythonhosted.org/packages/b8/76/835d737a1443b956f075d11f0aa1213c27ef634a3261913a71350791652f/deduplicationdict-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fc1cd3c7bba92dc5572cc1f50e0ed1a88f82b11b8ef1c318f66b5728b07b027e",
                "md5": "e93b8f7290103db3aeebaa38b59e4208",
                "sha256": "412cba02a591d04ffc958a060dcda6a58e1b6562151b485e36d44a643f2c6c6d"
            },
            "downloads": -1,
            "filename": "deduplicationdict-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "e93b8f7290103db3aeebaa38b59e4208",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 11732,
            "upload_time": "2023-07-03T03:30:12",
            "upload_time_iso_8601": "2023-07-03T03:30:12.612185Z",
            "url": "https://files.pythonhosted.org/packages/fc/1c/d3c7bba92dc5572cc1f50e0ed1a88f82b11b8ef1c318f66b5728b07b027e/deduplicationdict-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-03 03:30:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Vivswan",
    "github_project": "DeDuplicationDict",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "deduplicationdict"
}
        