dyna-store

- Name: dyna-store
- Version: 0.0.4
- Home page: https://github.com/brightnetwork/dyna-store
- Summary: Dynamic metadata storage
- Upload time: 2024-05-22 16:59:39
- Author: brightnetwork
- Requires Python: <4.0,>=3.10
- License: MIT
- Keywords: id, meta, store, high-cardinality, metadata

[![image](https://img.shields.io/pypi/v/dyna-store.svg)](https://pypi.python.org/pypi/dyna-store)
[![image](https://img.shields.io/pypi/l/dyna-store.svg)](https://pypi.python.org/pypi/dyna-store)
[![image](https://img.shields.io/pypi/pyversions/dyna-store.svg)](https://pypi.python.org/pypi/dyna-store)
[![Code Coverage](https://img.shields.io/codecov/c/github/brightnetwork/dyna-store)](https://app.codecov.io/gh/brightnetwork/dyna-store)
[![Actions status](https://github.com/brightnetwork/dyna-store/workflows/test/badge.svg)](https://github.com/brightnetwork/dyna-store/actions)

# `dyna-store`

Efficient handling of high cardinality metadata.

In order to explain the main concept, let's go through an example:

## Use case

We have an online shopping website, where users are shown recommended products.
Each recommendation can lead to a bunch of user events (`viewed`, `clicked`, `purchased`, etc.). These events are stored in a database for further analysis.
For each event, we want to be able to relate it to the recommendation that led to it - in particular when the recommendation was made, with which algorithm, etc.

### Without dyna-store

An approach would be to store all the recommendations in a database table:

| id | userId | timestamp | algorithm |
| --- | --- | --- | --- |
| BShivYLGif | user1 | 1716384775942 | algo-1 |
| 5SvIjZIXMm | user1 | 1716384793233 | algo-2 |
| DkBoUmvMs0 | user2 | 1716384489455 | algo-2 |
| Nm8NabCct8 | user2 | 1716384483847 | algo-2 |
| 5ZO053OGpX | user2 | 1716384448985 | algo-2 |

Each recommendation would have a unique identifier (the primary key), maybe generated by the database.

Then, we can attach to each event the recommendation id. At any point we can query the recommendation table to get the details of the recommendation.

This works, but it has a limitation. The recommendation table can grow very large:
- if you are computing recommendations on the fly, a single user session can generate a lot of recommendations
- if you are pre-computing recommendations (to ensure a fast first page view), that's also a lot of data to store every day.

### With dyna-store

`dyna-store` aims to address this limitation by:
- storing less information in the database
- storing more information in the recommendation id itself

We first split our fields into two categories:
- the low cardinality fields (`algorithm` in our example): they don't have many different values and can be stored in the database
- the high cardinality fields (`userId` and `timestamp` in our example): they have many different values and will be stored in the id

Then, in the database, we store the low cardinality fields, along with the information needed to parse the values contained in the id, in a new `template` table:

| id | userId | timestamp | algorithm |
| --- | --- | --- | --- |
| BShivYLGif | { __hcf: 1, i: 0, l: 5, t: "string" } | { __hcf: 1, i: 5, l: 5, t: "datetime" } | algo-1 |
| 5SvIjZIXMm | { __hcf: 1, i: 0, l: 5, t: "string" } | { __hcf: 1, i: 5, l: 5, t: "datetime" } | algo-2 |

Then the recommendation id will contain two parts:
- the database id of the template - `BShivYLGif`
- the high cardinality fields, b62 encoded - `user1dXed`

Together, these give us the recommendation id `BShivYLGif-user1dXed`.

This id is then attached to each event. From the id, we can regenerate the original metadata, assuming we have access to the templates table.
In some cases, that can lead to a drastic reduction of the amount of data stored in the database.
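To make the scheme concrete, here is a minimal, self-contained sketch of how such an id could be built and parsed. It is not the library's actual implementation: the base62 alphabet order, the in-memory `templates` dict, and the helper names (`b62_encode`, `create_id`, `parse_id`) are all assumptions made for illustration. As in the example above, the user id is kept as a raw string and only the timestamp is base62-encoded.

```python
import string
from datetime import datetime, timedelta, timezone

# Assumed alphabet order (digits, uppercase, lowercase); the library may differ.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def b62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def b62_decode(s: str) -> int:
    """Decode a base62 string back to an integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


# Hypothetical in-memory stand-in for the templates table.
templates: dict[str, dict] = {}


def create_id(template_id: str, user_id: str, ts: datetime, algorithm: str) -> str:
    """Store the low cardinality part as a template; pack the rest into the id."""
    # Integer milliseconds since the epoch, computed without floats.
    ts_part = b62_encode((ts - EPOCH) // timedelta(milliseconds=1))
    templates[template_id] = {
        "userId": {"i": 0, "l": len(user_id), "t": "string"},
        "timestamp": {"i": len(user_id), "l": len(ts_part), "t": "datetime"},
        "algorithm": algorithm,
    }
    return f"{template_id}-{user_id}{ts_part}"


def parse_id(rec_id: str) -> dict:
    """Rebuild the full metadata from an id and its stored template."""
    template_id, payload = rec_id.split("-", 1)
    result = {}
    for name, spec in templates[template_id].items():
        if isinstance(spec, dict):
            # High cardinality field: slice its value out of the payload.
            raw = payload[spec["i"]: spec["i"] + spec["l"]]
            if spec["t"] == "datetime":
                result[name] = EPOCH + timedelta(milliseconds=b62_decode(raw))
            else:
                result[name] = raw
        else:
            # Low cardinality field: the value is stored in the template itself.
            result[name] = spec
    return result
```

Calling `parse_id(create_id(...))` returns the original three fields, while only the template row is ever written to storage. Every recommendation for a five-character user id and `algo-1` reuses the same template.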


## Usage

```python3
from datetime import datetime

from dyna_store import DynaStore, HighCardinality, LowCardinality, Metadata, MetadataId
from pydantic import BaseModel

# create a pydantic model for your recommendations metadata
# you need to wrap your fields in either HighCardinality or LowCardinality
# HighCardinality fields will be stored in the id
# LowCardinality fields will be stored in the templates
class Recommendation(BaseModel):
    userId: HighCardinality[str]
    timestamp: HighCardinality[datetime]
    algorithm: LowCardinality[str]

# create a store by extending the DynaStore class
class RecommendationStore(DynaStore[Recommendation]):
    def save_metadata(self, _metadata: Metadata) -> MetadataId:
        # here you need to handle the saving of the metadata
        # could be in your database, in a file, etc.
        # you need to create and return a unique id for this metadata.
        pass

    def load_metadata(self, _id: MetadataId) -> Metadata:
        # here you need to handle the loading of the metadata from an id
        # could be from your database, from a file, etc.
        pass

store = RecommendationStore(Recommendation)

# saving recommendations
id = store.create(Recommendation(userId="user1", timestamp=datetime.now(), algorithm="algo-1"))
# returns a Recommendation id

# loading recommendations
store.parse(id)
# returns a Recommendation object
```


## FAQ

### What database does it support?

All of them, and none of them: you need to handle the storage of the metadata yourself. It could be in a database, in a file, etc.

### What about security?

The high cardinality fields are stored in the id, and they are not encrypted. Anyone in possession of an id could:
- read the values of the high cardinality fields
- generate new ids with different high cardinality field values

This needs to be taken into account.
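One possible mitigation for the second point, which `dyna-store` itself is not known to provide, is to append a keyed signature to each id before exposing it and to verify that signature before parsing. The sketch below uses Python's standard `hmac` module. `SECRET_KEY`, `sign_id`, and `verify_id` are hypothetical names. Signing does not hide the field values, it only makes forged ids detectable.

```python
import hashlib
import hmac

# Hypothetical key; in practice, load it from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def _tag(rec_id: str) -> str:
    """Compute a truncated HMAC-SHA256 tag for an id."""
    return hmac.new(SECRET_KEY, rec_id.encode(), hashlib.sha256).hexdigest()[:16]


def sign_id(rec_id: str) -> str:
    """Append the tag to the id, separated by a dot."""
    return f"{rec_id}.{_tag(rec_id)}"


def verify_id(signed_id: str) -> str:
    """Return the original id if the tag matches, otherwise raise ValueError."""
    rec_id, _, tag = signed_id.rpartition(".")
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, _tag(rec_id)):
        raise ValueError("invalid signature")
    return rec_id
```

With this in place, an event pipeline would call `verify_id` first and only pass the returned id on to `store.parse`.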


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/brightnetwork/dyna-store",
    "name": "dyna-store",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "id, meta, store, high-cardinality, metadata",
    "author": "brightnetwork",
    "author_email": "dev@brightnetwork.co.uk",
    "download_url": "https://files.pythonhosted.org/packages/13/b4/63ee8e13a9228c5b782e4d75841aa8899656c54dec2352c8c9853003a4b7/dyna_store-0.0.4.tar.gz",
    "platform": null,
    "description": "[![image](https://img.shields.io/pypi/v/dyna-store.svg)](https://pypi.python.org/pypi/dyna-store)\n[![image](https://img.shields.io/pypi/l/dyna-store.svg)](https://pypi.python.org/pypi/dyna-store)\n[![image](https://img.shields.io/pypi/pyversions/dyna-store.svg)](https://pypi.python.org/pypi/dyna-store)\n[![Code Coverage](https://img.shields.io/codecov/c/github/brightnetwork/dyna-store)](https://app.codecov.io/gh/brightnetwork/dyna-store)\n[![Actions status](https://github.com/brightnetwork/dyna-store/workflows/test/badge.svg)](https://github.com/brightnetwork/dyna-store/actions)\n\n# `dyna-store`\n\nEfficient handling of high cardinality metadata.\n\nIn order to explain the main concept, let's go through an example:\n\n## Use case\n\nWe have an online shopping website, where user are shown recommended products.\nEach recommendation can lead to a bunch of user events (`viewed`, `clicked`, `purchased`, etc.). These events are stored in a database for further analysis.\nFor each event, we want to be able to relate it to the recommendation that led to it - in particular when the recommendation was made, with which algorithm, etc.\n\n### Without dyna-store\n\nAn approach would be to store all the recommendations in a database table:\n\n| id | userId | timestamp | algorithm |\n| --- | --- | --- | --- |\n| BShivYLGif | user1 | 1716384775942 | algo-1 |\n| 5SvIjZIXMm | user1 | 1716384793233 | algo-2 |\n| DkBoUmvMs0 | user2 | 1716384489455 | algo-2 |\n| Nm8NabCct8 | user2 | 1716384483847 | algo-2 |\n| 5ZO053OGpX | user2 | 1716384448985 | algo-2 |\n\nEach recommendation would have a unique identifier (the primary key), maybe generated by the database.\n\nThen, we can attach to each event the recommendation id. 
At any point we can query the recommendation table to get the details of the recommendation.\n\nThis works, but has some limitation: the recommendation table can grow very large:\n- if you are computing recommendations on the fly, a single user session can generate a lot of recommendations\n- if you are pre-computing recommendations (to ensure a fast first page view), that's also a lot of data to every day.\n\n### With dyna-store\n\nDyna store intend to address this limitation, by:\n- store less information in the database\n- store more information in the recommendation id itself\n\nWe will first split our fields between two categories:\n- the low cardinality field (`algorithm` in our example) - they don't have many different values and can be stored in the database\n- the high cardinality field (`userId`, `timestamp` in our example) - they have many different values and will be stored in the id.\n\nThen in the databse we will store the low cardinality fields, as well as the information needed to parse the informations contained in the id in a new `template` table:\n\n| id | userId | timestamp | algorithm |\n| --- | --- | --- | --- |\n| BShivYLGif | { __hcf: 1, i: 0, l: 5, t: \"string\" } | { __hcf: 1, i: 5, l: 5, t: \"datetime\" } | algo-1 |\n| 5SvIjZIXMm | { __hcf: 1, i: 0, l: 5, t: \"string\" } | { __hcf: 1, i: 5, l: 5, t: \"datetime\" } | algo-2 |\n\nthen the recommendation id will contain two part:\n- the database id of the template - `BShivYLGif`\n- the high cardinality fields, b62 encoded - `user1dXed`\nwhich will give us the recommendation id `BShivYLGif-user1dXed`\n\nThis id will be then attached to each event. 
From the id we can regenerate the original metadata, assuming we have access to the templates table.\nIn some cases, that can lead to a drastic reduction of the amount of data stored in the database.\n\n\n## Usage\n\n```python3\nfrom datetime import datetime\n\nfrom dyna_store import DynaStore, HighCardinality, LowCardinality, Metadata, MetadataId\nfrom pydantic import BaseModel\n\n# create a pydantic model for your recommendations metadata\n# you need to wrap your fields in either HighCardinality or LowCardinality\n# HighCardinality fields will be stored in the id\n# LowCardinality fields will be stored in the templates\nclass Recommendation(BaseModel):\n    userId: HighCardinality[str]\n    timestamp: HighCardinality[datetime]\n    algorithm: LowCardinality[str]\n\n# create a store by extending the DynaStore class\nclass RecommendationStore(DynaStore[Recommendation]):\n    def save_metadata(self, _metadata: Metadata) -> MetadataId:\n        # here you need to handle the saving of the metadata\n        # could be in your database, in a file, etc.\n        # you need to create and return a unique id for this metadata.\n        pass\n\n    def load_metadata(self, _id: MetadataId) -> Metadata:\n        # here you need to handle the loading of the metadata from an id\n        # could from your database, from a file, etc.\n        pass\n\nstore = RecommendationStore(Recommendation)\n\n# saving recommendations\nid = store.create(Recommendation(userId=\"user1\", timestamp=datetime.now(), algorithm=\"algo-1\"))\n# returns a Recommendation id\n\n# loading recommendations\nstore.parse(id)\n# returns a Recommendation object\n```\n\n\n## FAQ\n\n### What database does it support?\n\nall. none. You need to handle the storage of the metadata yourself. It could be in a database, in a file, etc.\n\n### What about security?\n\nthe high cardinality fields are stored in the id, so they are not encrypted. 
Anyone in possession of this id could:\n- access the high cardinality fields values\n- generate new ids with the different high cardinality fields values\n\nThis needs to be taken into account.\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Dynamic metadata storage",
    "version": "0.0.4",
    "project_urls": {
        "Documentation": "https://github.com/brightnetwork/dyna-store",
        "Homepage": "https://github.com/brightnetwork/dyna-store",
        "Repository": "https://github.com/brightnetwork/dyna-store"
    },
    "split_keywords": [
        "id",
        " meta",
        " store",
        " high-cardinality",
        " metadata"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "119fcf5536407e6f9852717192b4da9a2d9e063a3ea88845d39e7874256d4d00",
                "md5": "976cc72d220fdaddaeae50f655b0b0f3",
                "sha256": "7e9ac303f8a9cb2dd6106b377092c966d46bfad76224b0b028b8e6015753f5c8"
            },
            "downloads": -1,
            "filename": "dyna_store-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "976cc72d220fdaddaeae50f655b0b0f3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 6200,
            "upload_time": "2024-05-22T16:59:37",
            "upload_time_iso_8601": "2024-05-22T16:59:37.532695Z",
            "url": "https://files.pythonhosted.org/packages/11/9f/cf5536407e6f9852717192b4da9a2d9e063a3ea88845d39e7874256d4d00/dyna_store-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "13b463ee8e13a9228c5b782e4d75841aa8899656c54dec2352c8c9853003a4b7",
                "md5": "cd3e76ba6742d0413037f5c5f2160aa7",
                "sha256": "f6b03b6fe9b6aae6b11e60ac801c58cad3ec7d1629af6d468aa9a389e21fd040"
            },
            "downloads": -1,
            "filename": "dyna_store-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "cd3e76ba6742d0413037f5c5f2160aa7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 5533,
            "upload_time": "2024-05-22T16:59:39",
            "upload_time_iso_8601": "2024-05-22T16:59:39.600528Z",
            "url": "https://files.pythonhosted.org/packages/13/b4/63ee8e13a9228c5b782e4d75841aa8899656c54dec2352c8c9853003a4b7/dyna_store-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-22 16:59:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "brightnetwork",
    "github_project": "dyna-store",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "dyna-store"
}
        