hnhm


Namehnhm JSON
Version 0.0.11 PyPI version JSON
download
home_pagehttps://marchinho11.github.io/hnhm
SummaryhNhM – highly Normalized hybrid Model.
upload_time2024-03-01 18:31:12
maintainerArseny Egorov
docs_urlNone
authorArseny Egorov
requires_python>=3.10,<3.11
license
keywords hnhm datavault anchormodeling python
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # <img src="https://raw.githubusercontent.com/marchinho11/hnhm/main/docs/assets/logo.png" height="40px">
[![codecov](https://codecov.io/gh/marchinho11/hnhm/branch/main/graph/badge.svg?token=PFB1111T2D)](https://codecov.io/gh/marchinho11/hnhm)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

**hNhM**(highly Normalized hybrid Model) – data modeling methodology that enables iterative and Agile-driven
 modifications to your Data Warehouse (DWH). The methodology is based on Anchor Modeling and Data Vault.


The idea is to represent each Entity using 3 types of tables:
- `Hub` stores Business and Surrogate keys
- `Attribute` stores the history of changes (e.g. Name, Age, Gender...)
- `Group` is used to reduce the number of underlying tables by grouping multiple Attributes in one Group.
  **The best practice is to define a Group by the same data source.**


The purpose of `hnhm` is to:
* **Define** Entities, Links and Flows *declaratively* using Python
  * Describe them "logically"
  * The "physical" layer is managed by `hnhm`
* **Apply** the changes to a DWH iteratively
  * `hnhm` detects changes in the entities and creates new Attributes, Groups, or Links
* **Load** data from the Staging layer using Flows in the correct order
  * Ignore, Update, or keep the history using `SCD2`


Implementation of this package is based on report
 **"How we implemented our data storage model — highly Normalized hybrid Model"**
 by Evgeny Ermakov and Nikolai Grebenshchikov. [1) Yandex Report, habr.com](https://habr.com/ru/company/yandex/blog/557140/).
 [2) SmartData Conference, youtube.com](https://youtu.be/2fPqDvHsd0w)


* [Tutorial](#tutorial)
  + [Prerequisites](#prerequisites)
  + [Initialize DWH](#initialize-dwh)
  + [Start modeling](#start-modeling)
  + [Add Attribute and Group](#add-attribute-and-group)
* [Concepts](#concepts)
  + [Logical level (Python classes)](#logical-level-python-classes)
  + [Physical level (Tables)](#physical-level-tables)
  + [Change types of Attributes and Groups](#change-types-of-attributes-and-groups)
* [Supported Databases](#supported-databases)


## Tutorial
You can find the full code in the [`dwh`](dwh) folder.

### Prerequisites
Install the library
```shell
pip install hnhm
```

Create the `dwh` directory
```shell
mkdir dwh
```

Spin up the Postgres Database
```shell
# dwh/docker-compose.yml
version: "3.9"

volumes:
  postgres_data: { }

services:
  postgres:
    image: postgres:15
    volumes:
      - "postgres_data:/var/lib/postgresql/data"
    ports:
      - "5433:5432"
    environment:
      POSTGRES_DB: hnhm
      POSTGRES_USER: hnhm
      POSTGRES_PASSWORD: 123
      
# $ docker compose up -d
```

### Initialize DWH
Create the `__hnhm__.py` file with the following contents:
```python
# dwh/__hnhm__.py
from hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState

sql = PostgresPsycopgSql(
    database="hnhm",
    user="hnhm",
    password="123",
    port=5433
)

registry = HnhmRegistry(
    hnhm=HnHm(
        sql=sql,
        state=FileState("state.json"),
    ),
)
```

- `PostgresPsycopgSql` generates and executes SQL
- `HnhmRegistry` stores the `hnhm` object, entities and links
- `HnHm` implements core logic and manages the state
- `FileState` stores the state of your DWH in a file


Apply the changes to your DWH:
```shell
$ hnhm apply dwh

Importing 'registry' object from the module: 'dwh.__hnhm__'.

Your DWH is up to date.
```

Our DWH is up-to-date because we haven't added any entities and links yet.

### Start modeling
Let's add the new Entity. Add the `dwh/user.py` file with the following contents:
```python
# dwh/user.py
from hnhm import (
    Layout,
    LayoutType,
    String,
    ChangeType,
    HnhmEntity,
)


class User(HnhmEntity):
    """User data."""

    __layout__ = Layout(name="user", type=LayoutType.HNHM)

    user_id = String("User ID.", change_type=ChangeType.IGNORE)

    __keys__ = [user_id]

```

Add the `User` entity to the registry:
```diff
# dwh/__hnhm__.py
from hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState
+ from dwh.user import User

sql = PostgresPsycopgSql(
    database="hnhm",
    user="hnhm",
    password="123",
    port=5433
)

registry = HnhmRegistry(
+   entities=[User()],
    hnhm=HnHm(
        sql=sql,
        state=FileState("state.json"),
    ),
)
```

Apply the changes to your DWH:
```shell
$ hnhm apply dwh

Importing 'registry' object from the module: 'dwh.__hnhm__'.

Plan:

+ entity 'HNHM.user'
  + view 'user'
  + hub 'user'

Apply migrations? [y/N]: y
Applied!
```

We added the new entity `User` to our DWH
```sql
-- View on top the DDS tables
select * from entity__user;

-- Hub
select * from hub__user;
```

### Add Attribute and Group
Let's add an Attribute and a Group to our Entity. Edit the `dwh/user.py` file:
```diff
# dwh/user.py
from hnhm import (
    Layout,
    LayoutType,
    String,
    ChangeType,
    HnhmEntity,
+   Integer
)


class User(HnhmEntity):
    """User data."""

    __layout__ = Layout(name="user", type=LayoutType.HNHM)

    user_id = String("User ID.", change_type=ChangeType.IGNORE)
+   age = Integer("Age.", change_type=ChangeType.UPDATE)
+   first_name = String("First name.", change_type=ChangeType.NEW, group="name")
+   last_name = String("Last name.", change_type=ChangeType.NEW, group="name")

    __keys__ = [user_id]

```

Apply the changes to your DWH:
```shell
$ hnhm apply dwh

Importing 'registry' object from the module: 'dwh.__hnhm__'.

Plan:

[u] entity 'HNHM.user'
  + attribute 'age'
  + group 'name'
    |attribute 'first_name'
    |attribute 'last_name'
  [u] view 'user'

Apply migrations? [y/N]: y
Applied!
```


Take a look at newly created tables
```sql
-- View on top of the DDS tables was updated
select * from entity__user;

-- Attribute 'age'
select * from attr__user__age;

-- Group 'name'
select * from group__user__name;
```

The physical result:
```
view: entity__user
┌────────────────────────────────────────────────────────────────┐
│┌───────────────────┐   ┌────────────────┐   ┌─────────────────┐│
 │ group__user__name │   │ hub__user      │   │ attr__user__age │
 │                   │   │                │   │                 │
 │ + user_sk (FK)    ├──►│ + user_sk (PK) │◄──┤ + user_sk (FK)  │
 │ + first_name      │   │ + user_id_bk   │   │ + age           │
 │ + last_name       │   │ + valid_from   │   │ + valid_from    │
 │ + valid_from      │   │ + _hash        │   │ + _hash         │
 │ + valid_to        │   │ + _source      │   │ + _source       │
 │ + _hash           │   │ + _loaded_at   │   │ + _loaded_at    │
 │ + _source         │   └────────────────┘   └─────────────────┘
 │ + _loaded_at      │
 └───────────────────┘
```


## Concepts
### Logical level (Python classes)
* **Entity**: business entity (User, Review, Order, Booking)
* **Link**: the relationship between Entities (UserOrder, UserBooking)
* **Flow**: helps to load data from the stage layer to Entities and Links

### Physical level (Tables)
* **Hub**: hub table contains Entity's Business Keys and Surrogate Key(MD5 hash of concatenated business keys)
* **Attribute**: attribute table contains FK to Entity's surrogate key, history of attribute changes, and the `valid_from` column
* **Group**: group table contains FK to Entity's surrogate key, history of changes to group attributes, and the `valid_from` column
* **Link**: link table contains FKs to Entities surrogate keys. Historicity by `SCD2`

### Change types of Attributes and Groups
* `IGNORE`: insert the latest new data, ignore updates
* `UPDATE`: insert the latest new data, update
* `NEW`: full history using `SCD2`. Adds the `valid_to` column


## Supported Databases
- [x] Postgres


            

Raw data

            {
    "_id": null,
    "home_page": "https://marchinho11.github.io/hnhm",
    "name": "hnhm",
    "maintainer": "Arseny Egorov",
    "docs_url": null,
    "requires_python": ">=3.10,<3.11",
    "maintainer_email": "egoroff-ars@yandex.ru",
    "keywords": "hnhm,datavault,anchormodeling,python",
    "author": "Arseny Egorov",
    "author_email": "egoroff-ars@yandex.ru",
    "download_url": "https://files.pythonhosted.org/packages/49/64/901926e527cd60dd0b51e47e360d675aab3e89a5a31bc69544112f28d80b/hnhm-0.0.11.tar.gz",
    "platform": null,
    "description": "# <img src=\"https://raw.githubusercontent.com/marchinho11/hnhm/main/docs/assets/logo.png\" height=\"40px\">\n[![codecov](https://codecov.io/gh/marchinho11/hnhm/branch/main/graph/badge.svg?token=PFB1111T2D)](https://codecov.io/gh/marchinho11/hnhm)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n**hNhM**(highly Normalized hybrid Model) \u2013 data modeling methodology that enables iterative and Agile-driven\n modifications to your Data Warehouse (DWH). The methodology is based on Anchor Modeling and Data Vault.\n\n\nThe idea is to represent each Entity using 3 types of tables:\n- `Hub` stores Business and Surrogate keys\n- `Attribute` stores the history of changes (e.g. Name, Age, Gender...)\n- `Group` is used to reduce the number of underlying tables by grouping multiple Attributes in one Group.\n  **The best practice is to define a Group by the same data source.**\n\n\nThe purpose of `hnhm` is to:\n* **Define** Entities, Links and Flows *declaratively* using Python\n  * Describe them \"logically\"\n  * The \"physical\" layer is managed by `hnhm`\n* **Apply** the changes to a DWH iteratively\n  * `hnhm` detects changes in the entities and creates new Attributes, Groups, or Links\n* **Load** data from the Staging layer using Flows in the correct order\n  * Ignore, Update, or keep the history using `SCD2`\n\n\nImplementation of this package is based on report\n **\"How we implemented our data storage model \u2014 highly Normalized hybrid Model\"**\n by Evgeny Ermakov and Nikolai Grebenshchikov. [1) Yandex Report, habr.com](https://habr.com/ru/company/yandex/blog/557140/).\n [2) SmartData Conference, youtube.com](https://youtu.be/2fPqDvHsd0w)\n\n\n* [Tutorial](#tutorial)\n  + [Prerequisites](#prerequisites)\n  + [Initialize DWH](#initialize-dwh)\n  + [Start modeling](#start-modeling)\n  + [Add Attribute and Group](#add-attribute-and-group)\n* [Concepts](#concepts)\n  + [Logical level (Python classes)](#logical-level-python-classes)\n  + [Physical level (Tables)](#physical-level-tables)\n  + [Change types of Attributes and Groups](#change-types-of-attributes-and-groups)\n* [Supported Databases](#supported-databases)\n\n\n## Tutorial\nYou can find the full code in the [`dwh`](dwh) folder.\n\n### Prerequisites\nInstall the library\n```shell\npip install hnhm\n```\n\nCreate the `dwh` directory\n```shell\nmkdir dwh\n```\n\nSpin up the Postgres Database\n```shell\n# dwh/docker-compose.yml\nversion: \"3.9\"\n\nvolumes:\n  postgres_data: { }\n\nservices:\n  postgres:\n    image: postgres:15\n    volumes:\n      - \"postgres_data:/var/lib/postgresql/data\"\n    ports:\n      - \"5433:5432\"\n    environment:\n      POSTGRES_DB: hnhm\n      POSTGRES_USER: hnhm\n      POSTGRES_PASSWORD: 123\n      \n# $ docker compose up -d\n```\n\n### Initialize DWH\nCreate the `__hnhm__.py` file with the following contents:\n```python\n# dwh/__hnhm__.py\nfrom hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState\n\nsql = PostgresPsycopgSql(\n    database=\"hnhm\",\n    user=\"hnhm\",\n    password=\"123\",\n    port=5433\n)\n\nregistry = HnhmRegistry(\n    hnhm=HnHm(\n        sql=sql,\n        state=FileState(\"state.json\"),\n    ),\n)\n```\n\n- `PostgresPsycopgSql` generates and executes SQL\n- `HnhmRegistry` stores the `hnhm` object, entities and links\n- `HnHm` implements core logic and manages the state\n- `FileState` stores the state of your DWH in a file\n\n\nApply the changes to your DWH:\n```shell\n$ hnhm apply dwh\n\nImporting 'registry' object from the module: 'dwh.__hnhm__'.\n\nYour DWH is up to date.\n```\n\nOur DWH is up-to-date because we haven't added any entities and links yet.\n\n### Start modeling\nLet's add the new Entity. Add the `dwh/user.py` file with the following contents:\n```python\n# dwh/user.py\nfrom hnhm import (\n    Layout,\n    LayoutType,\n    String,\n    ChangeType,\n    HnhmEntity,\n)\n\n\nclass User(HnhmEntity):\n    \"\"\"User data.\"\"\"\n\n    __layout__ = Layout(name=\"user\", type=LayoutType.HNHM)\n\n    user_id = String(\"User ID.\", change_type=ChangeType.IGNORE)\n\n    __keys__ = [user_id]\n\n```\n\nAdd the `User` entity to the registry:\n```diff\n# dwh/__hnhm__.py\nfrom hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState\n+ from dwh.user import User\n\nsql = PostgresPsycopgSql(\n    database=\"hnhm\",\n    user=\"hnhm\",\n    password=\"123\",\n    port=5433\n)\n\nregistry = HnhmRegistry(\n+   entities=[User()],\n    hnhm=HnHm(\n        sql=sql,\n        state=FileState(\"state.json\"),\n    ),\n)\n```\n\nApply the changes to your DWH:\n```shell\n$ hnhm apply dwh\n\nImporting 'registry' object from the module: 'dwh.__hnhm__'.\n\nPlan:\n\n+ entity 'HNHM.user'\n  + view 'user'\n  + hub 'user'\n\nApply migrations? [y/N]: y\nApplied!\n```\n\nWe added the new entity `User` to our DWH\n```sql\n-- View on top the DDS tables\nselect * from entity__user;\n\n-- Hub\nselect * from hub__user;\n```\n\n### Add Attribute and Group\nLet's add an Attribute and a Group to our Entity. Edit the `dwh/user.py` file:\n```diff\n# dwh/user.py\nfrom hnhm import (\n    Layout,\n    LayoutType,\n    String,\n    ChangeType,\n    HnhmEntity,\n+   Integer\n)\n\n\nclass User(HnhmEntity):\n    \"\"\"User data.\"\"\"\n\n    __layout__ = Layout(name=\"user\", type=LayoutType.HNHM)\n\n    user_id = String(\"User ID.\", change_type=ChangeType.IGNORE)\n+   age = Integer(\"Age.\", change_type=ChangeType.UPDATE)\n+   first_name = String(\"First name.\", change_type=ChangeType.NEW, group=\"name\")\n+   last_name = String(\"Last name.\", change_type=ChangeType.NEW, group=\"name\")\n\n    __keys__ = [user_id]\n\n```\n\nApply the changes to your DWH:\n```shell\n$ hnhm apply dwh\n\nImporting 'registry' object from the module: 'dwh.__hnhm__'.\n\nPlan:\n\n[u] entity 'HNHM.user'\n  + attribute 'age'\n  + group 'name'\n    |attribute 'first_name'\n    |attribute 'last_name'\n  [u] view 'user'\n\nApply migrations? [y/N]: y\nApplied!\n```\n\n\nTake a look at newly created tables\n```sql\n-- View on top of the DDS tables was updated\nselect * from entity__user;\n\n-- Attribute 'age'\nselect * from attr__user__age;\n\n-- Group 'name'\nselect * from group__user__name;\n```\n\nThe physical result:\n```\nview: entity__user\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\u2502\n \u2502 group__user__name \u2502   \u2502 hub__user      \u2502   \u2502 attr__user__age \u2502\n \u2502                   \u2502   \u2502                \u2502   \u2502                 \u2502\n \u2502 + user_sk (FK)    \u251c\u2500\u2500\u25ba\u2502 + user_sk (PK) \u2502\u25c4\u2500\u2500\u2524 + user_sk (FK)  \u2502\n \u2502 + first_name      \u2502   \u2502 + user_id_bk   \u2502   \u2502 + age           \u2502\n \u2502 + last_name       \u2502   \u2502 + valid_from   \u2502   \u2502 + valid_from    \u2502\n \u2502 + valid_from      \u2502   \u2502 + _hash        \u2502   \u2502 + _hash         \u2502\n \u2502 + valid_to        \u2502   \u2502 + _source      \u2502   \u2502 + _source       \u2502\n \u2502 + _hash           \u2502   \u2502 + _loaded_at   \u2502   \u2502 + _loaded_at    \u2502\n \u2502 + _source         \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n \u2502 + _loaded_at      \u2502\n \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n\n## Concepts\n### Logical level (Python classes)\n* **Entity**: business entity (User, Review, Order, Booking)\n* **Link**: the relationship between Entities (UserOrder, UserBooking)\n* **Flow**: helps to load data from the stage layer to Entities and Links\n\n### Physical level (Tables)\n* **Hub**: hub table contains Entity's Business Keys and Surrogate Key(MD5 hash of concatenated business keys)\n* **Attribute**: attribute table contains FK to Entity's surrogate key, history of attribute changes, and the `valid_from` column\n* **Group**: group table contains FK to Entity's surrogate key, history of changes to group attributes, and the `valid_from` column\n* **Link**: link table contains FKs to Entities surrogate keys. Historicity by `SCD2`\n\n### Change types of Attributes and Groups\n* `IGNORE`: insert the latest new data, ignore updates\n* `UPDATE`: insert the latest new data, update\n* `NEW`: full history using `SCD2`. Adds the `valid_to` column\n\n\n## Supported Databases\n- [x] Postgres\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "hNhM \u2013 highly Normalized hybrid Model.",
    "version": "0.0.11",
    "project_urls": {
        "Documentation": "https://marchinho11.github.io/hnhm",
        "Homepage": "https://marchinho11.github.io/hnhm",
        "Repository": "https://github.com/marchinho11/hnhm"
    },
    "split_keywords": [
        "hnhm",
        "datavault",
        "anchormodeling",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "01cebacc2c989b363d738d8c0bfb33b4a144cb1584345fddf3f7e06141152589",
                "md5": "f97a4778d588954da1c05fd08ca70416",
                "sha256": "2719d1c516eec517662d054203f61fc15e8319588794be68041e977e7e3fecf4"
            },
            "downloads": -1,
            "filename": "hnhm-0.0.11-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f97a4778d588954da1c05fd08ca70416",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10,<3.11",
            "size": 29401,
            "upload_time": "2024-03-01T18:31:11",
            "upload_time_iso_8601": "2024-03-01T18:31:11.116221Z",
            "url": "https://files.pythonhosted.org/packages/01/ce/bacc2c989b363d738d8c0bfb33b4a144cb1584345fddf3f7e06141152589/hnhm-0.0.11-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4964901926e527cd60dd0b51e47e360d675aab3e89a5a31bc69544112f28d80b",
                "md5": "92c4a460ba4ab4442165273bf104e711",
                "sha256": "be87db8d63646ed7d896d2d99aa288559cccc35642d70665a8589db150e3e854"
            },
            "downloads": -1,
            "filename": "hnhm-0.0.11.tar.gz",
            "has_sig": false,
            "md5_digest": "92c4a460ba4ab4442165273bf104e711",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10,<3.11",
            "size": 20947,
            "upload_time": "2024-03-01T18:31:12",
            "upload_time_iso_8601": "2024-03-01T18:31:12.911332Z",
            "url": "https://files.pythonhosted.org/packages/49/64/901926e527cd60dd0b51e47e360d675aab3e89a5a31bc69544112f28d80b/hnhm-0.0.11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-01 18:31:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "marchinho11",
    "github_project": "hnhm",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "hnhm"
}
        
Elapsed time: 0.27148s