# <img src="https://raw.githubusercontent.com/marchinho11/hnhm/main/docs/assets/logo.png" height="40px">
[![codecov](https://codecov.io/gh/marchinho11/hnhm/branch/main/graph/badge.svg?token=PFB1111T2D)](https://codecov.io/gh/marchinho11/hnhm)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
**hNhM**(highly Normalized hybrid Model) – data modeling methodology that enables iterative and Agile-driven
modifications to your Data Warehouse (DWH). The methodology is based on Anchor Modeling and Data Vault.
The idea is to represent each Entity using 3 types of tables:
- `Hub` stores Business and Surrogate keys
- `Attribute` stores the history of changes (e.g. Name, Age, Gender...)
- `Group` is used to reduce the number of underlying tables by grouping multiple Attributes in one Group.
**The best practice is to define a Group by the same data source.**
The purpose of `hnhm` is to:
* **Define** Entities, Links and Flows *declaratively* using Python
* Describe them "logically"
* The "physical" layer is managed by `hnhm`
* **Apply** the changes to a DWH iteratively
* `hnhm` detects changes in the entities and creates new Attributes, Groups, or Links
* **Load** data from the Staging layer using Flows in the correct order
* Ignore, Update, or keep the history using `SCD2`
Implementation of this package is based on report
**"How we implemented our data storage model — highly Normalized hybrid Model"**
by Evgeny Ermakov and Nikolai Grebenshchikov. [1) Yandex Report, habr.com](https://habr.com/ru/company/yandex/blog/557140/).
[2) SmartData Conference, youtube.com](https://youtu.be/2fPqDvHsd0w)
* [Tutorial](#tutorial)
+ [Prerequisites](#prerequisites)
+ [Initialize DWH](#initialize-dwh)
+ [Start modeling](#start-modeling)
+ [Add Attribute and Group](#add-attribute-and-group)
* [Concepts](#concepts)
+ [Logical level (Python classes)](#logical-level-python-classes)
+ [Physical level (Tables)](#physical-level-tables)
+ [Change types of Attributes and Groups](#change-types-of-attributes-and-groups)
* [Supported Databases](#supported-databases)
## Tutorial
You can find the full code in the [`dwh`](dwh) folder.
### Prerequisites
Install the library
```shell
pip install hnhm
```
Create the `dwh` directory
```shell
mkdir dwh
```
Spin up the Postgres Database
```shell
# dwh/docker-compose.yml
version: "3.9"
volumes:
postgres_data: { }
services:
postgres:
image: postgres:15
volumes:
- "postgres_data:/var/lib/postgresql/data"
ports:
- "5433:5432"
environment:
POSTGRES_DB: hnhm
POSTGRES_USER: hnhm
POSTGRES_PASSWORD: 123
# $ docker compose up -d
```
### Initialize DWH
Create the `__hnhm__.py` file with the following contents:
```python
# dwh/__hnhm__.py
from hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState
sql = PostgresPsycopgSql(
database="hnhm",
user="hnhm",
password="123",
port=5433
)
registry = HnhmRegistry(
hnhm=HnHm(
sql=sql,
state=FileState("state.json"),
),
)
```
- `PostgresPsycopgSql` generates and executes SQL
- `HnhmRegistry` stores the `hnhm` object, entities and links
- `HnHm` implements core logic and manages the state
- `FileState` stores the state of your DWH in a file
Apply the changes to your DWH:
```shell
$ hnhm apply dwh
Importing 'registry' object from the module: 'dwh.__hnhm__'.
Your DWH is up to date.
```
Our DWH is up-to-date because we haven't added any entities and links yet.
### Start modeling
Let's add the new Entity. Add the `dwh/user.py` file with the following contents:
```python
# dwh/user.py
from hnhm import (
Layout,
LayoutType,
String,
ChangeType,
HnhmEntity,
)
class User(HnhmEntity):
"""User data."""
__layout__ = Layout(name="user", type=LayoutType.HNHM)
user_id = String("User ID.", change_type=ChangeType.IGNORE)
__keys__ = [user_id]
```
Add the `User` entity to the registry:
```diff
# dwh/__hnhm__.py
from hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState
+ from dwh.user import User
sql = PostgresPsycopgSql(
database="hnhm",
user="hnhm",
password="123",
port=5433
)
registry = HnhmRegistry(
+ entities=[User()],
hnhm=HnHm(
sql=sql,
state=FileState("state.json"),
),
)
```
Apply the changes to your DWH:
```shell
$ hnhm apply dwh
Importing 'registry' object from the module: 'dwh.__hnhm__'.
Plan:
+ entity 'HNHM.user'
+ view 'user'
+ hub 'user'
Apply migrations? [y/N]: y
Applied!
```
We added the new entity `User` to our DWH
```sql
-- View on top the DDS tables
select * from entity__user;
-- Hub
select * from hub__user;
```
### Add Attribute and Group
Let's add an Attribute and a Group to our Entity. Edit the `dwh/user.py` file:
```diff
# dwh/user.py
from hnhm import (
Layout,
LayoutType,
String,
ChangeType,
HnhmEntity,
+ Integer
)
class User(HnhmEntity):
"""User data."""
__layout__ = Layout(name="user", type=LayoutType.HNHM)
user_id = String("User ID.", change_type=ChangeType.IGNORE)
+ age = Integer("Age.", change_type=ChangeType.UPDATE)
+ first_name = String("First name.", change_type=ChangeType.NEW, group="name")
+ last_name = String("Last name.", change_type=ChangeType.NEW, group="name")
__keys__ = [user_id]
```
Apply the changes to your DWH:
```shell
$ hnhm apply dwh
Importing 'registry' object from the module: 'dwh.__hnhm__'.
Plan:
[u] entity 'HNHM.user'
+ attribute 'age'
+ group 'name'
|attribute 'first_name'
|attribute 'last_name'
[u] view 'user'
Apply migrations? [y/N]: y
Applied!
```
Take a look at newly created tables
```sql
-- View on top of the DDS tables was updated
select * from entity__user;
-- Attribute 'age'
select * from attr__user__age;
-- Group 'name'
select * from group__user__name;
```
The physical result:
```
view: entity__user
┌────────────────────────────────────────────────────────────────┐
│┌───────────────────┐ ┌────────────────┐ ┌─────────────────┐│
│ group__user__name │ │ hub__user │ │ attr__user__age │
│ │ │ │ │ │
│ + user_sk (FK) ├──►│ + user_sk (PK) │◄──┤ + user_sk (FK) │
│ + first_name │ │ + user_id_bk │ │ + age │
│ + last_name │ │ + valid_from │ │ + valid_from │
│ + valid_from │ │ + _hash │ │ + _hash │
│ + valid_to │ │ + _source │ │ + _source │
│ + _hash │ │ + _loaded_at │ │ + _loaded_at │
│ + _source │ └────────────────┘ └─────────────────┘
│ + _loaded_at │
└───────────────────┘
```
## Concepts
### Logical level (Python classes)
* **Entity**: business entity (User, Review, Order, Booking)
* **Link**: the relationship between Entities (UserOrder, UserBooking)
* **Flow**: helps to load data from the stage layer to Entities and Links
### Physical level (Tables)
* **Hub**: hub table contains Entity's Business Keys and Surrogate Key(MD5 hash of concatenated business keys)
* **Attribute**: attribute table contains FK to Entity's surrogate key, history of attribute changes, and the `valid_from` column
* **Group**: group table contains FK to Entity's surrogate key, history of changes to group attributes, and the `valid_from` column
* **Link**: link table contains FKs to Entities surrogate keys. Historicity by `SCD2`
### Change types of Attributes and Groups
* `IGNORE`: insert the latest new data, ignore updates
* `UPDATE`: insert the latest new data, update
* `NEW`: full history using `SCD2`. Adds the `valid_to` column
## Supported Databases
- [x] Postgres
Raw data
{
"_id": null,
"home_page": "https://marchinho11.github.io/hnhm",
"name": "hnhm",
"maintainer": "Arseny Egorov",
"docs_url": null,
"requires_python": ">=3.10,<3.11",
"maintainer_email": "egoroff-ars@yandex.ru",
"keywords": "hnhm,datavault,anchormodeling,python",
"author": "Arseny Egorov",
"author_email": "egoroff-ars@yandex.ru",
"download_url": "https://files.pythonhosted.org/packages/49/64/901926e527cd60dd0b51e47e360d675aab3e89a5a31bc69544112f28d80b/hnhm-0.0.11.tar.gz",
"platform": null,
"description": "# <img src=\"https://raw.githubusercontent.com/marchinho11/hnhm/main/docs/assets/logo.png\" height=\"40px\">\n[![codecov](https://codecov.io/gh/marchinho11/hnhm/branch/main/graph/badge.svg?token=PFB1111T2D)](https://codecov.io/gh/marchinho11/hnhm)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n**hNhM**(highly Normalized hybrid Model) \u2013 data modeling methodology that enables iterative and Agile-driven\n modifications to your Data Warehouse (DWH). The methodology is based on Anchor Modeling and Data Vault.\n\n\nThe idea is to represent each Entity using 3 types of tables:\n- `Hub` stores Business and Surrogate keys\n- `Attribute` stores the history of changes (e.g. Name, Age, Gender...)\n- `Group` is used to reduce the number of underlying tables by grouping multiple Attributes in one Group.\n **The best practice is to define a Group by the same data source.**\n\n\nThe purpose of `hnhm` is to:\n* **Define** Entities, Links and Flows *declaratively* using Python\n * Describe them \"logically\"\n * The \"physical\" layer is managed by `hnhm`\n* **Apply** the changes to a DWH iteratively\n * `hnhm` detects changes in the entities and creates new Attributes, Groups, or Links\n* **Load** data from the Staging layer using Flows in the correct order\n * Ignore, Update, or keep the history using `SCD2`\n\n\nImplementation of this package is based on report\n **\"How we implemented our data storage model \u2014 highly Normalized hybrid Model\"**\n by Evgeny Ermakov and Nikolai Grebenshchikov. [1) Yandex Report, habr.com](https://habr.com/ru/company/yandex/blog/557140/).\n [2) SmartData Conference, youtube.com](https://youtu.be/2fPqDvHsd0w)\n\n\n* [Tutorial](#tutorial)\n + [Prerequisites](#prerequisites)\n + [Initialize DWH](#initialize-dwh)\n + [Start modeling](#start-modeling)\n + [Add Attribute and Group](#add-attribute-and-group)\n* [Concepts](#concepts)\n + [Logical level (Python classes)](#logical-level-python-classes)\n + [Physical level (Tables)](#physical-level-tables)\n + [Change types of Attributes and Groups](#change-types-of-attributes-and-groups)\n* [Supported Databases](#supported-databases)\n\n\n## Tutorial\nYou can find the full code in the [`dwh`](dwh) folder.\n\n### Prerequisites\nInstall the library\n```shell\npip install hnhm\n```\n\nCreate the `dwh` directory\n```shell\nmkdir dwh\n```\n\nSpin up the Postgres Database\n```shell\n# dwh/docker-compose.yml\nversion: \"3.9\"\n\nvolumes:\n postgres_data: { }\n\nservices:\n postgres:\n image: postgres:15\n volumes:\n - \"postgres_data:/var/lib/postgresql/data\"\n ports:\n - \"5433:5432\"\n environment:\n POSTGRES_DB: hnhm\n POSTGRES_USER: hnhm\n POSTGRES_PASSWORD: 123\n \n# $ docker compose up -d\n```\n\n### Initialize DWH\nCreate the `__hnhm__.py` file with the following contents:\n```python\n# dwh/__hnhm__.py\nfrom hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState\n\nsql = PostgresPsycopgSql(\n database=\"hnhm\",\n user=\"hnhm\",\n password=\"123\",\n port=5433\n)\n\nregistry = HnhmRegistry(\n hnhm=HnHm(\n sql=sql,\n state=FileState(\"state.json\"),\n ),\n)\n```\n\n- `PostgresPsycopgSql` generates and executes SQL\n- `HnhmRegistry` stores the `hnhm` object, entities and links\n- `HnHm` implements core logic and manages the state\n- `FileState` stores the state of your DWH in a file\n\n\nApply the changes to your DWH:\n```shell\n$ hnhm apply dwh\n\nImporting 'registry' object from the module: 'dwh.__hnhm__'.\n\nYour DWH is up to date.\n```\n\nOur DWH is up-to-date because we haven't added any entities and links yet.\n\n### Start modeling\nLet's add the new Entity. Add the `dwh/user.py` file with the following contents:\n```python\n# dwh/user.py\nfrom hnhm import (\n Layout,\n LayoutType,\n String,\n ChangeType,\n HnhmEntity,\n)\n\n\nclass User(HnhmEntity):\n \"\"\"User data.\"\"\"\n\n __layout__ = Layout(name=\"user\", type=LayoutType.HNHM)\n\n user_id = String(\"User ID.\", change_type=ChangeType.IGNORE)\n\n __keys__ = [user_id]\n\n```\n\nAdd the `User` entity to the registry:\n```diff\n# dwh/__hnhm__.py\nfrom hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState\n+ from dwh.user import User\n\nsql = PostgresPsycopgSql(\n database=\"hnhm\",\n user=\"hnhm\",\n password=\"123\",\n port=5433\n)\n\nregistry = HnhmRegistry(\n+ entities=[User()],\n hnhm=HnHm(\n sql=sql,\n state=FileState(\"state.json\"),\n ),\n)\n```\n\nApply the changes to your DWH:\n```shell\n$ hnhm apply dwh\n\nImporting 'registry' object from the module: 'dwh.__hnhm__'.\n\nPlan:\n\n+ entity 'HNHM.user'\n + view 'user'\n + hub 'user'\n\nApply migrations? [y/N]: y\nApplied!\n```\n\nWe added the new entity `User` to our DWH\n```sql\n-- View on top the DDS tables\nselect * from entity__user;\n\n-- Hub\nselect * from hub__user;\n```\n\n### Add Attribute and Group\nLet's add an Attribute and a Group to our Entity. Edit the `dwh/user.py` file:\n```diff\n# dwh/user.py\nfrom hnhm import (\n Layout,\n LayoutType,\n String,\n ChangeType,\n HnhmEntity,\n+ Integer\n)\n\n\nclass User(HnhmEntity):\n \"\"\"User data.\"\"\"\n\n __layout__ = Layout(name=\"user\", type=LayoutType.HNHM)\n\n user_id = String(\"User ID.\", change_type=ChangeType.IGNORE)\n+ age = Integer(\"Age.\", change_type=ChangeType.UPDATE)\n+ first_name = String(\"First name.\", change_type=ChangeType.NEW, group=\"name\")\n+ last_name = String(\"Last name.\", change_type=ChangeType.NEW, group=\"name\")\n\n __keys__ = [user_id]\n\n```\n\nApply the changes to your DWH:\n```shell\n$ hnhm apply dwh\n\nImporting 'registry' object from the module: 'dwh.__hnhm__'.\n\nPlan:\n\n[u] entity 'HNHM.user'\n + attribute 'age'\n + group 'name'\n |attribute 'first_name'\n |attribute 'last_name'\n [u] view 'user'\n\nApply migrations? [y/N]: y\nApplied!\n```\n\n\nTake a look at newly created tables\n```sql\n-- View on top of the DDS tables was updated\nselect * from entity__user;\n\n-- Attribute 'age'\nselect * from attr__user__age;\n\n-- Group 'name'\nselect * from group__user__name;\n```\n\nThe physical result:\n```\nview: entity__user\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\u2502\n \u2502 group__user__name \u2502 \u2502 hub__user \u2502 \u2502 attr__user__age \u2502\n \u2502 \u2502 \u2502 \u2502 \u2502 \u2502\n \u2502 + user_sk (FK) \u251c\u2500\u2500\u25ba\u2502 + user_sk (PK) \u2502\u25c4\u2500\u2500\u2524 + user_sk (FK) \u2502\n \u2502 + first_name \u2502 \u2502 + user_id_bk \u2502 \u2502 + age \u2502\n \u2502 + last_name \u2502 \u2502 + valid_from \u2502 \u2502 + valid_from \u2502\n \u2502 + valid_from \u2502 \u2502 + _hash \u2502 \u2502 + _hash \u2502\n \u2502 + valid_to \u2502 \u2502 + _source \u2502 \u2502 + _source \u2502\n \u2502 + _hash \u2502 \u2502 + _loaded_at \u2502 \u2502 + _loaded_at \u2502\n \u2502 + _source \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n \u2502 + _loaded_at \u2502\n \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n\n## Concepts\n### Logical level (Python classes)\n* **Entity**: business entity (User, Review, Order, Booking)\n* **Link**: the relationship between Entities (UserOrder, UserBooking)\n* **Flow**: helps to load data from the stage layer to Entities and Links\n\n### Physical level (Tables)\n* **Hub**: hub table contains Entity's Business Keys and Surrogate Key(MD5 hash of concatenated business keys)\n* **Attribute**: attribute table contains FK to Entity's surrogate key, history of attribute changes, and the `valid_from` column\n* **Group**: group table contains FK to Entity's surrogate key, history of changes to group attributes, and the `valid_from` column\n* **Link**: link table contains FKs to Entities surrogate keys. Historicity by `SCD2`\n\n### Change types of Attributes and Groups\n* `IGNORE`: insert the latest new data, ignore updates\n* `UPDATE`: insert the latest new data, update\n* `NEW`: full history using `SCD2`. Adds the `valid_to` column\n\n\n## Supported Databases\n- [x] Postgres\n\n",
"bugtrack_url": null,
"license": "",
"summary": "hNhM \u2013 highly Normalized hybrid Model.",
"version": "0.0.11",
"project_urls": {
"Documentation": "https://marchinho11.github.io/hnhm",
"Homepage": "https://marchinho11.github.io/hnhm",
"Repository": "https://github.com/marchinho11/hnhm"
},
"split_keywords": [
"hnhm",
"datavault",
"anchormodeling",
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "01cebacc2c989b363d738d8c0bfb33b4a144cb1584345fddf3f7e06141152589",
"md5": "f97a4778d588954da1c05fd08ca70416",
"sha256": "2719d1c516eec517662d054203f61fc15e8319588794be68041e977e7e3fecf4"
},
"downloads": -1,
"filename": "hnhm-0.0.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f97a4778d588954da1c05fd08ca70416",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10,<3.11",
"size": 29401,
"upload_time": "2024-03-01T18:31:11",
"upload_time_iso_8601": "2024-03-01T18:31:11.116221Z",
"url": "https://files.pythonhosted.org/packages/01/ce/bacc2c989b363d738d8c0bfb33b4a144cb1584345fddf3f7e06141152589/hnhm-0.0.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4964901926e527cd60dd0b51e47e360d675aab3e89a5a31bc69544112f28d80b",
"md5": "92c4a460ba4ab4442165273bf104e711",
"sha256": "be87db8d63646ed7d896d2d99aa288559cccc35642d70665a8589db150e3e854"
},
"downloads": -1,
"filename": "hnhm-0.0.11.tar.gz",
"has_sig": false,
"md5_digest": "92c4a460ba4ab4442165273bf104e711",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10,<3.11",
"size": 20947,
"upload_time": "2024-03-01T18:31:12",
"upload_time_iso_8601": "2024-03-01T18:31:12.911332Z",
"url": "https://files.pythonhosted.org/packages/49/64/901926e527cd60dd0b51e47e360d675aab3e89a5a31bc69544112f28d80b/hnhm-0.0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-01 18:31:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "marchinho11",
"github_project": "hnhm",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "hnhm"
}