# `erdgen`
> DBT YML ERD Generator
[![pypi](https://img.shields.io/pypi/v/erdgen?style=for-the-badge)](https://pypi.org/project/erdgen/)
## Overview
This Python program generates Database Markup Language (DBML) Entity Relationship Diagrams (ERDs) from the `relationships` node in your dbt YML files. The script parses the YML files, extracts relationships and columns, and outputs a DBML schema.
The program is pretty opinionated. It requires each YML file to contain only one model. Further, the `relationships` node used here is a made-up construct, not a node that dbt itself defines.
This program is useful for automated ERD generation if your dbt project doesn't have referential integrity or explicit SQL relationships. If your SQL models do define SQL relationships, there are better tools for automated ERD generation.
## Usage
```bash
python -m erdgen --directory <directory> --include_non_join_keys <True/False>
```
### Args
- `--directory`: Directory to search for YAML files. The default value is the current directory ('.').
- `--include_non_join_keys`: Boolean flag to indicate whether to include non-join keys in the DBML. The default value is `False`.
The DBML will be printed to the console. You can redirect this output to a file if desired.
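If you would rather drive the tool from a script than from the shell, the invocation above can be wrapped roughly as follows. This is a minimal sketch: `build_erdgen_command` and `write_dbml` are hypothetical helper names (not part of `erdgen`), and it assumes `erdgen` is installed in the same environment as the interpreter running the script.

```python
import subprocess
import sys


def build_erdgen_command(directory=".", include_non_join_keys=False):
    """Build the argument list for the documented `python -m erdgen` call."""
    return [
        sys.executable, "-m", "erdgen",
        "--directory", str(directory),
        # The flag takes the literal strings True/False, as shown in Usage.
        "--include_non_join_keys", str(include_non_join_keys),
    ]


def write_dbml(directory, output_path, include_non_join_keys=False):
    """Run erdgen and send everything it prints to stdout into output_path."""
    command = build_erdgen_command(directory, include_non_join_keys)
    with open(output_path, "w") as output_file:
        subprocess.run(command, stdout=output_file, check=True)
    return output_path
```

For example, `write_dbml("models", "schema.dbml")` would save the generated schema to `schema.dbml`; both paths are placeholders.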
## YML File Structure
The YML files are expected to have the following structure:
```yml
version: 2

models:
  - name: Computer
    description: beep boop beep
    columns:
      - name: computerId
        description: The unique identifier of computer
      # other non-join key columns as necessary
    relationships:
      - name: files
        description: The files are in the computer!?
        type: one_to_many
        table: computer_files
        join:
          - local: computerId
            remote: computerId
```
**note**: Each YML file should contain only one model under the `models` node.
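To make the expected shape concrete, here is the example file written out as the nested dictionaries and lists a YAML parser (for instance PyYAML's `yaml.safe_load`) would return, plus two small helpers that walk it. This is only a sketch: it assumes `relationships` sits at the model level next to `columns`, and `single_model` and `join_pairs` are hypothetical names, not functions from `erdgen`.

```python
# The example file as a parsed structure (written as a literal so the
# sketch needs no third-party YAML library).
parsed = {
    "version": 2,
    "models": [
        {
            "name": "Computer",
            "description": "beep boop beep",
            "columns": [
                {"name": "computerId",
                 "description": "The unique identifier of computer"},
            ],
            "relationships": [
                {
                    "name": "files",
                    "description": "The files are in the computer!?",
                    "type": "one_to_many",
                    "table": "computer_files",
                    "join": [{"local": "computerId", "remote": "computerId"}],
                },
            ],
        },
    ],
}


def single_model(document):
    """Return the only model in a parsed file, enforcing the one-model rule."""
    models = document.get("models") or []
    if len(models) != 1:
        raise ValueError(f"expected exactly one model, found {len(models)}")
    return models[0]


def join_pairs(model):
    """Yield (local_table, local_column, remote_table, remote_column) tuples."""
    for relationship in model.get("relationships") or []:
        for join in relationship["join"]:
            yield (model["name"], join["local"],
                   relationship["table"], join["remote"])


print(list(join_pairs(single_model(parsed))))
```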
### Relationships
The `relationships` node in the YML files represents the relationship between the current model and other models. It is composed of several sub-nodes:
- `name`: The name of the relationship.
- `description`: A brief description of the relationship.
- `type`: The type of the relationship. It can be `one_to_one`, `one_to_many`, `many_to_one`, or `many_to_many`.
- `table`: The name of the other model involved in the relationship.
- `join`: A list of the columns that are used for the join between the current model and the other model. Each item in the list is composed of `local` and `remote` nodes, representing the column in the current model and the column in the other model, respectively.
## Output
The output is a DBML schema that includes the tables, columns, and references based on the relationships defined in the YML files. The output is printed to the console.
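As a rough illustration of what such a schema looks like, the sketch below renders one table and one reference in DBML. The exact layout `erdgen` prints is an assumption here; the helper names `render_table` and `render_ref` are hypothetical. It follows the current behaviour described under Improvements: every column is typed `int` and every reference uses DBML's one-to-one operator `-`.

```python
def render_table(name, columns):
    """Render a DBML table block with every column typed as int."""
    lines = [f"Table {name} {{"]
    lines += [f"  {column} int" for column in columns]
    lines.append("}")
    return "\n".join(lines)


def render_ref(local_table, local_column, remote_table, remote_column):
    """Render a one-to-one DBML reference between two columns."""
    return f"Ref: {local_table}.{local_column} - {remote_table}.{remote_column}"


print(render_table("Computer", ["computerId"]))
print(render_ref("Computer", "computerId", "computer_files", "computerId"))
# Table Computer {
#   computerId int
# }
# Ref: Computer.computerId - computer_files.computerId
```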
## Notes
- If a YML model file has no `relationships` and `include_non_join_keys` is `False`, all columns from the YML are still included in the DBML table. This is helpful because other models may have a relationship to this model, and the program does not work out which column is being referenced (it could, but I didn't bother figuring this out).
- Regardless of whether `include_non_join_keys` is `True` or `False`, columns whose names contain `Id` or `id` are always included. These are likely join keys that do not have a relationship yet.
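The column-selection rules in these notes can be sketched as a single function. This is an interpretation of the notes, not the tool's actual code: `select_columns` is a hypothetical name, and it assumes the `Id`/`id` check is a plain substring match on the column name.

```python
def select_columns(model, include_non_join_keys=False):
    """Return the column names that would end up in the DBML table."""
    names = [column["name"] for column in model.get("columns") or []]
    relationships = model.get("relationships") or []

    # Everything is kept when non-join keys are requested, or when the model
    # has no relationships (another model may still reference it).
    if include_non_join_keys or not relationships:
        return names

    # Columns named as the local side of any join.
    join_keys = {
        join["local"]
        for relationship in relationships
        for join in relationship["join"]
    }
    # Join keys, plus anything that looks like an identifier.
    return [
        name for name in names
        if name in join_keys or "Id" in name or "id" in name
    ]


model = {
    "name": "Computer",
    "columns": [{"name": "computerId"}, {"name": "ownerId"}, {"name": "hostname"}],
    "relationships": [
        {"table": "computer_files",
         "join": [{"local": "computerId", "remote": "computerId"}]},
    ],
}
print(select_columns(model))        # ['computerId', 'ownerId']
print(select_columns(model, True))  # ['computerId', 'ownerId', 'hostname']
```

If the match really is a substring check, names such as `width` or `valid` would also be kept, since both contain `id`.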
## Improvements
- All data types are `int`; account for the actual data type via metadata in the YML file
- All relationships are 1:1; account for the cardinality via the relationship `type`
- What about `.yaml` files lol
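Two of these items are small enough to sketch. The first maps the four documented `type` values onto DBML's relationship operators (`-`, `<`, `>`, `<>`); the second collects both `.yml` and `.yaml` files. Neither is implemented in `erdgen` as far as this README says: `REF_OPERATORS`, `render_typed_ref`, and `find_yaml_files` are hypothetical names, and searching the directory recursively is an assumption.

```python
from pathlib import Path

# DBML relationship operators keyed by the relationship `type` values.
REF_OPERATORS = {
    "one_to_one": "-",
    "one_to_many": "<",
    "many_to_one": ">",
    "many_to_many": "<>",
}


def render_typed_ref(local_table, local_column, remote_table, remote_column,
                     relationship_type):
    """Render a DBML reference whose operator reflects the cardinality."""
    operator = REF_OPERATORS[relationship_type]
    return (f"Ref: {local_table}.{local_column} {operator} "
            f"{remote_table}.{remote_column}")


def find_yaml_files(directory="."):
    """Return every .yml and .yaml file under directory, sorted by path."""
    return sorted(
        path for path in Path(directory).rglob("*")
        if path.is_file() and path.suffix in {".yml", ".yaml"}
    )


print(render_typed_ref("Computer", "computerId",
                       "computer_files", "computerId", "one_to_many"))
# Ref: Computer.computerId < computer_files.computerId
```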
## DEV
### Create venv
```bash
python -m venv env
```
### Activate venv
- unix
```bash
source env/bin/activate
```
- windows
```bash
env\Scripts\activate.bat
```
### Install Packages
```bash
pip install -r requirements.txt
```
### Test
```bash
make test
```
### Format
```bash
make format
```
```bash
make lint
```
### Version & Release
```bash
bump2version <major/minor/patch>
```
```bash
make release
```
**note**: Don't forget to `git push` with `--tags`.
### pre-commit
#### Setup
```bash
pre-commit install
```
#### Run all
```bash
make pre-commit
```
Raw data
{
"_id": null,
"home_page": "https://github.com/neo-andrew-moss/erdgen",
"name": "erdgen",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "Andrew Moss",
"author_email": "andrew.moss@neofinancial.com",
"download_url": "https://files.pythonhosted.org/packages/04/37/5e0ca69c255cec20ed8be0d8a08e55176e4ebc22ac182c5b8e539effa2da/erdgen-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT license",
"summary": "Generate a DBML ERD from DBT YML relationships",
"version": "0.1.2",
"project_urls": {
"Homepage": "https://github.com/neo-andrew-moss/erdgen"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "04375e0ca69c255cec20ed8be0d8a08e55176e4ebc22ac182c5b8e539effa2da",
"md5": "1218847e93c7e8c4646e0067222943b6",
"sha256": "b038ce530fcb89d5a050768523de5c0694a28790129c3933ee7a9cf0771e406c"
},
"downloads": -1,
"filename": "erdgen-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "1218847e93c7e8c4646e0067222943b6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 6072,
"upload_time": "2023-06-14T17:53:15",
"upload_time_iso_8601": "2023-06-14T17:53:15.056331Z",
"url": "https://files.pythonhosted.org/packages/04/37/5e0ca69c255cec20ed8be0d8a08e55176e4ebc22ac182c5b8e539effa2da/erdgen-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-14 17:53:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "neo-andrew-moss",
"github_project": "erdgen",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "erdgen"
}