| Field | Value |
|---|---|
| Name | databricks-dbt-factory |
| Version | 0.0.1 |
| Summary | Databricks dbt factory library for creating Databricks Job definition where individual models are run as separate tasks. |
| home_page | None |
| upload_time | 2024-12-07 16:53:14 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | MIT |
| keywords | databricks, dbt |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
Databricks dbt factory
===
Databricks dbt factory is a simple library that generates Databricks Job tasks from a dbt manifest.
The tool can overwrite tasks in an existing Databricks job definition (in-place update) or create a new definition.
[![PyPI - Version](https://img.shields.io/pypi/v/databricks-dbt-factory.svg)](https://pypi.org/project/databricks-dbt-factory)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/databricks-dbt-factory.svg)](https://pypi.org/project/databricks-dbt-factory)
-----
**Table of Contents**
- [Installation](#installation)
- [Usage](#usage)
- [Contribution](#contribution)
- [License](#license)
# Motivation
The current integration of dbt with Databricks Workflows treats the entire dbt project as a single execution unit (black box), limiting flexibility and debugging options.
This project breaks down each dbt object (seed/snapshot/model/test) into a separate Workflow task (see the illustrative sketch after this list), offering several key benefits:
* Simplified Troubleshooting: Isolating tasks makes it easier to identify and resolve issues specific to a single model
* Enhanced Logging and Notifications: Provides more detailed logs and precise error alerts, improving debugging efficiency
* Better Retriability: Enables retrying only the failed model tasks, saving time and resources compared to rerunning the entire project
* Seamless Testing: Allows running dbt data tests on tables immediately after a model completes, ensuring faster validation and feedback
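Conceptually, each generated task wraps a single dbt command scoped to one object. The commands below are only an illustrative sketch (the object names and target are placeholders), not the exact strings the tool emits into the job spec:
```shell
# Illustrative only: the kind of command each generated task conceptually runs.
dbt seed --select raw_customers --target dev            # seed task
dbt run --select stg_customers --target dev             # model task
dbt test --select stg_customers --target dev            # data-test task, runs right after its model
dbt snapshot --select customers_snapshot --target dev   # snapshot task
```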
### Databricks Workflows run all dbt objects at once:
![before](docs/before.png?)
![dbt_task](docs/dbt_task.png?)
### The tool generates workflows where dbt objects are run as individual Databricks Workflow tasks:
![after](docs/after.png?)
![workflow](docs/workflow.png?)
# Installation
```shell
pip install databricks-dbt-factory
```
# Usage
Update tasks in an existing Databricks job definition and write the result to `job_definition_new.yaml`:
```shell
databricks_dbt_factory \
--dbt-manifest-path tests/test_data/manifest.json \
--input-job-spec-path tests/test_data/job_definition_template.yaml \
--target-job-spec-path job_definition_new.yaml \
--source GIT \
--target dev
```
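The manifest passed via `--dbt-manifest-path` is dbt's standard `manifest.json` artifact. If you do not already have one, any dbt command that compiles the project writes it to the `target/` directory, for example:
```shell
# Compile the dbt project; dbt writes target/manifest.json as part of its artifacts.
# Assumes this runs from the dbt project root with a valid profile for the chosen target.
dbt compile --target dev
```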
**Arguments:**
- `--new-job-name` (type: str, optional, default: None): Optional job name. If provided, the existing job name in the job spec is updated.
- `--dbt-manifest-path` (type: str, required): Path to the manifest file.
- `--input-job-spec-path` (type: str, required): Path to the input job spec file.
- `--target-job-spec-path` (type: str, required): Path to the target job spec file.
- `--target` (type: str, required): dbt target to use.
- `--source` (type: str, optional, default: None): Optional project source. If not provided, WORKSPACE will be used.
- `--warehouse_id` (type: str, optional, default: None): Optional SQL Warehouse to run dbt models on.
- `--schema` (type: str, optional, default: None): Optional schema to write to.
- `--catalog` (type: str, optional, default: None): Optional catalog to write to.
- `--profiles-directory` (type: str, optional, default: None): Optional (relative) path to the profiles directory.
- `--project-directory` (type: str, optional, default: None): Optional (relative) path to the project directory.
- `--environment-key` (type: str, optional, default: Default): Optional key of the job environment to use.
- `--extra-dbt-command-options` (type: str, optional, default: ""): Optional additional dbt command options.
- `--run-tests` (type: bool, optional, default: True): Whether to run data tests after the model. Enabled by default.
- `--dry-run` (type: bool, optional, default: False): Print generated tasks without updating the job spec file. Disabled by default.
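As a fuller example, several of the optional flags above can be combined to run on a SQL Warehouse and write to a specific catalog and schema. The warehouse ID, catalog, schema, job name, and file paths below are placeholders:
```shell
databricks_dbt_factory \
  --dbt-manifest-path target/manifest.json \
  --input-job-spec-path job_definition_template.yaml \
  --target-job-spec-path job_definition_new.yaml \
  --target prod \
  --warehouse_id 1234567890abcdef \
  --catalog main \
  --schema analytics \
  --new-job-name "dbt nightly run"
```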
You can also check all input arguments by running `databricks_dbt_factory --help`.
A demo of the tool can be found [here](https://github.com/mwojtyczka/dbt-demo).
# Contribution
See contribution guidance [here](CONTRIBUTING.md).
# License
`databricks-dbt-factory` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.