databricks-dbt-factory

Name: databricks-dbt-factory
Version: 0.0.1
Summary: Databricks dbt factory library for creating Databricks Job definitions where individual models are run as separate tasks.
Upload time: 2024-12-07 16:53:14
Requires Python: >=3.10
License: MIT
Keywords: databricks, dbt
Databricks dbt factory
===

Databricks dbt factory is a simple library that generates Databricks Job tasks from a dbt manifest.
The tool can overwrite tasks in an existing Databricks job definition (in-place update) or create a new definition.

[![PyPI - Version](https://img.shields.io/pypi/v/databricks-dbt-factory.svg)](https://pypi.org/project/databricks-dbt-factory)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/databricks-dbt-factory.svg)](https://pypi.org/project/databricks-dbt-factory)

-----

**Table of Contents**

- [Installation](#installation)
- [Usage](#usage)
- [Contribution](#contribution)
- [License](#license)

# Motivation

The current integration of dbt with Databricks Workflows treats the entire dbt project as a single execution unit (black box), limiting flexibility and debugging options.

This project breaks down each dbt object (seed/snapshot/model/test) into a separate Workflow task, offering several key benefits:
* Simplified Troubleshooting: Isolating tasks makes it easier to identify and resolve issues specific to a single model
* Enhanced Logging and Notifications: Provides more detailed logs and precise error alerts, improving debugging efficiency
* Better Retries: Only the failed model tasks need to be retried, saving time and resources compared to rerunning the entire project
* Seamless Testing: dbt data tests can run on a table immediately after its model completes, giving faster validation and feedback

### Databricks Workflows run all dbt objects at once:
![before](docs/before.png)

![dbt_task](docs/dbt_task.png)

### The tool generates workflows where dbt objects are run as individual Databricks Workflow tasks:
![after](docs/after.png)

![workflow](docs/workflow.png)

# Installation

```shell
pip install databricks-dbt-factory
```
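
After installation, you can confirm the CLI entry point is available (as noted under Usage, `--help` prints the full argument list):

```shell
# Verify the package is installed and the CLI entry point resolves
pip show databricks-dbt-factory
databricks_dbt_factory --help
```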

# Usage

Update tasks in an existing Databricks job definition and write the result to `job_definition_new.yaml`:
```shell
databricks_dbt_factory  \
  --dbt-manifest-path tests/test_data/manifest.json \
  --input-job-spec-path tests/test_data/job_definition_template.yaml \
  --target-job-spec-path job_definition_new.yaml \
  --source GIT \
  --target dev
```
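
To preview what the tool would generate before writing any file, the same invocation can be combined with the `--dry-run` flag documented below. This is a sketch that assumes the boolean flag can be passed without a value:

```shell
# Print the generated tasks without updating the target job spec
databricks_dbt_factory \
  --dbt-manifest-path tests/test_data/manifest.json \
  --input-job-spec-path tests/test_data/job_definition_template.yaml \
  --target-job-spec-path job_definition_new.yaml \
  --source GIT \
  --target dev \
  --dry-run
```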

**Arguments:**
- `--new-job-name` (type: str, optional, default: None): Optional job name. If provided, the existing job name in the job spec is updated.
- `--dbt-manifest-path` (type: str, required): Path to the manifest file.
- `--input-job-spec-path` (type: str, required): Path to the input job spec file.
- `--target-job-spec-path` (type: str, required): Path to the target job spec file.
- `--target` (type: str, required): dbt target to use.
- `--source` (type: str, optional, default: None): Optional project source. If not provided, WORKSPACE will be used.
- `--warehouse_id` (type: str, optional, default: None): Optional SQL Warehouse to run dbt models on.
- `--schema` (type: str, optional, default: None): Optional schema to write to.
- `--catalog` (type: str, optional, default: None): Optional catalog to write to.
- `--profiles-directory` (type: str, optional, default: None): Optional (relative) path to the profiles directory.
- `--project-directory` (type: str, optional, default: None): Optional (relative) path to the project directory.
- `--environment-key` (type: str, optional, default: Default): Optional environment key to use.
- `--extra-dbt-command-options` (type: str, optional, default: ""): Optional additional dbt command options.
- `--run-tests` (type: bool, optional, default: True): Whether to run data tests after the model. Enabled by default.
- `--dry-run` (type: bool, optional, default: False): Print generated tasks without updating the job spec file. Disabled by default.
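
For illustration, here is a sketch of an invocation that also sets some of the optional arguments above; the manifest path, job name, warehouse ID, catalog, and schema are placeholder values, not defaults of the tool:

```shell
# Hypothetical example: adjust paths and identifiers to your environment
databricks_dbt_factory \
  --dbt-manifest-path target/manifest.json \
  --input-job-spec-path job_definition_template.yaml \
  --target-job-spec-path job_definition_new.yaml \
  --target prod \
  --new-job-name "dbt_models_as_tasks" \
  --warehouse_id "<sql-warehouse-id>" \
  --catalog main \
  --schema analytics
```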

You can also check all input arguments by running `databricks_dbt_factory --help`.

A demo of the tool can be found [here](https://github.com/mwojtyczka/dbt-demo).

# Contribution

See contribution guidance [here](CONTRIBUTING.md).

# License

`databricks-dbt-factory` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.

            
