databricks-dbt-factory


- Name: databricks-dbt-factory
- Version: 0.1.1
- Summary: Databricks dbt factory library for creating Databricks Job definition where individual models are run as separate tasks.
- Upload time: 2025-07-08 21:49:31
- Requires Python: >=3.10
- Keywords: databricks, dbt

Databricks dbt factory
===

Databricks dbt Factory is a lightweight library that generates a Databricks Workflow task for each dbt model, based on your dbt manifest.
It creates a DAG of tasks that run each dbt model, test, seed, and snapshot as a separate task in Databricks Workflows.

The tool can create or update tasks directly within an existing job specification, such as one defined in a Databricks Asset Bundle (DAB).

[![PyPI - Version](https://img.shields.io/pypi/v/databricks-dbt-factory.svg)](https://pypi.org/project/databricks-dbt-factory)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/databricks-dbt-factory.svg)](https://pypi.org/project/databricks-dbt-factory)

-----

**Table of Contents**

- [Motivation](#motivation)
- [How it works](#how-it-works)
- [Installation](#installation)
- [Usage](#usage)
- [Contribution](#contribution)
- [License](#license)

# Motivation

By default, dbt's integration with Databricks Workflows treats an entire dbt project as a single execution unit — a black box.

Databricks dbt Factory changes that by updating Databricks Workflow specs to run dbt objects (models, tests, seeds, snapshots) as individual tasks.

![before](docs/dbt-factory.png?)

### Benefits

✅ Simplified troubleshooting — Quickly pinpoint and fix issues at the model level.

✅ Enhanced logging & notifications — Gain detailed logs and precise error alerts for faster debugging.

✅ Improved retriability — Retry only the failed model tasks without rerunning the full project.

✅ Seamless testing — Automatically run dbt data tests on tables right after each model finishes, enabling faster validation and feedback.

# How it works

![after](docs/arch.png?)

The tool reads the dbt manifest file and the existing DAB workflow definition, and generates a new definition.
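
As a rough illustration of that mapping, the sketch below turns the `nodes` section of a dbt manifest into a list of Databricks job tasks with `task_key`, `depends_on`, and `dbt_task` fields. It is not the library's actual implementation: the `build_tasks` function, the task key naming scheme, and the sample manifest are assumptions made for this example only.

```python
from typing import Any


def build_tasks(manifest: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn dbt manifest nodes into Databricks job tasks (illustrative only)."""
    # dbt sub-command used for each resource type handled in this sketch.
    commands = {"model": "run", "seed": "seed", "snapshot": "snapshot", "test": "test"}
    nodes = {
        uid: node
        for uid, node in manifest["nodes"].items()
        if node["resource_type"] in commands
    }

    def task_key(unique_id: str) -> str:
        # Hypothetical naming scheme: "model.shop.orders" becomes "model_shop_orders".
        return unique_id.replace(".", "_")

    tasks = []
    for uid, node in nodes.items():
        # Keep only upstream nodes that also become tasks (sources, for example, do not).
        upstream = [dep for dep in node.get("depends_on", {}).get("nodes", []) if dep in nodes]
        command = f"dbt {commands[node['resource_type']]} --select {node['name']}"
        tasks.append(
            {
                "task_key": task_key(uid),
                "depends_on": [{"task_key": task_key(dep)} for dep in upstream],
                "dbt_task": {"commands": [command]},
            }
        )
    return tasks


# Made-up manifest fragment: one seed and one model that depends on it and on a source.
manifest = {
    "nodes": {
        "seed.shop.raw_orders": {
            "resource_type": "seed",
            "name": "raw_orders",
            "depends_on": {"nodes": []},
        },
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "depends_on": {"nodes": ["seed.shop.raw_orders", "source.shop.crm.customers"]},
        },
    }
}
tasks = build_tasks(manifest)
```

Here the `orders` task depends only on the `raw_orders` seed task, because the source it references does not become a task of its own.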

# Installation

```shell
pip install databricks-dbt-factory
```

# Usage

Update tasks in the existing Databricks workflow (job) definition and write the new spec to `job_definition_new.yaml`:
```shell
databricks_dbt_factory  \
  --dbt-manifest-path tests/test_data/manifest.json \
  --input-job-spec-path tests/test_data/job_definition_template.yaml \
  --target-job-spec-path job_definition_new.yaml \
  --source GIT \
  --target dev
```

Note that `--input-job-spec-path` and `--target-job-spec-path` can be the same file, in which case the job spec is updated in place.

**Arguments:**
- `--new-job-name` (type: str, optional, default: None): Optional job name. If provided, the existing job name in the job spec is updated.
- `--dbt-manifest-path` (type: str, required): Path to the dbt manifest file.
- `--input-job-spec-path` (type: str, required): Path to the input job spec file.
- `--target-job-spec-path` (type: str, required): Path to the target job spec file.
- `--target` (type: str, optional, default: None): Optional dbt target to use. If not provided, the default target from the dbt profile will be used.
- `--source` (type: str, optional, default: None): Optional dbt project source (`GIT` or `WORKSPACE`). If not provided, `WORKSPACE` will be used.
- `--warehouse_id` (type: str, optional, default: None): Optional SQL Warehouse to run dbt models on.
- `--schema` (type: str, optional, default: None): Optional metastore schema (database) to use in the dbt task.
- `--catalog` (type: str, optional, default: None): Optional metastore catalog to use in the dbt task.
- `--profiles-directory` (type: str, optional, default: None): Optional (relative) path to the job profiles directory to use in the dbt task.
- `--project-directory` (type: str, optional, default: None): Optional (relative) workspace path to the dbt project directory to use in the dbt task.
- `--environment-key` (type: str, optional, default: Default): Optional (relative) key of an environment.
- `--extra-dbt-command-options` (type: str, optional, default: ""): Optional additional dbt command options to include.
- `--run-tests` (type: bool, optional, default: True): Whether to run data tests after the model. Enabled by default.
- `--enable-dbt-deps` (type: bool, optional, default: False): Whether to run dbt deps before each task. Disabled by default.
- `--dbt-tasks-deps` (type: str, optional, default: None): Optional comma-separated list of tasks for which dbt deps should be run (e.g. "diamonds_prices,second_dbt_model"). Only in effect if `--enable-dbt-deps` is enabled.
- `--dry-run` (type: bool, optional, default: False): Print generated tasks without updating the job spec file. Disabled by default.

You can also check all input arguments by running `databricks_dbt_factory --help`.

A demo of the tool is available [here](https://github.com/mwojtyczka/dbt-demo).

# Contribution

See contribution guidance [here](CONTRIBUTING.md).

# License

`databricks-dbt-factory` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "databricks-dbt-factory",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "Databricks, dbt",
    "author": null,
    "author_email": "Marcin Wojtyczka <marcin.wojtyczka@databricks.com>",
    "download_url": "https://files.pythonhosted.org/packages/85/ee/aad937587d628d41e2140e761c814361c96d257f01ad66728a405f66ce2b/databricks_dbt_factory-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Databricks dbt factory library for creating Databricks Job definition where individual models are run as separate tasks.",
    "version": "0.1.1",
    "project_urls": {
        "Documentation": "https://github.com/mwojtyczka/databricks-dbt-factory#readme",
        "Issues": "https://github.com/mwojtyczka/databricks-dbt-factory/issues",
        "Source": "https://github.com/mwojtyczka/databricks-dbt-factory"
    },
    "split_keywords": [
        "databricks",
        " dbt"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a6b6bf52421089bca7bf6dbf264239b34f3db30d0581612a91ba7923167e76dd",
                "md5": "f47b7e3287abf26f21e6d10f09b2115d",
                "sha256": "a0320c04651920520cb54d8157619214a569d253ed40e706de0af27734aa3846"
            },
            "downloads": -1,
            "filename": "databricks_dbt_factory-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f47b7e3287abf26f21e6d10f09b2115d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 11788,
            "upload_time": "2025-07-08T21:49:29",
            "upload_time_iso_8601": "2025-07-08T21:49:29.932298Z",
            "url": "https://files.pythonhosted.org/packages/a6/b6/bf52421089bca7bf6dbf264239b34f3db30d0581612a91ba7923167e76dd/databricks_dbt_factory-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "85eeaad937587d628d41e2140e761c814361c96d257f01ad66728a405f66ce2b",
                "md5": "cd697a8f27d9f00b83774b99e6d87474",
                "sha256": "db6810aa4796481d965f014145878089da9d1248575f890bb50b93d4e7ff146b"
            },
            "downloads": -1,
            "filename": "databricks_dbt_factory-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "cd697a8f27d9f00b83774b99e6d87474",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 18789,
            "upload_time": "2025-07-08T21:49:31",
            "upload_time_iso_8601": "2025-07-08T21:49:31.320264Z",
            "url": "https://files.pythonhosted.org/packages/85/ee/aad937587d628d41e2140e761c814361c96d257f01ad66728a405f66ce2b/databricks_dbt_factory-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-08 21:49:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mwojtyczka",
    "github_project": "databricks-dbt-factory#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "databricks-dbt-factory"
}
        