[![PyPI - Version](https://img.shields.io/pypi/v/dbt-af)](https://pypi.org/project/dbt-af/)
[![GitHub Build](https://github.com/Toloka/dbt-af/workflows/Tests/badge.svg)](https://github.com/Toloka/dbt-af/actions)

[![License](https://img.shields.io/:license-Apache%202-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0.txt)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dbt-af.svg)](https://pypi.org/project/dbt-af/)
[![PyPI - Downloads](https://img.shields.io/pepy/dt/dbt-af)](https://pypi.org/project/dbt-af/)

[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

# dbt-af: distributed run of dbt models using Airflow

## Overview

**_dbt-af_** is a tool for running dbt models in a distributed manner with Airflow.
It acts as a wrapper around Airflow DAGs,
letting you run models independently while preserving their dependencies.

![dbt-af](docs/static/airflow_dag_layout.png)

### Why?

1. **_dbt-af_** is [domain-driven](https://www.datamesh-architecture.com/#what-is-data-mesh).
   It is designed to separate models from different domains into different DAGs.
   This allows you to run models from different domains in parallel.
2. **_dbt-af_** is a **dbt-first** solution.
   It is designed to make analysts' lives easier.
   End users don't even need to know that Airflow schedules their models:
   the dbt model's config is the entry point for all settings and customizations.
3. **_dbt-af_** brings scheduling to dbt, from `@monthly` to `@hourly` and even [more](examples/manual_scheduling.md)
   (see the config sketch after this list).
4. **_dbt-af_** is an ETL-driven tool.
   You can separate your models into tiers or ETL stages
   and build graphs showing the dependencies between models within each tier or stage.
5. **_dbt-af_** adds features for using different dbt targets simultaneously, running different test scenarios, and
   scheduling maintenance tasks.
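
As an illustration of item 3, here is a minimal sketch of per-model scheduling in a dbt model properties file. The model names are made up, and the exact `schedule` config key should be checked against [manual_scheduling.md](examples/manual_scheduling.md):

```yaml
# Hypothetical sketch: model names and the exact `schedule` key are
# assumptions; see examples/manual_scheduling.md for the real options.
version: 2

models:
  - name: hourly_events
    config:
      schedule: "@hourly"
  - name: monthly_report
    config:
      schedule: "@monthly"
```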

## Installation

To install `dbt-af`, run `pip install dbt-af`.

To contribute, we recommend using `poetry` to manage dependencies: run `poetry install --with=dev` to install
all of them.

## _dbt-af_ by Example

All tutorials and examples are located in the [examples](examples/README.md) folder.

To get basic Airflow DAGs for your dbt project, put the following code into your `dags` folder:

```python
# LABELS: dag, airflow (it's required for airflow dag-processor)
from dbt_af.dags import compile_dbt_af_dags
from dbt_af.conf import Config, DbtDefaultTargetsConfig, DbtProjectConfig

# specify here all settings for your dbt project
config = Config(
    dbt_project=DbtProjectConfig(
        dbt_project_name='my_dbt_project',
        dbt_project_path='/path/to/my_dbt_project',
        dbt_models_path='/path/to/my_dbt_project/models',
        dbt_profiles_path='/path/to/my_dbt_project',
        dbt_target_path='/path/to/my_dbt_project/target',
        dbt_log_path='/path/to/my_dbt_project/logs',
        dbt_schema='my_dbt_schema',
    ),
    dbt_default_targets=DbtDefaultTargetsConfig(default_target='dev'),
    is_dev=False,  # set to True if you want to turn on dry-run mode
)

dags = compile_dbt_af_dags(manifest_path='/path/to/my_dbt_project/target/manifest.json', config=config)
for dag_name, dag in dags.items():
    globals()[dag_name] = dag
```
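
Note that Airflow's DAG processor only picks up `DAG` objects bound at a module's top level, which is why the final loop writes each compiled DAG into `globals()`.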

In _dbt_project.yml_ you need to set up default targets for all nodes in your project
(see [example](examples/dags/dbt_project.yml)):

```yaml
sql_cluster: "dev"
daily_sql_cluster: "dev"
py_cluster: "dev"
bf_cluster: "dev"
```

This will create Airflow DAGs for your dbt project.

Check out the [documentation](docs/docs.md) for more details.

## Features

1. **_dbt-af_** is designed first and foremost for large projects (1000+ models).
   With a significant number of dbt objects across different domains,
   it becomes crucial to have all DAGs auto-generated.
   **_dbt-af_** takes care of this by generating all the necessary DAGs for your dbt project and structuring them by
   domain.
2. Each dbt run is executed as a separate Airflow task. All tasks receive a date interval from the Airflow DAG
   context. By using the passed date interval in your dbt models, you ensure the *idempotency* of your dbt runs
   (see the sketch after this list).
3. _**dbt-af**_ lowers the barrier to entry for non-infrastructure team members.
   Analytics professionals, data scientists,
   and data engineers can focus on their dbt models and business logic
   rather than on Airflow DAGs.
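
To illustrate the idempotency point in item 2: conceptually, each task forwards its Airflow data interval to dbt as variables, so a model only processes rows inside that interval. The sketch below is not dbt-af's actual internals, and the var names `start_dttm`/`end_dttm` are assumptions:

```python
# Conceptual sketch only: this is not dbt-af's internal implementation,
# and the dbt var names `start_dttm`/`end_dttm` are hypothetical.
import json
import subprocess
from datetime import datetime


def run_dbt_model(model: str, interval_start: datetime, interval_end: datetime) -> None:
    """Run one dbt model for a single Airflow data interval."""
    dbt_vars = {
        "start_dttm": interval_start.isoformat(),
        "end_dttm": interval_end.isoformat(),
    }
    # Standard dbt CLI flags: select a single model and pass vars as JSON.
    subprocess.run(
        ["dbt", "run", "--select", model, "--vars", json.dumps(dbt_vars)],
        check=True,
    )
```

Filtering inside the model on `{{ var('start_dttm') }}` and `{{ var('end_dttm') }}` then makes a rerun for the same interval produce the same result.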

## Project Information

- [Docs](docs/docs.md)
- [PyPI](https://pypi.org/project/dbt-af/)
- [Contributing](CONTRIBUTING.md)
