# Welcome to dbt-fal 👋 do more with dbt
dbt-fal adapter is the ✨easiest✨ way to run your [dbt Python models](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/python-models).
Starting with dbt v1.3, you can build your dbt models in Python. This unlocks use cases that were once difficult to build with SQL alone, for example:
- Using Python stats libraries to calculate stats
- Building forecasts
- Building other predictive models such as classification and clustering
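As a tiny illustration of the first bullet, pandas makes this kind of statistic a one-liner. This sketch is self-contained with made-up numbers; in a real dbt Python model the frame would come from `dbt.ref`:

```python
import pandas as pd

# Stand-in for dbt.ref("orders"): in a real model this frame would
# come from the warehouse. The revenue values here are made up.
df = pd.DataFrame({"revenue": [100.0, 120.0, 95.0, 130.0, 110.0]})

# Z-score each row: a one-liner in pandas, awkward in portable SQL
df["revenue_z"] = (df["revenue"] - df["revenue"].mean()) / df["revenue"].std()
```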
This is fantastic! But there is still one issue: the developer experience with Snowflake and BigQuery is not great, and there is no Python support for Redshift and Postgres.
dbt-fal provides the best environment to run your Python models that works with all other data warehouses! With dbt-fal, you can:
- Build and test your models locally
- Isolate each model to run in its own environment with its own dependencies
- [Coming Soon] Run your Python models in the ☁️ cloud ☁️ with elastically scaling Python environments.
- [Coming Soon] Even add GPUs to your models for some heavier workloads such as training ML models.
## Getting Started
### 1. Install dbt-fal
`pip install "dbt-fal[bigquery,snowflake]"` *Add the extras for your current warehouse*
### 2. Update your `profiles.yml` and add the fal adapter
```yaml
jaffle_shop:
  target: dev_with_fal
  outputs:
    dev_with_fal:
      type: fal
      db_profile: dev_bigquery # This points to your main adapter
    dev_bigquery:
      type: bigquery
      ...
```
Don't forget to point to your main adapter with the `db_profile` attribute. This is how the fal adapter knows how to connect to your data warehouse.
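The layout is the same for any warehouse; only the profile that `db_profile` points to changes. An illustrative Postgres variant (the host, credentials, and names below are placeholders):

```yaml
my_project:
  target: dev_with_fal
  outputs:
    dev_with_fal:
      type: fal
      db_profile: dev_pg # fal reads and writes through this profile
    dev_pg:
      type: postgres
      host: localhost
      port: 5432
      user: dbt
      password: "{{ env_var('PG_PASSWORD') }}"
      dbname: analytics
      schema: public
      threads: 4
```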
### 3. `dbt run`!
That is it! It is really that simple 😊
### 4. [🚨 Cool Feature Alert 🚨] Environment management with dbt-fal
If you want help with environment management (instead of sticking to the default environment the dbt process runs in), you can create a `fal_project.yml` in the same folder as your dbt project and define "named environments":
In your dbt project folder:
```bash
$ touch fal_project.yml
```

Paste the config below:

```yaml
environments:
  - name: ml
    type: venv
    requirements:
      - prophet
```
and then in your dbt model, `models/orders_forecast.py`:
```python
import pandas as pd

def model(dbt, fal):
    dbt.config(fal_environment="ml")  # Add this line

    df: pd.DataFrame = dbt.ref("orders_daily")
```
`dbt.config(fal_environment="ml")` gives you an isolated, clean environment to run in, so you don't pollute your package space.
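A `fal_project.yml` can define more than one named environment; the second entry below is illustrative (the `clustering` name and its requirements are made up):

```yaml
environments:
  - name: ml
    type: venv
    requirements:
      - prophet
  - name: clustering
    type: venv
    requirements:
      - scikit-learn
```

Each model picks its environment with `dbt.config(fal_environment=...)`, so models with conflicting dependencies can coexist in one project.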
### 5. [Coming Soon™️] Move your compute to the Cloud!
Let us know if you are interested in this. We are looking for beta users.
# `dbt-fal` command line tool
With the `dbt-fal` CLI, you can:
- [Send Slack notifications](https://github.com/fal-ai/fal/blob/-/examples/slack-example) upon dbt model success or failure.
- [Load data from external data sources](https://blog.fal.ai/populate-dbt-models-with-csv-data/) before a model starts running.
- [Download dbt models](https://docs.fal.ai/fal/python-package) into a Python context with a familiar syntax: `ref('my_dbt_model')` using `FalDbt`
- [Programmatically access rich metadata](https://docs.fal.ai/fal/reference/variables-and-functions) about your dbt project.
Head to our [documentation site](https://docs.fal.ai/) for a deeper dive or play with [in-depth examples](https://github.com/fal-ai/fal/blob/-/examples/README.md) to see how fal can help you get more done with dbt.
## 1. Install `dbt-fal`
```bash
$ pip install dbt-fal[postgres]
```
## 2. Go to your dbt project directory
```bash
$ cd ~/src/my_dbt_project
```
## 3. Create a Python script: `send_slack_message.py`
```python
import os
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

CHANNEL_ID = os.getenv("SLACK_BOT_CHANNEL")
SLACK_TOKEN = os.getenv("SLACK_BOT_TOKEN")

client = WebClient(token=SLACK_TOKEN)

# `context` is injected by fal when the script runs
message_text = f"Model: {context.current_model.name}. Status: {context.current_model.status}."

try:
    response = client.chat_postMessage(
        channel=CHANNEL_ID,
        text=message_text,
    )
except SlackApiError as e:
    assert e.response["error"]
```
## 4. Add a `meta` section in your `schema.yml`
```yaml
models:
  - name: historical_ozone_levels
    description: Ozone levels
    config:
      materialized: table
    columns:
      - name: ozone_level
        description: Ozone level
      - name: ds
        description: Date
    meta:
      fal:
        scripts:
          - send_slack_message.py
```
## 5. Run `dbt-fal flow run`
```bash
$ dbt-fal flow run
# both your dbt models and python scripts are run
```
## 6. Alternatively run `dbt` and `fal` consecutively
```bash
$ dbt run
# Your dbt models are run
$ dbt-fal run
# Your python scripts are run
```
## Running scripts before dbt runs
Run scripts before the model runs by using the `pre-hook:` configuration option.
Given the following schema.yml:
```yaml
models:
  - name: boston
    description: Ozone levels
    config:
      materialized: table
    meta:
      owner: "@meder"
      fal:
        pre-hook:
          - fal_scripts/trigger_fivetran.py
        post-hook:
          - fal_scripts/slack.py
```
`dbt-fal flow run` will run `fal_scripts/trigger_fivetran.py`, then the `boston` dbt model, and finally `fal_scripts/slack.py`.
If a model is selected with a selection flag (e.g. `--select boston`), the hooks associated with the model always run with it.
```bash
$ dbt-fal flow run --select boston
```
# Concepts
## profiles.yml and Credentials
`fal` integrates with `dbt`'s `profiles.yml` file to access and read data from the data warehouse. Once you set up credentials in `profiles.yml` for your existing `dbt` workflows, `fal` authenticates with those same credentials any time you use `ref` or `source` to create a dataframe.
## `meta` Syntax
```yaml
models:
  - name: historical_ozone_levels
    ...
    meta:
      owner: "@me"
      fal:
        post-hook:
          - send_slack_message.py
          - another_python_script.py
```
Use the `fal` and `post-hook` keys underneath the `meta` config to let `fal` CLI know where to look for the Python scripts. You can pass a list of scripts as shown above to run one or more scripts as a post-hook operation after a `dbt run`.
## Variables and functions
Inside a Python script, you get access to some useful variables and functions.
### Variables
A `context` object with information relevant to the model through which the script was run. For the [`meta` Syntax](#meta-syntax) example, we would get the following:
```python
context.current_model.name
#= historical_ozone_levels
context.current_model.meta
#= {'owner': '@me'}
context.current_model.meta.get("owner")
#= '@me'
context.current_model.status
# Could be one of
#= 'success'
#= 'error'
#= 'skipped'
```
The `context` object also has access to test information related to the current model. If the previous dbt command was either `test` or `build`, the `context.current_model.tests` property is populated with a list of tests:
```python
context.current_model.tests
#= [CurrentTest(name='not_null', modelname='historical_ozone_levels', column='ds', status='Pass')]
```
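A post-hook can use this to react to test results. The sketch below mimics the `CurrentTest` shape with a stand-in dataclass so it runs on its own; in a real fal script you would read `context.current_model.tests` directly:

```python
from dataclasses import dataclass

# Stand-in for fal's CurrentTest records; in a fal script these
# come from context.current_model.tests.
@dataclass
class CurrentTest:
    name: str
    modelname: str
    column: str
    status: str

tests = [
    CurrentTest("not_null", "historical_ozone_levels", "ds", "Pass"),
    CurrentTest("unique", "historical_ozone_levels", "ds", "Fail"),
]

# Collect any tests that did not pass and report them
failed = [t.name for t in tests if t.status != "Pass"]
if failed:
    print(f"{len(failed)} test(s) failed: {failed}")
```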
### `ref` and `source` functions
Some familiar functions from `dbt` are also available:
```python
# Refer to a dbt model or source by name; returns a `pandas.DataFrame`
ref('model_name')
source('source_name', 'table_name')
# You can use it to get the running model data
ref(context.current_model.name)
```
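For example, a script might summarize the frame returned by `ref`. The model name and columns below are hypothetical, and a stand-in frame replaces the `ref` call so the sketch is self-contained:

```python
import pandas as pd

# Stand-in for ref('orders_daily'); in a fal script you would write:
# df = ref('orders_daily')
df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=4, freq="D"),
    "order_count": [10, 12, 9, 15],
})

# Simple summary stats over the model's output
total_orders = int(df["order_count"].sum())
daily_avg = float(df["order_count"].mean())
print(f"orders={total_orders} avg/day={daily_avg}")
```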
### `write_to_model` function
> ❗️ We recommend using the [`dbt-fal`](https://pypi.org/project/dbt-fal/) adapter for writing data back to your data-warehouse.
It is also possible to send data back to your data warehouse. This makes it easy to get the data, process it, and upload it back into dbt territory.
This function is available only in fal Python models, that is, Python scripts inside a `fal_models` directory. Register that directory by adding a `fal-models-paths` entry to your `dbt_project.yml`:
```yaml
name: "jaffle_shop"
# ...
model-paths: ["models"]
# ...
vars:
  # Add this to your dbt_project.yml
  fal-models-paths: ["fal_models"]
```
Once added, models in that directory are automatically run by fal without any extra configuration in `schema.yml`.
```python
source_df = source('source_name', 'table_name')
ref_df = ref('a_model')
# Your code here
df = ...
# Upload a `pandas.DataFrame` back to the data warehouse
write_to_model(df)
```
`write_to_model` also accepts an optional `dtype` argument, which lets you specify datatypes of columns. It works the same way as `dtype` argument for [`DataFrame.to_sql` function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html).
```python
from sqlalchemy.types import Integer
# Upload but specifically create the `value` column with type `integer`
# Can be useful if data has `None` values
write_to_model(df, dtype={'value': Integer()})
```
## Importing `fal` as a Python package
You may be interested in accessing dbt models and sources easily from a Jupyter Notebook or another Python script.
For that, just import the `fal` package and instantiate a `FalDbt` project:
```py
from fal.dbt import FalDbt
faldbt = FalDbt(profiles_dir="~/.dbt", project_dir="../my_project")
faldbt.list_sources()
# [
# DbtSource(name='results' ...),
# DbtSource(name='ticket_data_sentiment_analysis' ...)
# ...
# ]
faldbt.list_models()
# [
# DbtModel(name='zendesk_ticket_data' ...),
# DbtModel(name='agent_wait_time' ...)
# ...
# ]
sentiments = faldbt.source('results', 'ticket_data_sentiment_analysis')
# pandas.DataFrame
tickets = faldbt.ref('stg_zendesk_ticket_data')
# pandas.DataFrame
```
# Why are we building this?
We think `dbt` is great because it empowers data people to get more done with the tools that they are already familiar with.
This library will form the basis of our attempt to more comprehensively enable **data science workloads** downstream of `dbt`. And because having reliable data pipelines is the most important ingredient in building predictive analytics, we are building a library that integrates well with dbt.
# Have feedback or need help?
- Join us in [fal on Discord](https://discord.com/invite/Fyc9PwrccF)
- Join the [dbt Community](http://community.getdbt.com/) and go into our [#tools-fal channel](https://getdbt.slack.com/archives/C02V8QW3Q4Q)