dbt-starburst

Name	dbt-starburst
Version	1.0.0
home_page	https://github.com/starburstdata/dbt-starburst
Summary	The Starburst adapter plugin for dbt (data build tool)
upload_time	2023-03-17 00:38:24
author	Starburst Data
requires_python	>=3.7
license	Apache License 2.0

<p float="left">
  <img src="https://raw.githubusercontent.com/dbt-labs/dbt/ec7dee39f793aa4f7dd3dae37282cc87664813e4/etc/dbt-logo-full.svg" width="33%" />
  <img src="https://trino.io/assets/trino-og.png" width="33%" />
</p>

[![Build Status](https://github.com/starburstdata/dbt-trino/actions/workflows/ci.yml/badge.svg)](https://github.com/starburstdata/dbt-trino/actions/workflows/ci.yml?query=workflow%3A%22dbt-trino+tests%22+branch%3Amaster+event%3Apush) [![db-presto-trino Slack](https://img.shields.io/static/v1?logo=slack&logoColor=959DA5&label=Slack&labelColor=333a41&message=join%20conversation&color=3AC358)](https://getdbt.slack.com/channels/db-presto-trino)

# dbt-starburst

## Introduction

[dbt](https://docs.getdbt.com/docs/introduction) is a data transformation workflow tool that lets teams quickly and collaboratively deploy analytics code, following software engineering best practices like modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL to build production-grade data pipelines.

A frequently asked question when using `dbt` is:

> Can I connect my dbt project to two databases?

(see the answered [question](https://docs.getdbt.com/faqs/connecting-to-two-dbs-not-allowed) on the dbt website).

**TL;DR** `dbt` handles the transformation step, the `T` in `ELT` pipelines; it doesn't move data from a source to a warehouse.

The `dbt-starburst` adapter uses [Trino](https://trino.io/) as the underlying query engine to federate queries across disparate data sources. Trino connects to multiple, diverse data sources ([available connectors](https://trino.io/docs/current/connector.html)) through a single dbt connection and processes SQL queries at scale. Transformations defined in dbt are passed to Trino, which handles these SQL transformation queries and translates them into queries specific to the systems it connects to, creating tables or views and manipulating data.

This repository is a fork of [dbt-presto](https://github.com/dbt-labs/dbt-presto) with adaptations to make it work with Trino.

### Compatibility

This dbt plugin has been tested against `Trino` version `405`, `Starburst Enterprise` version `402-e.0` and `Starburst Galaxy`.

## Installation

This dbt adapter can be installed via pip:

```sh
$ pip install dbt-starburst
```

### Configuring your profile

A dbt profile can be configured to run against Trino using the following configuration:

| Option                         | Description                                                                                                  | Required?                                                                                               | Example                              |
|--------------------------------|--------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|--------------------------------------|
| method                         | The Trino authentication method to use                                                                       | Optional (default is `none`, supported methods are `ldap`, `kerberos`, `jwt`, `oauth` or `certificate`) | `none` or `kerberos`                 |
| user                           | Username for authentication                                                                                  | Optional (required if `method` is `none`, `ldap` or `kerberos`)                                         | `commander`                          |
| password                       | Password for authentication                                                                                  | Optional (required if `method` is `ldap`)                                                               | `none` or `abc123`                   |
| impersonation_user             | Username override, used for impersonation                                                                    | Optional (applicable if `ldap`)                                                                         | `impersonated_tom`                   |
| roles                          | Catalog roles                                                                                                | Optional                                                                                                | `system: analyst`                    |
| keytab                         | Path to keytab for kerberos authentication                                                                   | Optional (may be required if `method` is `kerberos`)                                                    | `/tmp/trino.keytab`                  |
| krb5_config                    | Path to config for kerberos authentication                                                                   | Optional (may be required if `method` is `kerberos`)                                                    | `/tmp/krb5.conf`                     |
| principal                      | Principal for kerberos authentication                                                                        | Optional (may be required if `method` is `kerberos`)                                                    | `trino@EXAMPLE.COM`                  |
| service_name                   | Service name for kerberos authentication                                                                     | Optional (default is `trino`)                                                                           | `abc123`                             |
| mutual_authentication          | Boolean flag for mutual authentication                                                                       | Optional (may be required if `method` is `kerberos`)                                                    | `false`                              |
| force_preemptive               | Boolean flag to preemptively initiate the Kerberos GSS exchange                                              | Optional (may be required if `method` is `kerberos`)                                                    | `false`                              |
| hostname_override              | Kerberos hostname for a host whose DNS name doesn't match                                                    | Optional (may be required if `method` is `kerberos`)                                                    | `EXAMPLE.COM`                        |
| sanitize_mutual_error_response | Boolean flag to strip content and headers from error responses                                               | Optional (may be required if `method` is `kerberos`)                                                    | `true`                               |
| delegate                       | Boolean flag for credential delegation (GSS_C_DELEG_FLAG)                                                    | Optional (may be required if `method` is `kerberos`)                                                    | `false`                              |
| jwt_token                      | JWT token for authentication                                                                                 | Optional (required if `method` is `jwt`)                                                                | `none` or `abc123`                   |
| client_certificate             | Path to client certificate to be used for certificate based authentication                                   | Optional (required if `method` is `certificate`)                                                        | `/tmp/tls.crt`                       |
| client_private_key             | Path to client private key to be used for certificate based authentication                                   | Optional (required if `method` is `certificate`)                                                        | `/tmp/tls.key`                       |
| http_headers                   | HTTP Headers to send alongside requests to Trino, specified as a yaml dictionary of (header, value) pairs.   | Optional                                                                                                | `X-Trino-Client-Info: dbt-starburst` |
| http_scheme                    | The HTTP scheme to use for requests to Trino                                                                 | Optional (default is `http`, or `https` for `method: kerberos`, `ldap` or `jwt`)                        | `https` or `http`                    |
| cert                           | The full path to a certificate file for authentication with Trino                                            | Optional                                                                                                |                                      |
| session_properties             | Sets Trino session properties used in the connection                                                         | Optional                                                                                                | `query_max_run_time: 4h`             |
| database                       | Specify the database to build models into                                                                    | Required                                                                                                | `analytics`                          |
| schema                         | Specify the schema to build models into. Note: it is not recommended to use upper or mixed case schema names | Required                                                                                                | `public`                             |
| host                           | The hostname to connect to                                                                                   | Required                                                                                                | `127.0.0.1`                          |
| port                           | The port to connect to the host on                                                                           | Required                                                                                                | `8080`                               |
| threads                        | How many threads dbt should use                                                                              | Optional (default is `1`)                                                                               | `8`                                  |
| prepared_statements_enabled    | Enable usage of Trino prepared statements (used in `dbt seed` commands)                                      | Optional (default is `true`)                                                                            | `true` or `false`                    |
| retries                        | Configure how many times a database operation is retried when connection issues arise                        | Optional (default is `3`)                                                                               | `10`                                 |
| timezone                       | The time zone for the Trino session                                                                          | Optional (defaults to the client side local timezone)                                                   | `Europe/Brussels`                    |

**Example profiles.yml entry:**

```yaml
my-trino-db:
  target: dev
  outputs:
    dev:
      type: trino
      user: commander
      host: 127.0.0.1
      port: 8080
      database: analytics
      schema: public
      threads: 8
      http_scheme: http
      session_properties:
        query_max_run_time: 4h
        exchange_compression: True
      timezone: UTC
```

**Example profiles.yml entry for kerberos authentication:**
```yaml
my-trino-db:
  target: dev
  outputs:
    dev:
      type: trino
      method: kerberos
      user: commander
      keytab: /tmp/trino.keytab
      krb5_config: /tmp/krb5.conf
      principal: trino@EXAMPLE.COM
      host: trino.example.com
      port: 443
      database: analytics
      schema: public
```

To see which session properties can be set in the dbt profile, execute

```sql
SHOW SESSION;
```

on your Trino instance.
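
The "Required?" column of the option table above can be read as a small set of rules. The following is a minimal sketch of those rules in Python, assuming a profile output has already been loaded into a dictionary. `missing_profile_keys` is a hypothetical helper written for illustration; it is not part of dbt or of this adapter, and it only encodes what the table states.

```python
# Illustrative only: encodes the "Required?" column of the option table.
# `missing_profile_keys` is a hypothetical helper, not a dbt or adapter API.

ALWAYS_REQUIRED = ("schema", "host", "port")

# Extra options the table marks as required for each authentication method.
REQUIRED_BY_METHOD = {
    "none": ("user",),
    "ldap": ("user", "password"),
    "kerberos": ("user",),
    "jwt": ("jwt_token",),
    "certificate": ("client_certificate", "client_private_key"),
    "oauth": (),
}


def missing_profile_keys(output: dict) -> list:
    """Return the option names the table requires but `output` does not set."""
    method = output.get("method", "none")  # the table gives `none` as the default
    if method not in REQUIRED_BY_METHOD:
        raise ValueError(f"unsupported method: {method!r}")
    missing = [key for key in ALWAYS_REQUIRED if output.get(key) is None]
    # Assumption: `catalog` may stand in for `database`, as in the hive
    # profile example elsewhere in this document.
    if output.get("database") is None and output.get("catalog") is None:
        missing.append("database")
    missing += [key for key in REQUIRED_BY_METHOD[method] if output.get(key) is None]
    return missing


dev = {"type": "trino", "user": "commander", "host": "127.0.0.1",
       "port": 8080, "database": "analytics", "schema": "public"}
print(missing_profile_keys(dev))                        # []
print(missing_profile_keys({**dev, "method": "ldap"}))  # ['password']
```

A check like this can run before `dbt run` in CI to catch an incomplete profile early; the adapter performs its own validation when it connects.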

## Usage Notes

#### Supported authentication types

- none - No authentication
- [ldap](https://trino.io/docs/current/security/authentication-types.html) - Specify username in `user` and password in `password`
- [kerberos](https://trino.io/docs/current/security/kerberos.html) - Specify username in `user`
- [jwt](https://trino.io/docs/current/security/jwt.html) - Specify JWT token in `jwt_token`
- [certificate](https://trino.io/docs/current/security/certificate.html) - Specify a client certificate in `client_certificate` and private key in `client_private_key`
- [oauth](https://trino.io/docs/current/security/oauth2.html) - It is recommended to install keyring to cache the OAuth2 token across multiple dbt invocations by running `pip install 'trino[external-authentication-token-cache]'`. Keyring is not installed by default.

See also: https://trino.io/docs/current/security/authentication-types.html

#### Session properties per model

In some cases, Trino session properties need to be tuned for a single dbt model only.
For this, you can use [dbt hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook):

```jinja2
{{
  config(
    pre_hook="set session query_max_run_time='10m'"
  )
}}
```

#### Materializations

##### Table

`dbt-starburst` supports two modes for the `table` materialization, `rename` and `drop`, configured using `on_table_exists`.

- `rename` - creates an intermediate table, renames the existing target to a backup table, and then renames the intermediate table to the target.
- `drop` - drops and recreates the table. This works around the table rename limitation in AWS Glue.

By default, the `table` materialization uses `on_table_exists = 'rename'`. The examples below show how to change it.

In model add:
```jinja2
{{
  config(
    materialized = 'table',
    on_table_exists = 'drop'
  )
}}
```

or in `dbt_project.yml`:

```yaml
models:
  path:
    materialized: table
    +on_table_exists: drop
```

Using the `table` materialization with `on_table_exists = 'rename'` on AWS Glue may result in the following error:

```
TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Table rename is not yet supported by Glue service")
```

##### View

The adapter supports two security modes for the `view` materialization, `DEFINER` and `INVOKER`, configured using `view_security`.

See [Trino docs](https://trino.io/docs/current/sql/create-view.html#security) for more details about security modes in views.

By default, the `view` materialization uses `view_security = 'definer'`. The examples below show how to change it.

In model add:
```jinja2
{{
  config(
    materialized = 'view',
    view_security = 'invoker'
  )
}}
```

or in `dbt_project.yml`:

```yaml
models:
  path:
    materialized: view
    +view_security: invoker
```


##### Incremental

Using an incremental model limits the amount of data that needs to be transformed, vastly reducing the runtime of your transformations. This improves performance and reduces compute costs.

```jinja2
{{
    config(
      materialized = 'incremental',
      unique_key='<optional>',
      incremental_strategy='<optional>',
    )
}}
select * from {{ ref('events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

Use the `+on_schema_change` property to define how dbt-starburst should handle column changes. See [dbt docs](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models#what-if-the-columns-of-my-incremental-model-change).

Set `+views_enabled` to `false` if your connector doesn't support views.

###### `append` (default)

The default incremental strategy is `append`. `append` only adds the new records based on the condition specified in the `is_incremental()` conditional block.

```jinja2
{{
    config(
      materialized = 'incremental')
}}
select * from {{ ref('events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

###### `delete+insert`

With the `delete+insert` incremental strategy, dbt uses a two-step approach: it first deletes the records detected through the configured `is_incremental()` block and then re-inserts them.

```jinja2
{{
    config(
      materialized = 'incremental',
      unique_key='user_id',
      incremental_strategy='delete+insert',
      )
}}
select * from {{ ref('users') }}
{% if is_incremental() %}
  where updated_ts > (select max(updated_ts) from {{ this }})
{% endif %}
```

###### `merge`

Through the `merge` incremental strategy, dbt-starburst constructs a [`MERGE` statement](https://trino.io/docs/current/sql/merge.html) which `INSERT`s new and `UPDATE`s existing records based on the unique key (specified by `unique_key`).  
If `unique_key` is not unique, the `delete+insert` strategy can be used instead.
Note that some connectors in Trino have limited or no support for `MERGE`.

```jinja2
{{
    config(
      materialized = 'incremental',
      unique_key='user_id',
      incremental_strategy='merge',
      )
}}
select * from {{ ref('users') }}
{% if is_incremental() %}
  where updated_ts > (select max(updated_ts) from {{ this }})
{% endif %}
```
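
To compare the three strategies side by side, the sketch below models a table as a list of dictionaries and applies each strategy to the same input. This is a conceptual illustration only: the adapter issues SQL through Trino rather than running Python, and the helper names `append`, `delete_insert` and `merge` are hypothetical. Raising an error for duplicate keys in `merge` is a modelling assumption that mirrors the uniqueness requirement noted above.

```python
# Conceptual model only: rows are dicts, tables are lists of rows.
# These helpers are hypothetical and do not reflect the adapter's internals.

def append(target, new_rows):
    """`append`: add the new rows and never touch existing ones."""
    return target + new_rows


def delete_insert(target, new_rows, unique_key):
    """`delete+insert`: drop target rows whose key is in the new rows, then insert."""
    incoming = {row[unique_key] for row in new_rows}
    kept = [row for row in target if row[unique_key] not in incoming]
    return kept + new_rows


def merge(target, new_rows, unique_key):
    """`merge`: update rows whose key matches and insert the rest."""
    keys = [row[unique_key] for row in new_rows]
    if len(keys) != len(set(keys)):
        raise ValueError("merge needs a unique key in the new rows")
    by_key = {row[unique_key]: row for row in new_rows}
    # Replace matched rows in place; whatever is left in by_key is new.
    updated = [by_key.pop(row[unique_key], row) for row in target]
    return updated + list(by_key.values())


users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
changes = [{"user_id": 2, "name": "B"}, {"user_id": 3, "name": "c"}]

print(len(append(users, changes)))               # 4 (user 2 now appears twice)
print(delete_insert(users, changes, "user_id"))  # users 1, 2 (updated), 3
print(merge(users, changes, "user_id"))          # users 1, 2 (updated), 3
```

With a unique key, `delete+insert` and `merge` end in the same state here; they differ in how they get there and in what the connector has to support.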

###### Incremental overwrite on hive models

If the target incremental model is accessed through the Trino
[hive](https://trino.io/docs/current/connector/hive.html) connector, `insert overwrite`
functionality can be achieved by applying the following setting to the Trino hive connector configuration:

```
<hive-catalog-name>.insert-existing-partitions-behavior=OVERWRITE
```

Below is a sample profile entry that enables the `OVERWRITE` behavior for a hive catalog called `minio`:

```yaml
trino-incremental-hive:
  target: dev
  outputs:
    dev:
      type: trino
      method: none
      user: admin
      password:
      catalog: minio
      schema: tiny
      host: localhost
      port: 8080
      http_scheme: http
      session_properties:
        minio.insert_existing_partitions_behavior: OVERWRITE
      threads: 1
```

Existing partitions in the target model that match the staged data are overwritten.
The remaining partitions are simply appended to the target model.

Note that this functionality works on incremental models that use partitioning:

```jinja2
{{
    config(
        materialized = 'incremental',
        properties={
          "format": "'PARQUET'",
          "partitioned_by": "ARRAY['day']",
        }
    )
}}
```
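
The overwrite behavior described above can be pictured with a small Python model, assuming rows are dictionaries and the partition is identified by a single column. `insert_overwrite` is a hypothetical helper for illustration; the real work is done by the Trino hive connector.

```python
# Conceptual model of partition overwrite; not the connector's implementation.

def insert_overwrite(target, staged, partition_column):
    """Replace every target partition present in `staged`; append the others."""
    staged_partitions = {row[partition_column] for row in staged}
    # Keep only rows from partitions that the staged data does not touch.
    kept = [row for row in target if row[partition_column] not in staged_partitions]
    return kept + staged


target = [
    {"day": "2023-03-01", "clicks": 10},
    {"day": "2023-03-02", "clicks": 20},
]
staged = [
    {"day": "2023-03-02", "clicks": 25},  # existing partition: overwritten
    {"day": "2023-03-03", "clicks": 30},  # new partition: appended
]

for row in insert_overwrite(target, staged, "day"):
    print(row)
# {'day': '2023-03-01', 'clicks': 10}
# {'day': '2023-03-02', 'clicks': 25}
# {'day': '2023-03-03', 'clicks': 30}
```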

##### Materialized view

The adapter also supports [materialized views](https://trino.io/docs/current/sql/create-materialized-view.html).
At every subsequent `dbt run` command, the materialized view is [refreshed](https://trino.io/docs/current/sql/refresh-materialized-view.html).

You can also define custom properties for the materialized view through the `properties` config.

This materialization supports the [full_refresh](https://docs.getdbt.com/reference/resource-configs/full_refresh) config and flag.
Whenever you want to rebuild your materialized view, e.g. when changing underlying SQL query, run `dbt run --full-refresh`.


In model add:
```jinja2
{{
  config(
    materialized = 'materialized_view',
    properties = {
      'format': "'PARQUET'"
    },
  )
}}
```

or in `dbt_project.yml`:

```yaml
models:
  path:
    materialized: materialized_view
    +properties:
      format: "'PARQUET'"
```


##### Snapshots

Commonly, analysts need to "look back in time" at some previous state of data in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is often not the case. dbt provides a mechanism, snapshots, which records changes to a mutable table over time.

Snapshots implement type-2 Slowly Changing Dimensions over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an orders table where the status field can be overwritten as the order is processed. [See also the dbt docs about snapshots](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots).

An example is given below.

```jinja2
{% snapshot orders_snapshot %}
{{
    config(
        target_database='analytics',
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
```
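
To make the type-2 SCD idea concrete, here is a minimal Python sketch of a timestamp-based snapshot, assuming rows are dictionaries and timestamps are ISO date strings that compare correctly as text. `apply_snapshot` is a hypothetical helper; the column names `dbt_valid_from` and `dbt_valid_to` follow dbt's snapshot metadata columns. dbt itself does this in SQL.

```python
# Conceptual model of a timestamp-strategy snapshot; not dbt's implementation.

def apply_snapshot(snapshot, source_rows, unique_key="id", updated_at="updated_at"):
    """Return a new snapshot after comparing its current records with the source."""
    result = [dict(row) for row in snapshot]  # copy so the input is not mutated
    current = {row[unique_key]: row for row in result if row["dbt_valid_to"] is None}
    for src in source_rows:
        live = current.get(src[unique_key])
        if live is not None and src[updated_at] <= live[updated_at]:
            continue  # unchanged since the last snapshot
        if live is not None:
            live["dbt_valid_to"] = src[updated_at]  # close the previous version
        result.append({**src, "dbt_valid_from": src[updated_at], "dbt_valid_to": None})
    return result


snap = apply_snapshot([], [{"id": 1, "status": "pending", "updated_at": "2023-01-01"}])
snap = apply_snapshot(snap, [{"id": 1, "status": "shipped", "updated_at": "2023-01-02"}])
for row in snap:
    print(row["status"], row["dbt_valid_from"], row["dbt_valid_to"])
# pending 2023-01-01 2023-01-02
# shipped 2023-01-02 None
```

The overwritten `pending` status stays queryable, with a validity interval that ends where the `shipped` version begins.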

Note that the snapshot feature depends on the `current_timestamp` macro. Some connectors, e.g. Iceberg, do not support the standard precision (`TIMESTAMP(3) WITH TIME ZONE`).

If necessary, you can override the standard precision by providing your own version of the `trino__current_timestamp()` macro, as in the following example:

```jinja2
{% macro trino__current_timestamp() %}
    current_timestamp(6)
{% endmacro %}
```

#### Use table properties to configure connector specifics

Trino connectors use table properties to configure connector specifics.

Check the Trino connector documentation for more information.

```jinja2
{{
  config(
    materialized='table',
    properties={
      "format": "'PARQUET'",
      "partitioning": "ARRAY['bucket(id, 2)']",
    }
  )
}}
```

#### Seeds

Seeds are CSV files in your dbt project (typically in your data directory) that dbt can load into your data warehouse using the `dbt seed` command.

For dbt-starburst, the batch size is defined in the macro `trino__get_batch_size()`, and its default value is `1000`.
To override the default value, define a macro like the following within your project:

```jinja2
{% macro trino__get_batch_size() %}
  {{ return(10000) }}
{% endmacro %}
```
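
As a rough illustration of what the batch size controls, the sketch below splits seed rows into chunks, assuming that each chunk is loaded with one insert statement. The `batches` helper is hypothetical and only demonstrates the arithmetic.

```python
# Illustrative chunking; assumes one insert statement per batch of seed rows.

def batches(rows, batch_size=1000):
    """Yield consecutive slices of `rows` with at most `batch_size` items each."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = [(i, f"name_{i}") for i in range(2500)]

print([len(batch) for batch in batches(rows)])  # [1000, 1000, 500]
print(len(list(batches(rows, 10000))))          # 1
```

A larger batch size means fewer statements, but each statement carries more values, which matters for the header-size limit discussed under prepared statements.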

#### Persist docs

Persist docs optionally persists resource descriptions as column and relation comments in the database. By default, documentation persistence is disabled, but it can be enabled for specific resources or groups of resources as needed.

Detailed documentation can be found [here](https://docs.getdbt.com/reference/resource-configs/persist_docs).

#### Generating lineage flow in docs

To generate the lineage flow in docs, use the `ref` function in place of table names in the query. It builds dependencies between models and lets dbt create a DAG of the data flow. Refer to the examples [here](https://docs.getdbt.com/docs/building-a-dbt-project/building-models#building-dependencies-between-models).

```sh
dbt docs generate          # generate docs
dbt docs serve --port 8081 # starts local server (by default docs server runs on 8080 port, it may cause conflict with Trino in case of local development)
```

#### Using Custom schemas

By default, all dbt models are built in the schema specified in your target. But sometimes you wish to build some of the models in a custom schema. In order to do so, use the `schema` configuration key to specify a custom schema for a model. See [here](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas) for the documentation. It is important to note that by default, dbt will generate the schema name for a model by concatenating the custom schema to the target schema, as in: `<target_schema>_<custom_schema>`. 
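
The default naming rule described above can be sketched as follows. The Python function mirrors the `<target_schema>_<custom_schema>` pattern for illustration; in dbt the rule lives in a Jinja macro, and this helper is not a dbt API.

```python
# Illustrative version of the default `<target_schema>_<custom_schema>` rule.

def generate_schema_name(target_schema, custom_schema=None):
    """Return the schema a model is built in under the default naming rule."""
    if custom_schema is None:
        return target_schema  # no custom schema: use the target schema as-is
    return f"{target_schema}_{custom_schema.strip()}"


print(generate_schema_name("public"))               # public
print(generate_schema_name("public", "marketing"))  # public_marketing
```

So a model configured with `schema: marketing` in a target whose schema is `public` lands in `public_marketing`, not in `marketing`.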


#### Prepared statements

The `dbt seed` feature uses [Trino's prepared statements](https://trino.io/docs/current/sql/prepare.html).

Python's http client has a hardcoded limit of 65536 bytes for a header line.

When executing a prepared statement with a large number of parameters, you might encounter the following error:

`requests.exceptions.ConnectionError: ('Connection aborted.', LineTooLong('got more than 65536 bytes when reading header line'))`.

Prepared statements can be disabled by setting `prepared_statements_enabled` to `false` in your dbt profile (reverting to the legacy behavior using Python string interpolation). This flag may be removed in later releases.
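
The header limit can be reproduced with the Python standard library alone. The sketch below feeds a single header line to `http.client.parse_headers`, a module-level helper in `http.client`, and reports whether it fits. The header name `X-Example` and the helper `header_line_fits` are made up for the illustration; no Trino server is involved.

```python
import http.client
import io


def header_line_fits(value: bytes) -> bool:
    """Return True if a header carrying `value` parses without LineTooLong."""
    raw = b"X-Example: " + value + b"\r\n\r\n"  # one header, then the blank line
    try:
        http.client.parse_headers(io.BytesIO(raw))
    except http.client.LineTooLong:
        return False
    return True


print(header_line_fits(b"a" * 1_000))   # True
print(header_line_fits(b"a" * 70_000))  # False
```

If a seed hits this limit, lowering the seed batch size or disabling prepared statements are the two options this document describes.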

#### Grants

Please note that grants are only supported in [Starburst Enterprise](https://docs.starburst.io/latest/security/biac-overview.html), [Starburst Galaxy](https://docs.starburst.io/starburst-galaxy/security/access-control.html), and Hive ([sql-standard](https://trino.io/docs/current/connector/hive-security.html)).

You can manage access to the datasets you're producing with dbt by using grants. To implement these permissions, define grants as resource configs on each model, seed, or snapshot. Define the default grants that apply to the entire project in your dbt_project.yml, and define model-specific grants within each model's SQL or YAML file.

```yaml
models:
  - name: specific_model
    config:
      grants:
        select: ['reporter', 'bi']
```

Read everything about grants in the [dbt docs](https://docs.getdbt.com/reference/resource-configs/grants).

## Contributing

- Want to report a bug or request a feature? Let us know on [Slack](http://community.getdbt.com/) in the [#db-presto-trino](https://getdbt.slack.com/channels/db-presto-trino) channel, or open [an issue](https://github.com/starburstdata/dbt-starburst/issues/new)
- Want to help us build dbt-starburst? Check out the [Contributing Guide](https://github.com/starburstdata/dbt-starburst/blob/HEAD/CONTRIBUTING.md)

### Release process

Before doing a release, bump the dbt-starburst version by triggering the release workflow `version-bump.yml`. The major and minor parts of the dbt version are used to associate dbt-starburst's version with the dbt version.

The next step is to merge the bump PR and make sure that the test suite passes.

Finally, to release `dbt-starburst` to PyPI and GitHub, trigger the release workflow `release.yml`.

### Backport process

Sometimes it is necessary to backport changes to older versions. In that case, create a branch from the `x.x.latest` branch. There is an `x.x.latest` branch for each minor version, e.g. `1.3.latest`. Make the fix and open a PR back to `x.x.latest`. Create a changelog entry with `changie new` as usual, since a separate changelog for each minor version is kept on every `x.x.latest` branch.
After merging, to release that version, follow the instructions from the **Release process** section, but run every workflow on the `x.x.latest` branch.

## Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected
to follow the [PyPA Code of Conduct](https://www.pypa.io/en/latest/code-of-conduct/).

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/starburstdata/dbt-starburst",
    "name": "dbt-starburst",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Starburst Data",
    "author_email": "info@starburstdata.com",
    "download_url": "https://files.pythonhosted.org/packages/d1/8c/3cff6e12a5de1eeb17b1a9d7b65efcb183d2ee36185f1b2d7f8d15acfdf8/dbt-starburst-1.0.0.tar.gz",
    "platform": "any",
    "description": "<p float=\"left\">\n  <img src=\"https://raw.githubusercontent.com/dbt-labs/dbt/ec7dee39f793aa4f7dd3dae37282cc87664813e4/etc/dbt-logo-full.svg\" width=\"33%\" />\n  <img src=\"https://trino.io/assets/trino-og.png\" width=\"33%\" />\n</p>\n\n[![Build Status](https://github.com/starburstdata/dbt-trino/actions/workflows/ci.yml/badge.svg)](https://github.com/starburstdata/dbt-trino/actions/workflows/ci.yml?query=workflow%3A%22dbt-trino+tests%22+branch%3Amaster+event%3Apush) [![db-presto-trino Slack](https://img.shields.io/static/v1?logo=slack&logoColor=959DA5&label=Slack&labelColor=333a41&message=join%20conversation&color=3AC358)](https://getdbt.slack.com/channels/db-presto-trino)\n\n# dbt-starburst\n\n## Introduction\n\n[dbt](https://docs.getdbt.com/docs/introduction) is a data transformation workflow tool that lets teams quickly and collaboratively deploy analytics code, following software engineering best practices like modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL to build production-grade data pipelines.\n\nOne frequently asked question in the context of using `dbt` tool is:\n\n> Can I connect my dbt project to two databases?\n\n(see the answered [question](https://docs.getdbt.com/faqs/connecting-to-two-dbs-not-allowed) on the dbt website).\n\n**TL;DR** `dbt` stands for transformation as in `T` within `ELT` pipelines, it doesn't move data from source to a warehouse.\n\n`dbt-starburst` adapter uses [Trino](https://trino.io/) as a underlying query engine to perform query federation across disperse data sources. Trino connects to multiple and diverse data sources ([available connectors](https://trino.io/docs/current/connector.html)) via one dbt connection and process SQL queries at scale. 
Transformations defined in dbt are passed to Trino which handles these SQL transformation queries and translates them to queries specific to the systems it connects to create tables or views and manipulate data.\n\nThis repository represents a fork of the [dbt-presto](https://github.com/dbt-labs/dbt-presto) with adaptations to make it work with Trino.\n\n### Compatibility\n\nThis dbt plugin has been tested against `Trino` version `405`, `Starburst Enterprise` version `402-e.0` and `Starburst Galaxy`.\n\n## Installation\n\nThis dbt adapter can be installed via pip:\n\n```sh\n$ pip install dbt-starburst\n```\n\n### Configuring your profile\n\nA dbt profile can be configured to run against Trino using the following configuration:\n\n| Option                         | Description                                                                                                  | Required?                                                                                               | Example                              |\n|--------------------------------|--------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|--------------------------------------|\n| method                         | The Trino authentication method to use                                                                       | Optional (default is `none`, supported methods are `ldap`, `kerberos`, `jwt`, `oauth` or `certificate`) | `none` or `kerberos`                 |\n| user                           | Username for authentication                                                                                  | Optional (required if `method` is `none`, `ldap` or `kerberos`)                                         | `commander`                          |\n| password                       | Password for authentication                              
                                                    | Optional (required if `method` is `ldap`)                                                               | `none` or `abc123`                   |\n| impersonation_user             | Username override, used for impersonation                                                                    | Optional (applicable if `ldap`)                                                                         | `impersonated_tom`                   |\n| roles                          | Catalog roles                                                                                                | Optional                                                                                                | `system: analyst`                    |\n| keytab                         | Path to keytab for kerberos authentication                                                                   | Optional (may be required if `method` is `kerberos`)                                                    | `/tmp/trino.keytab`                  |\n| krb5_config                    | Path to config for kerberos authentication                                                                   | Optional (may be required if `method` is `kerberos`)                                                    | `/tmp/krb5.conf`                     |\n| principal                      | Principal for kerberos authentication                                                                        | Optional (may be required if `method` is `kerberos`)                                                    | `trino@EXAMPLE.COM`                  |\n| service_name                   | Service name for kerberos authentication                                                                     | Optional (default is `trino`)                                                                           | `abc123`                             |\n| mutual_authentication          | Boolean flag 
for mutual authentication                                                                       | Optional (may be required if `method` is `kerberos`)                                                    | `false`                              |\n| force_preemptive               | Boolean flag to preemptively initiate the Kerberos GSS exchange                                              | Optional (may be required if `method` is `kerberos`)                                                    | `false`                              |\n| hostname_override              | Kerberos hostname for a host whose DNS name doesn't match                                                    | Optional (may be required if `method` is `kerberos`)                                                    | `EXAMPLE.COM`                        |\n| sanitize_mutual_error_response | Boolean flag to strip content and headers from error responses                                               | Optional (may be required if `method` is `kerberos`)                                                    | `true`                               |\n| delegate                       | Boolean flag for credential delegation (GSS_C_DELEG_FLAG)                                                    | Optional (may be required if `method` is `kerberos`)                                                    | `false`                              |\n| jwt_token                      | JWT token for authentication                                                                                 | Optional (required if `method` is `jwt`)                                                                | `none` or `abc123`                   |\n| client_certificate             | Path to client certificate to be used for certificate-based authentication                                   | Optional (required if `method` is `certificate`)                                                        | `/tmp/tls.crt`                       |\n| 
client_private_key             | Path to client private key to be used for certificate based authentication                                   | Optional (required if `method` is `certificate`)                                                        | `/tmp/tls.key`                       |\n| http_headers                   | HTTP Headers to send alongside requests to Trino, specified as a yaml dictionary of (header, value) pairs.   | Optional                                                                                                | `X-Trino-Client-Info: dbt-starburst` |\n| http_scheme                    | The HTTP scheme to use for requests to Trino                                                                 | Optional (default is `http`, or `https` for `method: kerberos`, `ldap` or `jwt`)                        | `https` or `http`                    |\n| cert                           | The full path to a certificate file for authentication with trino                                            | Optional                                                                                                |                                      |\n| session_properties             | Sets Trino session properties used in the connection                                                         | Optional                                                                                                | `query_max_run_time: 4h`             |\n| database                       | Specify the database to build models into                                                                    | Required                                                                                                | `analytics`                          |\n| schema                         | Specify the schema to build models into. 
Note: it is not recommended to use upper or mixed case schema names | Required                                                                                                | `public`                             |\n| host                           | The hostname to connect to                                                                                   | Required                                                                                                | `127.0.0.1`                          |\n| port                           | The port to connect to the host on                                                                           | Required                                                                                                | `8080`                               |\n| threads                        | How many threads dbt should use                                                                              | Optional (default is `1`)                                                                               | `8`                                  |\n| prepared_statements_enabled    | Enable usage of Trino prepared statements (used in `dbt seed` commands)                                      | Optional (default is `true`)                                                                            | `true` or `false`                    |\n| retries                        | Configure how many times a database operation is retried when connection issues arise                        | Optional (default is `3`)                                                                               | `10`                                 |\n| timezone                       | The time zone for the Trino session                                                                          | Optional (defaults to the client side local timezone)                                                   | `Europe/Brussels`                    |\n\n**Example profiles.yml 
entry:**\n\n```yaml\nmy-trino-db:\n  target: dev\n  outputs:\n    dev:\n      type: trino\n      user: commander\n      host: 127.0.0.1\n      port: 8080\n      database: analytics\n      schema: public\n      threads: 8\n      http_scheme: http\n      session_properties:\n        query_max_run_time: 4h\n        exchange_compression: True\n      timezone: UTC\n```\n\n**Example profiles.yml entry for kerberos authentication:**\n```yaml\nmy-trino-db:\n  target: dev\n  outputs:\n    dev:\n      type: trino\n      method: kerberos\n      user: commander\n      keytab: /tmp/trino.keytab\n      krb5_config: /tmp/krb5.conf\n      principal: trino@EXAMPLE.COM\n      host: trino.example.com\n      port: 443\n      database: analytics\n      schema: public\n```\n\nTo see which session properties can be set in the dbt profile, execute\n\n```sql\nSHOW SESSION;\n```\n\non your Trino instance.\n\n## Usage Notes\n\n#### Supported authentication types\n\n- none - No authentication\n- [ldap](https://trino.io/docs/current/security/authentication-types.html) - Specify username in `user` and password in `password`\n- [kerberos](https://trino.io/docs/current/security/kerberos.html) - Specify username in `user`\n- [jwt](https://trino.io/docs/current/security/jwt.html) - Specify JWT token in `jwt_token`\n- [certificate](https://trino.io/docs/current/security/certificate.html) - Specify a client certificate in `client_certificate` and private key in `client_private_key`\n- [oauth](https://trino.io/docs/current/security/oauth2.html) - It is recommended to install keyring to cache the OAuth2 token over multiple dbt invocations by running `pip install 'trino[external-authentication-token-cache]'`; keyring is not installed by default.\n\nSee also: https://trino.io/docs/current/security/authentication-types.html\n\n#### Session properties per model\n\nIn some cases, Trino session properties need to be tuned only \nfor a specific dbt model.\nIn 
such cases, using the [dbt hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook)\nmay come to the rescue:\n\n```\n{{\n  config(\n    pre_hook=\"set session query_max_run_time='10m'\"\n  )\n}}\n```\n\n#### Materializations\n\n##### Table\n\n`dbt-starburst` supports two modes in `table` materialization, `rename` and `drop`, configured using `on_table_exists`.\n\n- `rename` - creates an intermediate table, then renames the target to a backup table and renames the intermediate table to the target.\n- `drop` - drops and recreates the table. This works around the table rename limitation in AWS Glue.\n\n\nBy default, `table` materialization uses `on_table_exists = 'rename'`. See the examples below for how to change it.\n\nIn model add:\n```jinja2\n{{\n  config(\n    materialized = 'table',\n    on_table_exists = 'drop'\n  )\n}}\n```\n\nor in `dbt_project.yml`:\n\n```yaml\nmodels:\n  path:\n    materialized: table\n    +on_table_exists: drop\n```\n\nUsing `table` materialization and `on_table_exists = 'rename'` with AWS Glue may result in the error below:\n\n```\nTrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message=\"Table rename is not yet supported by Glue service\")\n```\n\n##### View\n\nThe adapter supports two security modes in `view` materialization, `DEFINER` and `INVOKER`, configured using `view_security`.\n\nSee [Trino docs](https://trino.io/docs/current/sql/create-view.html#security) for more details about security modes in views.\n\nBy default, `view` materialization uses `view_security = 'definer'`. See the examples below for how to change it.\n\nIn model add:\n```jinja2\n{{\n  config(\n    materialized = 'view',\n    view_security = 'invoker'\n  )\n}}\n```\n\nor in `dbt_project.yml`:\n\n```yaml\nmodels:\n  path:\n    materialized: view\n    +view_security: invoker\n```\n\n\n##### Incremental\n\nUsing an incremental model limits the amount of data that needs to be transformed, vastly reducing the runtime of your transformations. 
This improves performance and reduces compute costs.\n\n```jinja2\n{{\n    config(\n      materialized = 'incremental', \n      unique_key='<optional>',\n      incremental_strategy='<optional>',)\n}}\nselect * from {{ ref('events') }}\n{% if is_incremental() %}\n  where event_ts > (select max(event_ts) from {{ this }})\n{% endif %}\n```\n\nUse the `+on_schema_change` property to define how dbt-starburst should handle column changes. See [dbt docs](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models#what-if-the-columns-of-my-incremental-model-change).\n\nSet the `+views_enabled` to `false` if your connector doesn't support views. \n\n###### `append` (default)\n\nThe default incremental strategy is `append`. `append` only adds the new records based on the condition specified in the `is_incremental()` conditional block.\n\n```jinja2\n{{\n    config(\n      materialized = 'incremental')\n}}\nselect * from {{ ref('events') }}\n{% if is_incremental() %}\n  where event_ts > (select max(event_ts) from {{ this }})\n{% endif %}\n```\n\n###### `delete+insert`\n\nThrough the `delete+insert` incremental strategy, you can instruct dbt to use a two-step incremental approach. It will first delete the records detected through the configured `is_incremental()` block and re-insert them.\n\n```jinja2\n{{\n    config(\n      materialized = 'incremental',\n      unique_key='user_id',\n      incremental_strategy='delete+insert',\n      )\n}}\nselect * from {{ ref('users') }}\n{% if is_incremental() %}\n  where updated_ts > (select max(updated_ts) from {{ this }})\n{% endif %}\n```\n\n###### `merge`\n\nThrough the `merge` incremental strategy, dbt-starburst constructs a [`MERGE` statement](https://trino.io/docs/current/sql/merge.html) which `INSERT`s new and `UPDATE`s existing records based on the unique key (specified by `unique_key`).  
\nIf `unique_key` is not unique, the `delete+insert` strategy can be used instead.\nNote that some connectors in Trino have limited or no support for `MERGE`.\n\n```jinja2\n{{\n    config(\n      materialized = 'incremental',\n      unique_key='user_id',\n      incremental_strategy='merge',\n      )\n}}\nselect * from {{ ref('users') }}\n{% if is_incremental() %}\n  where updated_ts > (select max(updated_ts) from {{ this }})\n{% endif %}\n```\n\n###### Incremental overwrite on hive models\n\nIf the target incremental model is accessed with the\n[hive](https://trino.io/docs/current/connector/hive.html) Trino connector, `insert overwrite`\nfunctionality can be achieved by using the:\n\n```\n<hive-catalog-name>.insert-existing-partitions-behavior=OVERWRITE\n```\n\nsetting in the Trino hive connector configuration.\n\nBelow is a sample hive profile entry that enables the `OVERWRITE` functionality for a hive connector called `minio`:\n\n```yaml\ntrino-incremental-hive:\n  target: dev\n  outputs:\n    dev:\n      type: trino\n      method: none\n      user: admin\n      password:\n      catalog: minio\n      schema: tiny\n      host: localhost\n      port: 8080\n      http_scheme: http\n      session_properties:\n        minio.insert_existing_partitions_behavior: OVERWRITE\n      threads: 1\n```\n\nExisting partitions in the target model that match the staged data will be overwritten.\nThe remaining partitions will simply be appended to the target model.\n\nNote that this functionality works only on incremental models that use partitioning:\n\n```jinja2\n{{\n    config(\n        materialized = 'incremental',\n        properties={\n          \"format\": \"'PARQUET'\",\n          \"partitioned_by\": \"ARRAY['day']\",\n        }\n    )\n}}\n```\n\n##### Materialized view\n\nThe adapter also supports [materialized views](https://trino.io/docs/current/sql/create-materialized-view.html).\nAt every subsequent `dbt run` command, the materialized view is 
[refreshed](https://trino.io/docs/current/sql/refresh-materialized-view.html).\n\nYou can also define custom properties for the materialized view through the `properties` config.\n\nThis materialization supports the [full_refresh](https://docs.getdbt.com/reference/resource-configs/full_refresh) config and flag.\nWhenever you want to rebuild your materialized view, e.g. when changing the underlying SQL query, run `dbt run --full-refresh`.\n\n\nIn model add:\n```jinja2\n{{\n  config(\n    materialized = 'materialized_view',\n    properties = {\n      'format': \"'PARQUET'\"\n    },\n  )\n}}\n```\n\nor in `dbt_project.yml`:\n\n```yaml\nmodels:\n  path:\n    materialized: materialized_view\n    properties:\n      format: \"'PARQUET'\"\n```\n\n\n##### Snapshots\n\nCommonly, analysts need to \"look back in time\" at some previous state of data in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is often not the case. dbt provides a mechanism, snapshots, which records changes to a mutable table over time.\n\nSnapshots implement type-2 Slowly Changing Dimensions over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an orders table where the status field can be overwritten as the order is processed. [See also the dbt docs about snapshots](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots).\n\nAn example is given below.\n\n```jinja2\n{% snapshot orders_snapshot %}\n{{\n    config(\n        target_database='analytics',\n        target_schema='snapshots',\n        unique_key='id',\n        strategy='timestamp',\n        updated_at='updated_at',\n    )\n}}\nselect * from {{ source('jaffle_shop', 'orders') }}\n{% endsnapshot %}\n```\n\nNote that the Snapshot feature depends on the `current_timestamp` macro. 
Some connectors, e.g. Iceberg, do not support the standard precision (`TIMESTAMP(3) WITH TIME ZONE`).\n\nIf necessary, you can override the standard precision by providing your own version of the `trino__current_timestamp()` macro, as in the following example:\n\n```jinja2\n{% macro trino__current_timestamp() %}\n    current_timestamp(6)\n{% endmacro %}\n```\n\n#### Use table properties to configure connector specifics\n\nTrino connectors use table properties to configure connector specifics.\n\nCheck the Trino connector documentation for more information.\n\n```jinja2\n{{\n  config(\n    materialized='table',\n    properties={\n      \"format\": \"'PARQUET'\",\n      \"partitioning\": \"ARRAY['bucket(id, 2)']\",\n    }\n  )\n}}\n```\n\n#### Seeds\n\nSeeds are CSV files in your dbt project (typically in your data directory) that dbt can load into your data warehouse using the `dbt seed` command.\n\nFor dbt-starburst, the batch size is defined in the macro `trino__get_batch_size()` and the default value is `1000`.\nTo override the default value, define a macro like the following within your project:\n\n```jinja2\n{% macro default__get_batch_size() %}\n  {{ return(10000) }}\n{% endmacro %}\n```\n\n#### Persist docs\n\nPersist docs optionally persists resource descriptions as column and relation comments in the database. By default, documentation persistence is disabled, but it can be enabled for specific resources or groups of resources as needed.\n\nDetailed documentation can be found [here](https://docs.getdbt.com/reference/resource-configs/persist_docs).\n\n#### Generating lineage flow in docs\n\nTo generate the lineage flow in the docs, use the `ref` function in place of table names in your queries. It builds dependencies between models and allows dbt to create a DAG of the data flow. 
Refer to examples [here](https://docs.getdbt.com/docs/building-a-dbt-project/building-models#building-dependencies-between-models).\n\n```sh\ndbt docs generate          # generate docs\ndbt docs serve --port 8081 # start a local server (the docs server runs on port 8080 by default, which may conflict with Trino in local development)\n```\n\n#### Using Custom schemas\n\nBy default, all dbt models are built in the schema specified in your target. Sometimes, however, you may want to build some of the models in a custom schema. To do so, use the `schema` configuration key to specify a custom schema for a model. See [here](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas) for the documentation. It is important to note that by default, dbt will generate the schema name for a model by concatenating the custom schema to the target schema, as in: `<target_schema>_<custom_schema>`. \n\n\n#### Prepared statements\n\nThe `dbt seed` feature uses [Trino's prepared statements](https://trino.io/docs/current/sql/prepare.html).\n\nPython's http client has a hardcoded limit of 65536 bytes for a header line.\n\nWhen executing a prepared statement with a large number of parameters, you might encounter the following error:\n\n`requests.exceptions.ConnectionError: ('Connection aborted.', LineTooLong('got more than 65536 bytes when reading header line'))`.\n\nPrepared statements can be disabled by setting `prepared_statements_enabled` to `false` in your dbt profile (reverting to the legacy behavior using Python string interpolation). 
This flag may be removed in later releases.\n\n#### Grants\n\nPlease note that grants are only supported in [Starburst Enterprise](https://docs.starburst.io/latest/security/biac-overview.html), [Starburst Galaxy](https://docs.starburst.io/starburst-galaxy/security/access-control.html), and Hive ([sql-standard](https://trino.io/docs/current/connector/hive-security.html)).\n\nYou can manage access to the datasets you're producing with dbt by using grants. To implement these permissions, define grants as resource configs on each model, seed, or snapshot. Define the default grants that apply to the entire project in your dbt_project.yml, and define model-specific grants within each model's SQL or YAML file.\n\n```yaml\nmodels:\n  - name: specific_model\n    config:\n      grants:\n        select: ['reporter', 'bi']\n```\n\nRead everything about grants in the [dbt docs](https://docs.getdbt.com/reference/resource-configs/grants).\n\n## Contributing\n\n- Want to report a bug or request a feature? Let us know on [Slack](http://community.getdbt.com/) in the [#db-presto-trino](https://getdbt.slack.com/channels/db-presto-trino) channel, or open [an issue](https://github.com/starburstdata/dbt-starburst/issues/new)\n- Want to help us build dbt-starburst? Check out the [Contributing Guide](https://github.com/starburstdata/dbt-starburst/blob/HEAD/CONTRIBUTING.md)\n\n### Release process\n\nBefore doing a release, bump the dbt-starburst version by triggering the release workflow `version-bump.yml`. The major and minor parts of the dbt version are used to associate dbt-starburst's version with the dbt version.\n\nThe next step is to merge the bump PR and make sure that the test suite passes.\n\nFinally, to release `dbt-starburst` to PyPI and GitHub, trigger the release workflow `release.yml`.\n\n### Backport process\n\nSometimes it is necessary to backport changes to older versions. In that case, create a branch from the `x.x.latest` branch. 
There is an `x.x.latest` branch for each minor version, e.g. `1.3.latest`. Make a fix and open a PR back to `x.x.latest`. Create a changelog entry with `changie new` as usual, since a separate changelog for each minor version is kept on every `x.x.latest` branch.\nAfter merging, to make a release of that version, follow the instructions from the **Release process** section, but run every workflow on the `x.x.latest` branch.\n\n## Code of Conduct\n\nEveryone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected\nto follow the [PyPA Code of Conduct](https://www.pypa.io/en/latest/code-of-conduct/).\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "The Starburst adapter plugin for dbt (data build tool)",
    "version": "1.0.0",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d1dba1bfa3cfd2a625ac976ab7685ae3625f82543902669c82e35d8300997e94",
                "md5": "4023b43051a376cff03389e20c18881a",
                "sha256": "d806e1daa128a76cd16f34a3ec3d3eb6a0381ed1d848aad38ffda2b4c431e83a"
            },
            "downloads": -1,
            "filename": "dbt_starburst-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4023b43051a376cff03389e20c18881a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 20921,
            "upload_time": "2023-03-17T00:38:22",
            "upload_time_iso_8601": "2023-03-17T00:38:22.407078Z",
            "url": "https://files.pythonhosted.org/packages/d1/db/a1bfa3cfd2a625ac976ab7685ae3625f82543902669c82e35d8300997e94/dbt_starburst-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d18c3cff6e12a5de1eeb17b1a9d7b65efcb183d2ee36185f1b2d7f8d15acfdf8",
                "md5": "c7c1682a57a28c7593595075b329b9f5",
                "sha256": "d323e0f3039cedf6c6960890859e2bbe0427166a25ce0cac200b15e4d63beb4e"
            },
            "downloads": -1,
            "filename": "dbt-starburst-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c7c1682a57a28c7593595075b329b9f5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 28127,
            "upload_time": "2023-03-17T00:38:24",
            "upload_time_iso_8601": "2023-03-17T00:38:24.451810Z",
            "url": "https://files.pythonhosted.org/packages/d1/8c/3cff6e12a5de1eeb17b1a9d7b65efcb183d2ee36185f1b2d7f8d15acfdf8/dbt-starburst-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-17 00:38:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "starburstdata",
    "github_project": "dbt-starburst",
    "lcname": "dbt-starburst"
}

Starburst Data