dltctl


- Name: dltctl
- Version: 0.2
- Home page: https://github.com/bkvarda/dltctl
- Summary: A command line interface for Databricks Delta Live Tables
- Upload time: 2023-03-23 18:35:36
- Author: Brandon Kvarda
- License: Apache License 2.0
- Keywords: databricks, dlt, delta live tables
- Requirements: No requirements were recorded.

## dltctl
![Build badge](https://img.shields.io/github/workflow/status/bkvarda/dltctl/testing)  
A CLI tool for fast local iteration on, and rapid deployment of, Delta Live Tables pipelines.

#### Installation
```
pip install dltctl
```

#### First-time Configuration
dltctl needs to know which Databricks workspace and which token/auth information to use. If you already use the Databricks CLI, dltctl uses whatever you have configured there. Otherwise, configure it with the same commands you would use with the Databricks CLI:
```
dltctl configure --jobs-api-version=2.1 --token
```

#### Usage
dltctl requires a configuration file. To generate one, run:
```
dltctl init mypipeline
```
That will generate a dltctl.yaml file in your current directory that looks like this:
```
pipeline_settings:
  channel: CURRENT
  clusters:
  - autoscale:
      max_workers: 5
      min_workers: 1
      mode: ENHANCED
    driver_node_type_id: c5.4xlarge
    label: default
    node_type_id: c5.4xlarge
  continuous: false
  development: true
  edition: advanced
  name: mypipeline
  photon: false
```
This is a minimal viable DLT project YAML file. For more advanced settings, edit the file directly or pass flags to `dltctl init`:
```
dltctl init mypipeline -f -c '{"label":"default", "aws_attributes": {"instance_profile_arn":"myprofilearn"}}'
```
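The `-c` value in the command above is a JSON object passed as a single shell argument, so quoting mistakes are easy to make by hand. As a minimal sketch, assuming you want to build that command line from a script, the Python below serializes a cluster mapping and quotes it for the shell. The helper name `build_init_command` is hypothetical and not part of dltctl; only the `init`, `-f`, and `-c` arguments shown above are taken from this README.

```python
import json
import shlex


def build_init_command(pipeline_name, cluster):
    """Return a shell-safe `dltctl init` command line (hypothetical helper)."""
    # Serialize the cluster mapping to the JSON string expected after -c.
    cluster_json = json.dumps(cluster)
    # shlex.quote wraps each argument so the shell passes it through unchanged.
    parts = ["dltctl", "init", shlex.quote(pipeline_name), "-f", "-c", shlex.quote(cluster_json)]
    return " ".join(parts)


cluster = {"label": "default", "aws_attributes": {"instance_profile_arn": "myprofilearn"}}
command = build_init_command("mypipeline", cluster)
print(command)
```

Splitting the printed command with `shlex.split` gives back the original arguments, which is a quick way to check the quoting before running it.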
Now you just need to bring your own DLT pipeline.  
Or, if you just want to get started quickly, you can generate a one-line SQL pipeline:
```
echo "CREATE LIVE TABLE $(whoami | sed 's/\.//g')_dltctl_quickstart AS SELECT 1" > test.sql
```
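The shell one-liner above builds the table name from your local username with any dots removed, presumably because a dot in a table name would be read as a namespace separator. A rough Python equivalent of the string it writes to test.sql, with `quickstart_sql` as a hypothetical helper name and `jane.doe` as a made-up username:

```python
def quickstart_sql(username):
    """Build the quickstart statement for a given username (illustrative helper)."""
    # Mirror `sed 's/\.//g'`: drop every literal dot from the username.
    table_prefix = username.replace(".", "")
    return f"CREATE LIVE TABLE {table_prefix}_dltctl_quickstart AS SELECT 1"


print(quickstart_sql("jane.doe"))
# prints: CREATE LIVE TABLE janedoe_dltctl_quickstart AS SELECT 1
```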

Now you have the basics for a DLT pipeline deployment. You can deploy with dltctl like this:

```
dltctl deploy mypipeline
```
By default, dltctl applies a set of sensible defaults to make getting started easy:
- It will search your current working directory recursively for .py and .sql files and add them as libraries to your DLT pipeline. To override this behavior, use the --pipeline-files-dir argument and specify a different directory, or use the
- It will use the same default pipeline configurations as the DLT UI
- It will upload your pipeline files to the Databricks workspace and convert them to notebooks using the Import API. By default it determines your username automatically and stores the notebooks in your user directory. You can override this behavior by specifying a workspace target with the --workspace-path flag or with config file settings.
- It will then create and start the pipeline based on the settings in dltctl.yaml
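To preview which files the first default would pick up, a recursive search for .py and .sql files can be sketched with the standard library. This is an illustration of the behavior described above, not dltctl's own implementation; `find_pipeline_files` and `PIPELINE_SUFFIXES` are hypothetical names.

```python
from pathlib import Path

# File extensions the README says are added as pipeline libraries.
PIPELINE_SUFFIXES = {".py", ".sql"}


def find_pipeline_files(root="."):
    """Recursively collect .py and .sql files under root, in sorted order."""
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix in PIPELINE_SUFFIXES
    )


# List candidate pipeline files under the current working directory.
for path in find_pipeline_files("."):
    print(path)
```

Running this from your project directory before a deploy is a cheap way to spot stray scripts that would otherwise be uploaded along with the pipeline code.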

Say you change your code and want to restart your pipeline with the new version, or with different pipeline settings. Update the pipeline settings in dltctl.yaml, save your code changes, and run:
```
dltctl deploy
``` 

Alternatively, you can create a pipeline without starting it:
```
dltctl create
```
You can stage new files and settings to the workspace as often as you make changes:
```
dltctl stage
```
You can start and stop the pipeline manually:
```
dltctl start
dltctl stop
```
Or, as before, you can use `dltctl deploy` to combine all of these steps:
```
dltctl deploy
```
You can trigger a full refresh using the -r flag:
```
dltctl start -r
dltctl deploy -r
```
If you don't want to watch the events, you can start or deploy the pipeline as a job instead. To do that, your config file needs at least a job_config section with a job name. A minimal viable dltctl.yaml with a job config looks like this:
```
job_config:
  name: mydltctljob
pipeline_settings:
  channel: CURRENT
  clusters:
  - autoscale:
      max_workers: 5
      min_workers: 1
      mode: ENHANCED
    driver_node_type_id: c5.4xlarge
    label: default
    node_type_id: c5.4xlarge
  continuous: false
  development: true
  edition: advanced
  photon: false
  name: mypipeline
```
Then you can run as a job instead:
```
dltctl deploy --as-job
```
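Before running with `--as-job`, it can help to confirm that the config really carries a job name. The sketch below works on a plain dictionary, assuming dltctl.yaml has already been parsed by a YAML library of your choice; `job_name` is a hypothetical helper, not a dltctl function.

```python
def job_name(config):
    """Return job_config.name from a parsed dltctl.yaml mapping, or None if absent."""
    # Treat a missing or empty job_config section the same way.
    job_config = config.get("job_config") or {}
    return job_config.get("name")


config = {
    "job_config": {"name": "mydltctljob"},
    "pipeline_settings": {"name": "mypipeline", "continuous": False},
}

if job_name(config) is None:
    print("job_config.name is missing; add it before using --as-job")
else:
    print(f"job name: {job_name(config)}")
```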
Note that `dltctl deploy` and `dltctl stage` won't push changes or restart the pipeline if nothing has changed. This means that adding a job config without changing anything else won't immediately create a job. You can force an update with the `--force` flag:
```
dltctl deploy --as-job --force
```
Alternatively, since there are no other changes, you can just start the pipeline as a job:
```
dltctl start --as-job
```
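This README does not describe how dltctl decides that nothing has changed. One common approach, shown here purely as an assumption-laden sketch, is to fingerprint the settings and file contents and redeploy only when the fingerprint differs or a force flag is set. `fingerprint` and `needs_deploy` are hypothetical names.

```python
import hashlib
import json


def fingerprint(settings, file_contents):
    """Hash settings plus file contents into one hex digest (illustrative only)."""
    digest = hashlib.sha256()
    # sort_keys makes the digest independent of dictionary ordering.
    digest.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    for name in sorted(file_contents):
        digest.update(name.encode("utf-8"))
        digest.update(file_contents[name].encode("utf-8"))
    return digest.hexdigest()


def needs_deploy(previous, current, force=False):
    """Deploy when forced, or when the fingerprint changed since the last run."""
    return force or previous != current


settings = {"name": "mypipeline", "continuous": False}
before = fingerprint(settings, {"test.sql": "SELECT 1"})
after = fingerprint(settings, {"test.sql": "SELECT 2"})
print(needs_deploy(before, after))
```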
Here is an example of a more advanced dltctl.yaml:
```
pipeline_files_local_dir: .
pipeline_files_workspace_dir: /Users/foo@foo.com/dltctl_artifacts/nested_dir
job_config:
  name: foobk1234
  email_notifications:
    #on_start: [foo@foo.com]
    on_failure: [bar@bar.com]
  schedule:
    quartz_cron_expression: "0 0 12 * * ?"
    timezone_id: "America/Los_Angeles"
    pause_status: "UNPAUSED"
  tags:
    foo: bar
    bar: baz
pipeline_settings:
  channel: CURRENT
  clusters:
  - autoscale:
      max_workers: 4
      min_workers: 1
      mode: "ENHANCED"
    driver_node_type_id: c5.4xlarge
    label: default
    node_type_id: c5.4xlarge
    init_scripts:
    - dbfs:
        destination: dbfs:/bkvarda/init_scripts/datadog-install-driver-workers.sh
    spark_env_vars:
      DD_API_KEY: "{{secrets/bkvarda_dlt/dd_api_key}}"
      DD_ENV: dlt_test_pipeline
      DD_SITE: https://app.datadoghq.com
  continuous: true
  development: true
  edition: advanced
  name: foobk1234
  photon: false
  configuration:
    destination_table: "b"
    starting_offsets: "earliest"
```
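The `quartz_cron_expression` in the schedule above uses Quartz syntax, which starts with a seconds field, unlike the five-field Unix cron format. A small sketch that labels each field of the expression; `label_quartz_fields` is a hypothetical helper and performs no validation beyond counting fields.

```python
# Quartz field order; the seventh field (year) is optional.
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year"]


def label_quartz_fields(expression):
    """Map each whitespace-separated Quartz cron field to its name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so 'year' is omitted for 6-field input.
    return dict(zip(QUARTZ_FIELDS, parts))


print(label_quartz_fields("0 0 12 * * ?"))
```

Read this way, `0 0 12 * * ?` fires at 12:00:00 every day, interpreted in the configured `timezone_id`.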








            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/bkvarda/dltctl",
    "name": "dltctl",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "databricks dlt delta live tables",
    "author": "Brandon Kvarda",
    "author_email": "brandon@databricks.com",
    "download_url": "https://files.pythonhosted.org/packages/1d/b0/fa693a25f70a3393ae0c4efa96a06c6ba6c45aa4e5d39433de4f118a78d4/dltctl-0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "A command line interface for Databricks Delta Live Tables",
    "version": "0.2",
    "split_keywords": [
        "databricks",
        "dlt",
        "delta",
        "live",
        "tables"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9ec33ff332e08482b6fb9549da45e1cd0ada72650f5ccb41e6fc79d9781951c2",
                "md5": "9b82d80edeaea55f60ae9ee0f7b11a22",
                "sha256": "1be9a16ed336651fda8ab53e976eceba6dee5edba7723be170b2944df1608bd5"
            },
            "downloads": -1,
            "filename": "dltctl-0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9b82d80edeaea55f60ae9ee0f7b11a22",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 25975,
            "upload_time": "2023-03-23T18:35:34",
            "upload_time_iso_8601": "2023-03-23T18:35:34.725409Z",
            "url": "https://files.pythonhosted.org/packages/9e/c3/3ff332e08482b6fb9549da45e1cd0ada72650f5ccb41e6fc79d9781951c2/dltctl-0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1db0fa693a25f70a3393ae0c4efa96a06c6ba6c45aa4e5d39433de4f118a78d4",
                "md5": "f60be73592a55977ff4d50eca4108cf7",
                "sha256": "8647f85316bdbb8e95ae6f0e94231ace01bf10d7005a0f51d4e3ff821c0e52a4"
            },
            "downloads": -1,
            "filename": "dltctl-0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "f60be73592a55977ff4d50eca4108cf7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 23932,
            "upload_time": "2023-03-23T18:35:36",
            "upload_time_iso_8601": "2023-03-23T18:35:36.204644Z",
            "url": "https://files.pythonhosted.org/packages/1d/b0/fa693a25f70a3393ae0c4efa96a06c6ba6c45aa4e5d39433de4f118a78d4/dltctl-0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-23 18:35:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "bkvarda",
    "github_project": "dltctl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "dltctl"
}
        