# Databricks CI/CD
[![PyPI Latest Release](https://img.shields.io/pypi/v/databricks-cicd.svg)](https://pypi.org/project/databricks-cicd/)
This is a tool for building CI/CD pipelines for Databricks. It is a Python package that
works in conjunction with a custom Git repository (or a plain directory structure) to validate
and deploy content to Databricks. Currently, it can handle the following content:
* **Workspace** - a collection of notebooks written in Scala, Python, R or SQL
* **Jobs** - list of Databricks jobs
* **Clusters**
* **Instance Pools**
* **DBFS** - an arbitrary collection of files that may be deployed on a Databricks workspace
# Installation
`pip install databricks-cicd`
# Requirements
To use this tool, you need a source directory (preferably a private Git repository)
with the following layout:
```
any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir/
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawberry_jam.jar
    ├── subdir/
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg
```
**_Note:_** All folder names above are defaults and can be configured. This is just a sample layout.
# Usage
For the full list of commands and options, run:
```
cicd -h
```
A sample command could be:
```shell
cicd deploy \
    -w sample_12432.7.azuredatabricks.net \
    -u john.smith@domain.com \
    -t dapi_sample_token_0d5-2 \
    -lp '~/git/my-private-repo' \
    -tp /blabla \
    -c DEV.ini \
    --verbose
```
**_Note:_** Paths on Windows need to be enclosed in double quotes.
The default configuration is defined in [default.ini](databricks_cicd/conf/default.ini) and can be overridden with a
custom ini file via the `-c` option, usually one config file per target environment ([sample](config_sample.ini)).
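In a CI pipeline, a common pattern is one ini file per target environment (e.g. `DEV.ini`, `PROD.ini`) with the access token injected from a secret at run time. A minimal sketch reusing the flags documented above; `DATABRICKS_TOKEN` is a placeholder for whatever secret variable your CI system exposes:
```shell
# Hypothetical CI step: deploy the checked-out repo to the DEV workspace.
# DATABRICKS_TOKEN is a placeholder secret injected by the CI system.
cicd deploy \
    -w sample_12432.7.azuredatabricks.net \
    -u ci.bot@domain.com \
    -t "$DATABRICKS_TOKEN" \
    -lp . \
    -tp /projects/my-project \
    -c DEV.ini
```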
# Create content
#### Notebooks:
1. Add a notebook to source
   1. On the Databricks UI, go to your notebook.
   1. Click `File -> Export -> Source file`.
   1. Add that file to the `workspace` folder of this repo **without changing the file name**
      (the export can also be scripted; see the sketch after this list).
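If you prefer to script the export instead of using the UI, the Databricks CLI can produce the same source file. A sketch, assuming the CLI is installed and configured; the notebook path is a placeholder:
```shell
# Export a notebook in SOURCE format; keep the exported file name
# identical to the notebook name.
databricks workspace export --format SOURCE \
    "/Users/john.smith@domain.com/Notebook 1" "workspace/Notebook 1.py"
```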
#### Jobs:
1. Add a job to source
   1. Get the source of the job and write it to a file. You need to have the
      [Databricks CLI](https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html#install-the-cli)
      and [jq](https://stedolan.github.io/jq/download/) installed.
      On Windows, it is easiest to rename `jq-win64.exe` to `jq.exe` and place it
      in the `c:\Windows\System32` folder. Then, on Windows/Linux/macOS:
      ```
      databricks jobs get --job-id 74 | jq .settings > Job_Name.json
      ```
      This downloads the job's source JSON from the Databricks server, extracts only the settings,
      and writes them to a file. The expected shape of the result is sketched after this list.

      **_Note:_** The file name should match the job name within the JSON file. Please avoid spaces
      in names.
   1. Add that file to the `jobs` folder
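For orientation, the settings file produced above typically looks like the following. This is only an illustration assuming the Jobs API 2.0 schema; all values are placeholders:
```json
{
  "name": "Job_Name",
  "new_cluster": {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2
  },
  "notebook_task": {
    "notebook_path": "/some_notebooks_subdir/Notebook 1"
  },
  "max_retries": 1,
  "timeout_seconds": 3600
}
```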
#### Clusters:
1. Add a cluster to source
   1. Get the source of the cluster and write it to a file (a typical definition is sketched after this list):
      ```
      databricks clusters get --cluster-name orion > orion.json
      ```
      **_Note:_** The file name should match the cluster name within the JSON file. Please avoid spaces
      in names.
   1. Add that file to the `clusters` folder
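A cluster file typically carries fields like these (an illustrative sketch based on the Clusters API 2.0 schema; values are placeholders):
```json
{
  "cluster_name": "orion",
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 1,
    "max_workers": 4
  },
  "autotermination_minutes": 60
}
```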
#### Instance pools:
1. Add an instance pool to source
   1. Similar to clusters; just use `instance-pools` instead of `clusters` (see the sketch after this list).
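A sketch of the equivalent command; note that, at least in recent versions of the legacy CLI, instance pools are fetched by pool id rather than by name (the id below is a placeholder):
```shell
# Fetch an instance pool definition and save it under the pool's name.
databricks instance-pools get --instance-pool-id 0101-120000-pool00 > "Pool 1.json"
```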
#### DBFS:
1. Add a file to dbfs
   1. Just add the file to the `dbfs` folder.
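After a deploy, the Databricks CLI can confirm what landed on DBFS. The target path below is a placeholder; the actual path depends on your configuration:
```shell
# List the deployed files under the configured DBFS target path.
databricks fs ls dbfs:/my-project
```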
# TODO
* Improve validation. It is still rudimentary.