# Databricks CI/CD
[![PyPI Latest Release](https://img.shields.io/pypi/v/tactivos-databricks-cicd.svg)](https://pypi.org/project/tactivos-databricks-cicd/)
### Forked from Manol Manolov's original databricks-cicd repo to use with tactivos/data-databricks repo
This is a tool for building CI/CD pipelines for Databricks. It is a Python package that
works in conjunction with a Git repository (or a plain directory structure) to validate
and deploy content to Databricks. Currently, it can handle the following content:
* **Workspace** - a collection of notebooks written in Scala, Python, R or SQL
* **Jobs** - list of Databricks jobs
* **Clusters**
* **Instance Pools**
* **DBFS** - an arbitrary collection of files that may be deployed on a Databricks workspace
# Installation
`pip install tactivos-databricks-cicd`
# Requirements
To use this tool, you need a source directory (preferably a private Git repository)
with the following structure:
```
any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawbery_jam.jar
    ├── subdir
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg
```
**_Note:_** All folder names represent the default and can be configured. This is just a sample.
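If you are starting from an empty repository, the default layout can be created up front. This is just a convenience sketch using the default folder names shown above:
```shell
# Create the default source layout (folder names are configurable, see below)
mkdir -p workspace jobs clusters instance_pools dbfs
```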
# Usage
For the latest options and commands, run:
```
cicd -h
```
A sample command could be:
```shell
cicd deploy \
    -w sample_12432.7.azuredatabricks.net \
    -u john.smith@domain.com \
    -t dapi_sample_token_0d5-2 \
    -lp '~/git/my-private-repo' \
    -tp /blabla \
    -c DEV.ini \
    --verbose
```
**_Note:_** Paths on Windows need to be enclosed in double quotes.
The default configuration is defined in [default.ini](databricks_cicd/conf/default.ini) and can be overridden with a
custom ini file via the `-c` option, usually with one config file per target environment ([sample](config_sample.ini)).
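Since `-c` takes one ini file per target environment, a typical setup keeps several such files side by side and only swaps the config and credentials per run. The values below are placeholders, not real endpoints or tokens:
```shell
# Hypothetical example: deploy the same local repo to a PROD workspace
cicd deploy \
    -w sample_prod.azuredatabricks.net \
    -u deploy.bot@domain.com \
    -t dapi_sample_prod_token \
    -lp '~/git/my-private-repo' \
    -tp /blabla \
    -c PROD.ini
```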
# Create content
#### Notebooks:
1. Add a notebook to source
   1. In the Databricks UI, go to your notebook (a CLI alternative follows this list).
   1. Click `File -> Export -> Source file`.
   1. Add that file to the `workspace` folder of this repo **without changing the file name**.
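If you prefer the command line over the UI export, the legacy Databricks CLI can produce the same source file; the workspace path below is an assumption for illustration:
```shell
# Export a notebook in SOURCE format and drop it straight into the repo
databricks workspace export --format SOURCE "/Users/john.smith@domain.com/Notebook 2" "workspace/Notebook 2.sql"
```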
#### Jobs:
1. Add a job to source
   1. Get the source of the job and write it to a file. You need to have the
      [Databricks CLI](https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html#install-the-cli)
      and [jq](https://stedolan.github.io/jq/download/) installed.
      On Windows, it is easiest to rename `jq-win64.exe` to `jq.exe` and place it
      in the `c:\Windows\System32` folder. Then, on Windows/Linux/macOS:
      ```
      databricks jobs get --job-id 74 | jq .settings > Job_Name.json
      ```
      This downloads the source JSON of the job from the Databricks server, keeps only the settings,
      and writes them to a file.
      **_Note:_** The file name should match the job name inside the JSON file (see the check after this list). Please avoid spaces
      in names.
   1. Add that file to the `jobs` folder
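Since the deploy matches on the job name inside the file, a quick sanity check with jq (already installed for the step above) can catch a mismatch before committing:
```shell
# Should print exactly the job name you used for the file (minus .json)
jq -r .name Job_Name.json
```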
#### Clusters:
1. Add a cluster to source
   1. Get the source of the cluster and write it to a file.
      ```
      databricks clusters get --cluster-name orion > orion.json
      ```
      **_Note:_** The file name should match the cluster name inside the JSON file (see the check after this list). Please avoid spaces
      in names.
   1. Add that file to the `clusters` folder
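As with jobs, the file name has to match the name inside the JSON; a jq one-liner verifies that:
```shell
# Should print: orion
jq -r .cluster_name orion.json
```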
#### Instance pools:
1. Add an instance pool to source
   1. Similar to clusters, just use `instance-pools` instead of `clusters` (a sketch follows this list)
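A sketch of the analogous commands, assuming the legacy Databricks CLI; the pool id below is a placeholder:
```shell
# Find the pool id, then dump its definition into the instance_pools folder
databricks instance-pools list
databricks instance-pools get --instance-pool-id 0101-120000-pool-placeholder > instance_pools/Pool_1.json
```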
#### DBFS:
1. Add a file to dbfs
   1. Just add the file to the `dbfs` folder (an example of pulling an existing file out of DBFS follows below).
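If the file currently lives only in DBFS, the legacy Databricks CLI can copy it down into the source tree; the DBFS path below is a placeholder:
```shell
# Copy an existing file out of DBFS into the repo's dbfs folder
databricks fs cp dbfs:/FileStore/jars/strawbery_jam.jar dbfs/strawbery_jam.jar
```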