databricks-cicd

Name: databricks-cicd
Version: 0.1.14
Home page: https://github.com/man40/databricks-cicd
Summary: CICD tool for testing and deploying to Databricks
Upload time: 2022-12-21 22:02:30
Author: Manol Manolov
License: Apache License 2.0
Keywords: databricks, cicd
Requirements: requests (>=2.28.1), click (>=6.7)
# Databricks CI/CD
[![PyPI Latest Release](https://img.shields.io/pypi/v/databricks-cicd.svg)](https://pypi.org/project/databricks-cicd/)

This is a tool for building CI/CD pipelines for Databricks. It is a Python package that
works in conjunction with a custom Git repository (or a simple file structure) to validate 
and deploy content to Databricks. Currently, it can handle the following content:
* **Workspace** - a collection of notebooks written in Scala, Python, R or SQL
* **Jobs** - list of Databricks jobs
* **Clusters**
* **Instance Pools**
* **DBFS** - an arbitrary collection of files that may be deployed on a Databricks workspace

# Installation
`pip install databricks-cicd`

# Requirements
To use this tool, you need a source directory (preferably a private Git repository) 
with the following structure:
```
any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawberry_jam.jar
    ├── subdir
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg
```

**_Note:_** All folder names shown are defaults and can be configured. This is just a sample.

# Usage
For the latest options and commands run:
```
cicd -h
```
A sample command could be:
```shell
cicd deploy \
   -w sample_12432.7.azuredatabricks.net \
   -u john.smith@domain.com \
   -t dapi_sample_token_0d5-2 \
   -lp '~/git/my-private-repo' \
   -tp /blabla \
   -c DEV.ini \
   --verbose
```
**_Note:_** On Windows, paths need to be in double quotes.
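
In a CI pipeline, the same command is typically scripted against secrets provided by the CI system. Below is a minimal sketch of such a deploy step; the environment variable names and the target path are placeholders, not something the tool defines:
```shell
# Sketch of a CI deploy step (placeholder variable names; adjust to your CI system).
pip install databricks-cicd

cicd deploy \
   -w "$DATABRICKS_HOST" \
   -u "$DATABRICKS_USER" \
   -t "$DATABRICKS_TOKEN" \
   -lp "$GITHUB_WORKSPACE" \
   -tp /deployments/my-project \
   -c DEV.ini \
   --verbose
```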

The default configuration is defined in [default.ini](databricks_cicd/conf/default.ini) and can be overridden with a
custom `.ini` file using the `-c` option, usually one config file per target environment ([sample](config_sample.ini)).
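
Since the defaults cover most settings, a custom file usually stays small. Below is a hypothetical `DEV.ini` sketch, purely to illustrate the override mechanism; the section and key names are made up, so copy the real ones from [default.ini](databricks_cicd/conf/default.ini) or the [sample](config_sample.ini):
```ini
; Hypothetical DEV.ini -- section and key names here are illustrative only;
; use the real ones from default.ini / config_sample.ini.
[workspace]
target_path = /DEV

[dbfs]
target_path = /mnt/dev-artifacts
```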

# Create content

#### Notebooks:
1. Add a notebook to source
   1. In the Databricks UI, go to your notebook. 
   1. Click on `File -> Export -> Source file`. 
   1. Add that file to the `workspace` folder of this repo **without changing the file name**.
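
If you prefer the terminal, the legacy Databricks CLI can export the source file as well. A sketch, assuming a notebook at `/Users/john.smith@domain.com/Notebook 1` (the path is made up; `-f SOURCE` selects the source-file format in the legacy CLI):
```
databricks workspace export -f SOURCE \
    '/Users/john.smith@domain.com/Notebook 1' 'workspace/Notebook 1.py'
```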

#### Jobs:
1. Add a job to source
   1. Get the source of the job and write it to a file. You need to have the
      [Databricks CLI](https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html#install-the-cli) 
      and [JQ](https://stedolan.github.io/jq/download/) installed. 
      For Windows, it is easier to rename `jq-win64.exe` to `jq.exe` and place it 
      in the `C:\Windows\System32` folder. Then, on Windows/Linux/macOS: 
      ```
      databricks jobs get --job-id 74 | jq .settings > Job_Name.json
      ```
      This downloads the job's JSON from the Databricks server, extracts only the settings, 
      and writes them to a file.
      
      **_Note:_** The file name should match the job name inside the JSON file. Please avoid spaces 
      in names. A trimmed sketch of such a file follows this list.
   1. Add that file to the `jobs` folder
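
For reference, the file produced this way is plain Jobs API JSON. An illustrative sketch of `Job_Name.json` (the values below are made up; note that `name` matches the file name, per the note above):
```
{
  "name": "Job_Name",
  "new_cluster": {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2
  },
  "notebook_task": {
    "notebook_path": "/blabla/Notebook 2"
  }
}
```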
   
#### Clusters:
1. Add a cluster to source
   1. Get the source of the cluster and write it to a file. 
      ```
      databricks clusters get --cluster-name orion > orion.json
      ```
      **_Note:_** The file name should match the cluster name inside the JSON file. Please avoid spaces 
      in names. A sketch of such a file follows this list.
   1. Add that file to the `clusters` folder
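
As with jobs, the file holds plain Clusters API JSON. An illustrative sketch of `orion.json` (the values below are made up; `cluster_name` matches the file name, per the note above):
```
{
  "cluster_name": "orion",
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 1,
    "max_workers": 4
  }
}
```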
   
#### Instance pools:
1. Add an instance pool to source
   1. Similar to clusters; just use `instance-pools` instead of `clusters`, as sketched below.
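
A sketch of the analogous export; note that, unlike clusters, the legacy CLI's `instance-pools get` looks pools up by `--instance-pool-id` rather than by name (the id below is made up):
```
databricks instance-pools get --instance-pool-id 0101-120000-pool1 > 'Pool 1.json'
```
Add the resulting file to the `instance_pools` folder.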
   
#### DBFS:
1. Add a file to dbfs
   1. Just add a file to the `dbfs` folder.
   
# TODO
* Improve validation. It is still in its infancy.

            
