# Databricks CI/CD
[![PyPI Latest Release](https://img.shields.io/pypi/v/tactivos-databricks-cicd.svg)](https://pypi.org/project/tactivos-databricks-cicd/)
### Forked from Manol Manolov's original databricks-cicd repo to use with tactivos/data-databricks repo
This is a tool for building CI/CD pipelines for Databricks. It is a Python package that
works in conjunction with a Git repository (or a plain directory structure) to validate
and deploy content to Databricks. Currently, it can handle the following content:
* **Workspace** - a collection of notebooks written in Scala, Python, R or SQL
* **Jobs** - list of Databricks jobs
* **Clusters**
* **Instance Pools**
* **DBFS** - an arbitrary collection of files that may be deployed on a Databricks workspace
# Installation
`pip install tactivos-databricks-cicd`
# Requirements
To use this tool, you need a source directory (preferably a private Git repository)
with the following structure:
```
any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawbery_jam.jar
    ├── subdir
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg
```
**_Note:_** All folder names represent the default and can be configured. This is just a sample.
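If you are starting from an empty repository, the default layout can be created up front. This is just a convenience sketch using the default folder names shown above:
```shell
# Create the default source layout (folder names are configurable, see below)
mkdir -p workspace jobs clusters instance_pools dbfs
```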
# Usage
For the latest options and commands, run:
```
cicd -h
```
A sample command could be:
```shell
cicd deploy \
    -w sample_12432.7.azuredatabricks.net \
    -u john.smith@domain.com \
    -t dapi_sample_token_0d5-2 \
    -lp '~/git/my-private-repo' \
    -tp /blabla \
    -c DEV.ini \
    --verbose
```
**_Note:_** Paths on Windows need to be enclosed in double quotes.
The default configuration is defined in [default.ini](databricks_cicd/conf/default.ini) and can be overridden with a
custom ini file via the `-c` option, usually with one config file per target environment ([sample](config_sample.ini)).
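Since `-c` takes one ini file per target environment, a typical setup keeps several such files side by side and only swaps the config and credentials per run. The values below are placeholders, not real endpoints or tokens:
```shell
# Hypothetical example: deploy the same local repo to a PROD workspace
cicd deploy \
    -w sample_prod.azuredatabricks.net \
    -u deploy.bot@domain.com \
    -t dapi_sample_prod_token \
    -lp '~/git/my-private-repo' \
    -tp /blabla \
    -c PROD.ini
```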
# Create content
#### Notebooks:
1. Add a notebook to source
   1. In the Databricks UI, go to your notebook (a CLI alternative follows this list).
   1. Click `File -> Export -> Source file`.
   1. Add that file to the `workspace` folder of this repo **without changing the file name**.
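If you prefer the command line over the UI export, the legacy Databricks CLI can produce the same source file; the workspace path below is an assumption for illustration:
```shell
# Export a notebook in SOURCE format and drop it straight into the repo
databricks workspace export --format SOURCE "/Users/john.smith@domain.com/Notebook 2" "workspace/Notebook 2.sql"
```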
#### Jobs:
1. Add a job to source
   1. Get the source of the job and write it to a file. You need to have the
      [Databricks CLI](https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html#install-the-cli)
      and [jq](https://stedolan.github.io/jq/download/) installed.
      On Windows, it is easiest to rename `jq-win64.exe` to `jq.exe` and place it
      in the `c:\Windows\System32` folder. Then, on Windows/Linux/macOS:
      ```
      databricks jobs get --job-id 74 | jq .settings > Job_Name.json
      ```
      This downloads the source JSON of the job from the Databricks server, keeps only the settings,
      and writes them to a file.
      **_Note:_** The file name should match the job name inside the JSON file (see the check after this list). Please avoid spaces
      in names.
   1. Add that file to the `jobs` folder
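Since the deploy matches on the job name inside the file, a quick sanity check with jq (already installed for the step above) can catch a mismatch before committing:
```shell
# Should print exactly the job name you used for the file (minus .json)
jq -r .name Job_Name.json
```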
#### Clusters:
1. Add a cluster to source
   1. Get the source of the cluster and write it to a file.
      ```
      databricks clusters get --cluster-name orion > orion.json
      ```
      **_Note:_** The file name should match the cluster name inside the JSON file (see the check after this list). Please avoid spaces
      in names.
   1. Add that file to the `clusters` folder
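As with jobs, the file name has to match the name inside the JSON; a jq one-liner verifies that:
```shell
# Should print: orion
jq -r .cluster_name orion.json
```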
#### Instance pools:
1. Add an instance pool to source
   1. Similar to clusters, just use `instance-pools` instead of `clusters` (a sketch follows this list)
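A sketch of the analogous commands, assuming the legacy Databricks CLI; the pool id below is a placeholder:
```shell
# Find the pool id, then dump its definition into the instance_pools folder
databricks instance-pools list
databricks instance-pools get --instance-pool-id 0101-120000-pool-placeholder > instance_pools/Pool_1.json
```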
#### DBFS:
1. Add a file to dbfs
   1. Just add the file to the `dbfs` folder (an example of pulling an existing file out of DBFS follows below).
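If the file currently lives only in DBFS, the legacy Databricks CLI can copy it down into the source tree; the DBFS path below is a placeholder:
```shell
# Copy an existing file out of DBFS into the repo's dbfs folder
databricks fs cp dbfs:/FileStore/jars/strawbery_jam.jar dbfs/strawbery_jam.jar
```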