<h1 align="center">
Astro Databricks
</h1>
<h3 align="center">
Affordable Databricks Workflows in Apache Airflow<br><br>
</h3>
[![Python versions](https://img.shields.io/pypi/pyversions/astro-providers-databricks.svg)](https://pypi.org/pypi/astro-providers-databricks)
[![License](https://img.shields.io/pypi/l/astro-providers-databricks.svg)](https://pypi.org/pypi/astro-providers-databricks)
[![Development Status](https://img.shields.io/pypi/status/astro-providers-databricks.svg)](https://pypi.org/pypi/astro-providers-databricks)
[![PyPI downloads](https://img.shields.io/pypi/dm/astro-providers-databricks.svg)](https://pypistats.org/packages/astro-providers-databricks)
[![Contributors](https://img.shields.io/github/contributors/astronomer/astro-providers-databricks)](https://github.com/astronomer/astro-providers-databricks)
[![Commit activity](https://img.shields.io/github/commit-activity/m/astronomer/astro-providers-databricks)](https://github.com/astronomer/astro-providers-databricks)
[![CI](https://github.com/astronomer/astro-providers-databricks/actions/workflows/ci.yml/badge.svg)](https://github.com/astronomer/astro-providers-databricks)
[![codecov](https://codecov.io/gh/astronomer/astro-providers-databricks/branch/main/graph/badge.svg?token=MI4SSE50Q6)](https://codecov.io/gh/astronomer/astro-providers-databricks)
**Astro Databricks** is an [Apache Airflow](https://github.com/apache/airflow) provider created by [Astronomer](https://www.astronomer.io/) for an **optimal Databricks experience**. With the `DatabricksWorkflowTaskGroup`, Astro Databricks allows you to run Databricks Workflows without
the need to run Jobs individually, which can result in a [75% cost reduction](https://www.databricks.com/product/aws-pricing).
## Prerequisites
* Apache Airflow >= 2.2.4
* Python >= 3.7
* Databricks account
* Previously created Databricks Notebooks
## Install
```shell
pip install astro-providers-databricks
```
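To confirm the provider is importable in the environment Airflow will use, you can run a quick smoke test. This is a sketch; it assumes the distribution exposes the `astro_databricks` module, which may differ between releases:
```python
# Smoke test: verify the provider installed correctly.
# Assumes the package's import name is `astro_databricks`;
# adjust the import if your release uses a different name.
from astro_databricks import DatabricksNotebookOperator, DatabricksWorkflowTaskGroup

print(DatabricksNotebookOperator.__name__, DatabricksWorkflowTaskGroup.__name__)
```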
## Quickstart
1. Use two pre-existing [Databricks Notebooks](https://docs.databricks.com/notebooks/) or create two simple ones. Their identifiers will be used in step (5). The original example DAG uses:
* `Shared/Notebook_1`
* `Shared/Notebook_2`
2. Generate a [Databricks Personal Token](https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens). This will be used in step (6).
3. Ensure that your Airflow environment is set up correctly by running the following commands:
```shell
export AIRFLOW_HOME=`pwd`
airflow db init
```
4. [Create a Databricks Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html) using your preferred method, so Airflow can access Databricks with your credentials. For example, run the following command, replacing the login with your email and the password with your access token:
```shell
airflow connections add 'databricks_conn' \
--conn-json '{
"conn_type": "databricks",
"login": "some.email@yourcompany.com",
"host": "https://dbc-c9390870-65ef.cloud.databricks.com/",
"password": "personal-access-token"
}'
```
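If you prefer to manage connections in code rather than via the CLI, the same connection can be created through Airflow's ORM. This is a sketch, not part of this provider; it assumes a standard Airflow 2.x installation whose metadata database has already been initialised:
```python
# Sketch: create the Databricks connection via Airflow's ORM instead of the CLI.
# Assumes `airflow db init` has already been run.
from airflow.models import Connection
from airflow.utils.session import create_session

conn = Connection(
    conn_id="databricks_conn",
    conn_type="databricks",
    login="some.email@yourcompany.com",                      # your Databricks login email
    host="https://dbc-c9390870-65ef.cloud.databricks.com/",  # your workspace URL
    password="personal-access-token",                        # your personal access token
)

with create_session() as session:
    # Only add the connection if it does not already exist.
    if not session.query(Connection).filter_by(conn_id=conn.conn_id).first():
        session.add(conn)
```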
5. Copy the following workflow into a file named `example_databricks_workflow.py` and add it to the `dags` directory of your Airflow project:
https://github.com/astronomer/astro-providers-databricks/blob/45897543a5e34d446c84b3fbc4f6f7a3ed16cdf7/example_dags/example_databricks_workflow.py#L48-L101
Alternatively, you can download `example_databricks_workflow.py`:
```shell
curl -O https://raw.githubusercontent.com/astronomer/astro-providers-databricks/main/example_dags/example_databricks_workflow.py
```
6. Run the example DAG:
```sh
airflow dags test example_databricks_workflow `date -Iseconds`
```
This will create a Databricks Workflow with two Notebook jobs.
## Available features
* `DatabricksWorkflowTaskGroup`: Airflow [task group](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#taskgroups) that allows users to create a [Databricks Workflow](https://www.databricks.com/product/workflows).
* `DatabricksNotebookOperator`: Airflow [operator](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/operators.html) which abstracts a pre-existing [Databricks Notebook](https://docs.databricks.com/notebooks/). Can be used independently to run the Notebook, or within a Databricks Workflow Task Group.
* `AstroDatabricksPlugin`: An Airflow [plugin](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html) which is installed by default. It allows users to view a Databricks job from the Airflow UI and to retry it in case of failure.
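A minimal sketch of how the two main pieces compose, condensed from the example DAG linked in the Quickstart. The cluster spec values are illustrative; adapt the Spark version, node type, and notebook paths to your workspace:
```python
# Sketch: two Databricks Notebooks run as a single Databricks Workflow.
from datetime import datetime

from airflow.models.dag import DAG
from astro_databricks import DatabricksNotebookOperator, DatabricksWorkflowTaskGroup

# Illustrative job cluster spec; tune for your cloud and workspace.
job_clusters = [
    {
        "job_cluster_key": "astro_databricks",
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 1,
        },
    }
]

with DAG(
    dag_id="example_databricks_workflow",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    # The task group becomes one Databricks Workflow with two notebook tasks.
    with DatabricksWorkflowTaskGroup(
        group_id="workflow",
        databricks_conn_id="databricks_conn",
        job_clusters=job_clusters,
    ):
        notebook_1 = DatabricksNotebookOperator(
            task_id="notebook_1",
            databricks_conn_id="databricks_conn",
            notebook_path="/Shared/Notebook_1",
            source="WORKSPACE",
            job_cluster_key="astro_databricks",
        )
        notebook_2 = DatabricksNotebookOperator(
            task_id="notebook_2",
            databricks_conn_id="databricks_conn",
            notebook_path="/Shared/Notebook_2",
            source="WORKSPACE",
            job_cluster_key="astro_databricks",
        )
        notebook_1 >> notebook_2
```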
## Documentation
The documentation is a work in progress; we aim to follow the [Diátaxis](https://diataxis.fr/) system:
* [Reference Guide](https://astronomer.github.io/astro-providers-databricks/)
## Changelog
Astro Databricks follows [semantic versioning](https://semver.org/) for releases. Read the [changelog](CHANGELOG.rst) to learn about the changes introduced in each version.
## Contribution guidelines
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Read the [Contribution Guidelines](docs/contributing.rst) for a detailed overview of how to contribute.
Contributors and maintainers should abide by the [Contributor Code of Conduct](CODE_OF_CONDUCT.md).
## License
[Apache License 2.0](LICENSE)