astro-providers-databricks

Name: astro-providers-databricks
Version: 0.1.0a1
Summary: Affordable Databricks Workflows in Apache Airflow
Upload time: 2023-03-10 17:31:48
Requires Python: >=3.7
Keywords: airflow, apache-airflow, astronomer, dags
            <h1 align="center">
  Astro Databricks
</h1>
  <h3 align="center">
  Affordable Databricks Workflows in Apache Airflow<br><br>
</h3>

[![Python versions](https://img.shields.io/pypi/pyversions/astro-providers-databricks.svg)](https://pypi.org/pypi/astro-providers-databricks)
[![License](https://img.shields.io/pypi/l/astro-providers-databricks.svg)](https://pypi.org/pypi/astro-providers-databricks)
[![Development Status](https://img.shields.io/pypi/status/astro-providers-databricks.svg)](https://pypi.org/pypi/astro-providers-databricks)
[![PyPI downloads](https://img.shields.io/pypi/dm/astro-providers-databricks.svg)](https://pypistats.org/packages/astro-providers-databricks)
[![Contributors](https://img.shields.io/github/contributors/astronomer/astro-providers-databricks)](https://github.com/astronomer/astro-providers-databricks)
[![Commit activity](https://img.shields.io/github/commit-activity/m/astronomer/astro-providers-databricks)](https://github.com/astronomer/astro-providers-databricks)
[![CI](https://github.com/astronomer/astro-providers-databricks/actions/workflows/ci.yml/badge.svg)](https://github.com/astronomer/astro-providers-databricks)
[![codecov](https://codecov.io/gh/astronomer/astro-providers-databricks/branch/main/graph/badge.svg?token=MI4SSE50Q6)](https://codecov.io/gh/astronomer/astro-providers-databricks)


**Astro Databricks** is an [Apache Airflow](https://github.com/apache/airflow) provider created by [Astronomer](https://www.astronomer.io/) for an **optimal Databricks experience**. With the `DatabricksWorkflowTaskGroup`, Astro Databricks allows you to run Databricks Workflows without the need to run Jobs individually, which can result in a [75% cost reduction](https://www.databricks.com/product/aws-pricing).

## Prerequisites

* Apache Airflow >= 2.2.4
* Python >= 3.7
* Databricks account
* Previously created Databricks Notebooks

## Install

```shell
pip install astro-providers-databricks
```

## Quickstart

1. Use two pre-existing [Databricks Notebooks](https://docs.databricks.com/notebooks/), or create two simple ones. Their identifiers will be used in step (5). The original example DAG uses:
   * `Shared/Notebook_1`
   * `Shared/Notebook_2`

2. Generate a [Databricks Personal Token](https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens). This will be used in step (4).

3. Ensure that your Airflow environment is set up correctly by running the following commands:

    ```shell
    export AIRFLOW_HOME=`pwd`
   
    airflow db init
    ```
   
4. Create a Databricks Airflow connection [using your preferred method](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html), so that Airflow can access Databricks with your credentials. For example, run the following command, replacing the login, host, and password (your access token):

```shell
airflow connections add 'databricks_conn' \
    --conn-json '{
        "conn_type": "databricks",
        "login": "some.email@yourcompany.com",
        "host": "https://dbc-c9390870-65ef.cloud.databricks.com/",
        "password": "personal-access-token"
    }'
```
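   As an alternative to the CLI command, Airflow can also read connections from environment variables named `AIRFLOW_CONN_<CONN_ID>`. A sketch of the same connection in that form (the JSON value shown here requires Airflow >= 2.3; older versions expect a URI string instead):

   ```shell
   # Same connection as above, supplied via an environment variable.
   # Airflow resolves any variable named AIRFLOW_CONN_<CONN_ID> (upper-cased),
   # so this defines the connection id "databricks_conn".
   export AIRFLOW_CONN_DATABRICKS_CONN='{
       "conn_type": "databricks",
       "login": "some.email@yourcompany.com",
       "host": "https://dbc-c9390870-65ef.cloud.databricks.com/",
       "password": "personal-access-token"
   }'
   ```

   Note that connections defined this way are not stored in the Airflow metadata database and will not appear in `airflow connections list`.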

5. Copy the following workflow into a file named `example_databricks_workflow.py` and add it to the `dags` directory of your Airflow project:
   
   https://github.com/astronomer/astro-providers-databricks/blob/45897543a5e34d446c84b3fbc4f6f7a3ed16cdf7/example_dags/example_databricks_workflow.py#L48-L101

   Alternatively, you can download `example_databricks_workflow.py` directly:
   ```shell
    curl -O https://raw.githubusercontent.com/astronomer/astro-providers-databricks/main/example_dags/example_databricks_workflow.py
   ```

6. Run the example DAG:

    ```sh
    airflow dags test example_databricks_workflow `date -Iseconds`
    ```
   
This will create a Databricks Workflow with two Notebook jobs.

## Available features

* `DatabricksWorkflowTaskGroup`: Airflow [task group](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#taskgroups) that allows users to create a [Databricks Workflow](https://www.databricks.com/product/workflows).
* `DatabricksNotebookOperator`: Airflow [operator](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/operators.html) which abstracts a pre-existing [Databricks Notebook](https://docs.databricks.com/notebooks/). Can be used independently to run the Notebook, or within a Databricks Workflow Task Group.
* `AstroDatabricksPlugin`: An Airflow [plugin](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html) which is installed by default. It allows users to view a Databricks job from the Airflow UI and retry it in case of failure.
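
The task group and operator above combine roughly as follows. This is a minimal, hedged sketch: the import path `astro_databricks` and the `job_clusters`/`job_cluster_key` parameters are assumed to match the bundled example DAG, and the cluster spec values are placeholders — refer to `example_databricks_workflow.py` for the authoritative version.

```python
# Sketch of a Databricks Workflow with two chained Notebook tasks.
# The import path and parameter names are assumptions based on the bundled
# example DAG; check it if your installed version differs.
from datetime import datetime

from airflow import DAG
from astro_databricks import DatabricksNotebookOperator, DatabricksWorkflowTaskGroup

# Placeholder job cluster spec (Databricks Jobs API format).
job_cluster_spec = [
    {
        "job_cluster_key": "astro_databricks",
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 1,
        },
    }
]

with DAG(
    dag_id="example_databricks_workflow",
    schedule_interval=None,
    start_date=datetime(2023, 1, 1),
) as dag:
    # The task group creates a single Databricks Workflow (Job) that runs
    # every task inside it, instead of one Databricks Job per task.
    workflow = DatabricksWorkflowTaskGroup(
        group_id="workflow",
        databricks_conn_id="databricks_conn",
        job_clusters=job_cluster_spec,
    )
    with workflow:
        notebook_1 = DatabricksNotebookOperator(
            task_id="notebook_1",
            databricks_conn_id="databricks_conn",
            notebook_path="/Shared/Notebook_1",
            source="WORKSPACE",
            job_cluster_key="astro_databricks",
        )
        notebook_2 = DatabricksNotebookOperator(
            task_id="notebook_2",
            databricks_conn_id="databricks_conn",
            notebook_path="/Shared/Notebook_2",
            source="WORKSPACE",
            job_cluster_key="astro_databricks",
        )
        notebook_1 >> notebook_2
```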

## Documentation

The documentation is a work in progress; we aim to follow the [Diátaxis](https://diataxis.fr/) system:

* [Reference Guide](https://astronomer.github.io/astro-providers-databricks/)

## Changelog

Astro Databricks follows [semantic versioning](https://semver.org/) for releases. Read the [changelog](CHANGELOG.rst) to learn more about the changes introduced in each version.

## Contribution guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the [Contribution Guidelines](docs/contributing.rst) for a detailed overview of how to contribute.

Contributors and maintainers should abide by the [Contributor Code of Conduct](CODE_OF_CONDUCT.md).

## License

[Apache License 2.0](LICENSE)
            
