prefect-databricks

Name: prefect-databricks
Version: 0.2.4
Summary: Prefect integrations with Databricks
Upload time: 2024-04-25 19:21:47
Requires Python: >=3.8
License: Apache License 2.0
Keywords: prefect
# prefect-databricks

Visit the full docs [here](https://PrefectHQ.github.io/prefect-databricks) to see additional examples and the API reference.
 
<p align="center">
    <a href="https://pypi.python.org/pypi/prefect-databricks/" alt="PyPI version">
        <img alt="PyPI" src="https://img.shields.io/pypi/v/prefect-databricks?color=0052FF&labelColor=090422"></a>
    <a href="https://pepy.tech/badge/prefect-databricks/" alt="Downloads">
        <img src="https://img.shields.io/pypi/dm/prefect-databricks?color=0052FF&labelColor=090422" /></a>
    <a href="https://github.com/PrefectHQ/prefect-databricks/pulse" alt="Activity"></a>
</p>

## Welcome!

Prefect integrations for interacting with Databricks.

The tasks within this collection were created by a code generator using the service's OpenAPI spec.

The service's REST API documentation can be found [here](https://docs.databricks.com/dev-tools/api/latest/index.html).

## Getting Started

### Python setup

Requires an installation of Python 3.8+.

We recommend using a Python virtual environment manager such as pipenv, conda, or virtualenv.

These tasks are designed to work with Prefect 2. For more information about how to use Prefect, please refer to the [Prefect documentation](https://orion-docs.prefect.io/).

### Installation

Install `prefect-databricks` with `pip`:

```bash
pip install prefect-databricks
```

A list of available blocks in `prefect-databricks` and their setup instructions can be found [here](https://PrefectHQ.github.io/prefect-databricks/#blocks-catalog).

### List jobs on the Databricks instance

```python
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_list


@flow
def example_execute_endpoint_flow():
    databricks_credentials = DatabricksCredentials.load("my-block")
    jobs = jobs_list(
        databricks_credentials,
        limit=5
    )
    return jobs

example_execute_endpoint_flow()
```
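The value returned by `jobs_list` is the JSON body of the Databricks Jobs `list` endpoint. As a minimal sketch of working with such a payload, the snippet below pulls job names out of the nested `jobs` / `settings` shape that endpoint returns (the sample data here is hypothetical, not from a real workspace):

```python
# Hypothetical payload shaped like a Databricks Jobs `list` response;
# the flow above would return a structure like this from `jobs_list`.
jobs = {
    "jobs": [
        {"job_id": 101, "settings": {"name": "nightly-etl"}},
        {"job_id": 102, "settings": {"name": "ad-hoc-report"}},
    ]
}

# Pull the human-readable job names out of the nested settings objects.
names = [j["settings"]["name"] for j in jobs.get("jobs", [])]
print(names)
```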

### Use `with_options` to customize options on any existing task or flow

```python
custom_example_execute_endpoint_flow = example_execute_endpoint_flow.with_options(
    name="My custom flow name",
    retries=2,
    retry_delay_seconds=10,
)
```

### Launch a new cluster and run a Databricks notebook

Assume a notebook named `example.ipynb` exists on Databricks and accepts a `name` parameter:

```python
name = dbutils.widgets.get("name")
message = f"Don't worry {name}, I got your request! Welcome to prefect-databricks!"
print(message)
```

Prefect flow that launches a new cluster to run `example.ipynb`:

```python
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_runs_submit
from prefect_databricks.models.jobs import (
    AutoScale,
    AwsAttributes,
    JobTaskSettings,
    NotebookTask,
    NewCluster,
)


@flow
def jobs_runs_submit_flow(notebook_path, **base_parameters):
    databricks_credentials = DatabricksCredentials.load("my-block")

    # specify new cluster settings
    aws_attributes = AwsAttributes(
        availability="SPOT",
        zone_id="us-west-2a",
        ebs_volume_type="GENERAL_PURPOSE_SSD",
        ebs_volume_count=3,
        ebs_volume_size=100,
    )
    auto_scale = AutoScale(min_workers=1, max_workers=2)
    new_cluster = NewCluster(
        aws_attributes=aws_attributes,
        autoscale=auto_scale,
        node_type_id="m4.large",
        spark_version="10.4.x-scala2.12",
        spark_conf={"spark.speculation": True},
    )

    # specify notebook to use and parameters to pass
    notebook_task = NotebookTask(
        notebook_path=notebook_path,
        base_parameters=base_parameters,
    )

    # compile job task settings
    job_task_settings = JobTaskSettings(
        new_cluster=new_cluster,
        notebook_task=notebook_task,
        task_key="prefect-task"
    )

    run = jobs_runs_submit(
        databricks_credentials=databricks_credentials,
        run_name="prefect-job",
        tasks=[job_task_settings]
    )

    return run


jobs_runs_submit_flow("/Users/username@gmail.com/example.ipynb", name="Marvin")
```

Note: instead of using the built-in models, you may also pass valid JSON. For example, `AutoScale(min_workers=1, max_workers=2)` is equivalent to `{"min_workers": 1, "max_workers": 2}`.
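To illustrate, the run submitted by the flow above could be expressed entirely as plain dictionaries. This is only a sketch of the request shape (field names mirror the `jobs_runs_submit` arguments and model fields used in the flow; the notebook path and parameters are the example values from above):

```python
import json

# The same job-run payload expressed as JSON-style dicts instead of the
# generated model classes used in the flow above.
new_cluster = {
    "autoscale": {"min_workers": 1, "max_workers": 2},
    "node_type_id": "m4.large",
    "spark_version": "10.4.x-scala2.12",
}
task = {
    "task_key": "prefect-task",
    "new_cluster": new_cluster,
    "notebook_task": {
        "notebook_path": "/Users/username@gmail.com/example.ipynb",
        "base_parameters": {"name": "Marvin"},
    },
}
payload = {"run_name": "prefect-job", "tasks": [task]}

# Render the payload as it would appear in a raw API request body.
print(json.dumps(payload, indent=2))
```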

For more tips on how to use tasks and flows in a Collection, check out [Using Collections](https://orion-docs.prefect.io/collections/usage/)!

## Resources

If you encounter any bugs while using `prefect-databricks`, feel free to open an issue in the [prefect-databricks](https://github.com/PrefectHQ/prefect-databricks) repository.

If you have any questions or issues while using `prefect-databricks`, you can find help in either the [Prefect Discourse forum](https://discourse.prefect.io/) or the [Prefect Slack community](https://prefect.io/slack).

Feel free to star or watch [`prefect-databricks`](https://github.com/PrefectHQ/prefect-databricks) for updates too!

## Contributing

If you'd like to contribute a fix or a feature to `prefect-databricks`, please [propose changes through a pull request from a fork of the repository](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork).

Here are the steps:
1. [Fork the repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo#forking-a-repository)
2. [Clone the forked repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo#cloning-your-forked-repository)
3. Install the repository and its dependencies:
```bash
pip install -e ".[dev]"
```
4. Make desired changes
5. Add tests
6. Add an entry to [CHANGELOG.md](https://github.com/PrefectHQ/prefect-databricks/blob/main/CHANGELOG.md)
7. Install `pre-commit` to perform quality checks prior to commit:
```bash
pre-commit install
```
8. `git commit`, `git push`, and create a pull request
