databricks-aws-utils


- Name: databricks-aws-utils
- Version: 1.5.1
- Home page: https://github.com/lucasvieirasilva/databricks-aws-utils
- Summary: Databricks AWS Utils
- Upload time: 2024-04-12 09:29:24
- Maintainer: Lucas Vieira
- Author: Lucas Vieira
- Requires Python: <=3.11,>=3.8.1
- License: Proprietary

# Databricks AWS Utils

Databricks AWS Utils is a library that abstracts the integration between Databricks and AWS services.

## Features

- Convert Delta tables so that AWS Athena can query them, with support for schema evolution
- Run queries against AWS RDS, using AWS Secrets Manager to retrieve the connection properties, and return the result as a Spark DataFrame
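
This README does not show the API for the RDS feature, so the following is only a minimal sketch of the general idea, not the library's implementation. It assumes the secret follows the usual RDS secret layout (`engine`, `host`, `port`, `dbname`, `username`, `password`); the function name `jdbc_options` and the example values are hypothetical.

```python
import json

# JDBC sub-protocol for each engine name stored in the secret (illustrative subset).
JDBC_PREFIX = {"postgres": "postgresql", "mysql": "mysql"}


def jdbc_options(secret_string: str, query: str) -> dict:
    """Build Spark JDBC reader options from a Secrets Manager secret payload.

    Hypothetical helper: it only shows how connection properties stored in a
    secret could be turned into options for Spark's JDBC data source.
    """
    secret = json.loads(secret_string)
    prefix = JDBC_PREFIX[secret["engine"]]
    url = f"jdbc:{prefix}://{secret['host']}:{secret['port']}/{secret['dbname']}"
    return {
        "url": url,
        "user": secret["username"],
        "password": secret["password"],
        "query": query,
    }


# Example payload. In practice it would come from
# boto3.client("secretsmanager").get_secret_value(SecretId=...)["SecretString"].
example_secret = json.dumps(
    {
        "engine": "postgres",
        "host": "db.example.internal",
        "port": 5432,
        "dbname": "sales",
        "username": "reader",
        "password": "not-a-real-password",
    }
)

options = jdbc_options(example_secret, "SELECT id FROM orders")
# On a cluster, the DataFrame would then be read with:
# spark.read.format("jdbc").options(**options).load()
```

Keeping the credentials in the secret means the notebook only needs the secret's identifier, never the password itself.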

## Install

`pip install databricks-aws-utils`

## Delta Table to AWS Athena

### Motivation

Currently, Delta tables can only be queried by AWS Athena engine v3. Even there, schema evolution has limitations: the schema is not always fully or correctly synchronized with the AWS Glue catalog, which causes problems when querying the table.

To solve this problem, we created this library. It converts the Delta table columns into a form compatible with the AWS Glue catalog and updates the table metadata, so that AWS Athena can query the table correctly.

### Usage

```python
from databricks_aws_utils.delta_table import DeltaTableUtils

...

DeltaTableUtils(spark, 'my_schema.my_table_name').to_athena_v3()
```

The `to_athena_v3` method uses the Spark session to read the current Delta schema and update the AWS Glue table accordingly.

**NOTE**: This feature is only compatible with AWS Athena engine v3, and the Databricks cluster must have access to the AWS Glue catalog.

**NOTE**: This feature is not supported by Databricks Unity Catalog, since it does not allow queries from AWS Athena.
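
To illustrate what synchronizing a Delta schema with Glue can involve, here is a minimal sketch that translates a Spark schema, in the dictionary form returned by `StructType.jsonValue()`, into Glue-style column definitions (`Name`/`Type` pairs with Hive type strings). This is an assumption about the general technique, not the code `to_athena_v3` actually runs; `glue_type` and `glue_columns` are hypothetical names.

```python
# Spark JSON type names whose Hive/Glue spelling differs.
PRIMITIVE_ALIASES = {
    "long": "bigint",
    "integer": "int",
    "short": "smallint",
    "byte": "tinyint",
}


def glue_type(spark_type) -> str:
    """Translate one Spark JSON data type into a Hive-style type string."""
    if isinstance(spark_type, str):
        # Primitive types (and e.g. "decimal(10,2)") are plain strings.
        return PRIMITIVE_ALIASES.get(spark_type, spark_type)
    kind = spark_type["type"]
    if kind == "array":
        return f"array<{glue_type(spark_type['elementType'])}>"
    if kind == "map":
        key = glue_type(spark_type["keyType"])
        value = glue_type(spark_type["valueType"])
        return f"map<{key},{value}>"
    if kind == "struct":
        fields = ",".join(
            f"{field['name']}:{glue_type(field['type'])}"
            for field in spark_type["fields"]
        )
        return f"struct<{fields}>"
    raise ValueError(f"Unsupported Spark type: {kind}")


def glue_columns(schema_json: dict) -> list:
    """Return Glue column definitions for a top-level Spark struct schema."""
    return [
        {"Name": field["name"], "Type": glue_type(field["type"])}
        for field in schema_json["fields"]
    ]


# Hand-written example in the shape of spark.table(...).schema.jsonValue().
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {
            "name": "tags",
            "type": {"type": "array", "elementType": "string", "containsNull": True},
            "nullable": True,
            "metadata": {},
        },
        {
            "name": "address",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "city", "type": "string", "nullable": True, "metadata": {}},
                    {"name": "zip", "type": "integer", "nullable": True, "metadata": {}},
                ],
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}

columns = glue_columns(example_schema)
```

A list in this shape is what the Glue `UpdateTable` API expects under `StorageDescriptor.Columns`, which is why nested types have to be flattened into type strings first.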

#### Custom IAM Role

If you need to use a custom IAM Role to update the AWS Glue table, you can pass the role name through the `iam_role` parameter of the `DeltaTableUtils` class.

```python
from databricks_aws_utils.delta_table import DeltaTableUtils

...

DeltaTableUtils(
    spark,
    'my_schema.my_table_name',
    iam_role='my_custom_iam_role'
).to_athena_v3()
```

**NOTE**: The Databricks cluster must have permission to assume the custom IAM Role.
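
For context, assuming a role generally means calling AWS STS and using the temporary credentials it returns. The sketch below shows that flow in isolation; it is not taken from the library, and the helper names, the account ID, and the session name are hypothetical. The STS client is passed in as an argument so the example runs without AWS access.

```python
def build_role_arn(account_id: str, role_name: str) -> str:
    """Build the standard IAM role ARN for a role name in a given account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def assumed_role_client_kwargs(sts_client, role_arn: str,
                               session_name: str = "delta-to-athena") -> dict:
    """Assume the role and return credential kwargs accepted by boto3.client().

    `sts_client` is expected to behave like boto3.client("sts").
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


# With real AWS access this could be used as:
#   import boto3
#   arn = build_role_arn("123456789012", "my_custom_iam_role")
#   glue = boto3.client("glue", **assumed_role_client_kwargs(boto3.client("sts"), arn))
```

The `assume_role` call only succeeds if the role's trust policy allows the cluster's own role to assume it, which is the permission the note above refers to.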

### Athena Engine v2

AWS Athena engine v2 doesn't support Delta tables. To query a Delta table with engine v2, you need to generate a Hive symlink from the Delta table and expose it through a separate table.

```python
from databricks_aws_utils.delta_table import DeltaTableUtils

...

DeltaTableUtils(spark, 'my_schema.my_table_name').to_athena('my_schema', 'my_symlink_table_name')
```

**NOTE**: The schema name passed to `to_athena` doesn't need to match the Delta table's schema.
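
As background, the symlink approach is usually done with two statements: Delta's `GENERATE symlink_format_manifest` command, and an external table that reads the manifest through Hive's `SymlinkTextInputFormat`. The sketch below only builds those statements as strings for an unpartitioned table. It is an illustration of the technique, not what `to_athena` executes; the function name, the S3 location, and the column list are hypothetical.

```python
def symlink_statements(delta_table: str, delta_location: str,
                       symlink_table: str, columns_ddl: str) -> tuple:
    """Return (generate_sql, create_sql) for exposing a Delta table via a symlink manifest.

    Covers the unpartitioned case only; partitioned tables additionally need
    a PARTITIONED BY clause and a partition repair step.
    """
    manifest_location = f"{delta_location.rstrip('/')}/_symlink_format_manifest/"
    # Run on Databricks: writes the manifest files next to the Delta table.
    generate_sql = f"GENERATE symlink_format_manifest FOR TABLE {delta_table}"
    # Run against the Glue/Hive catalog: table that reads the manifest.
    create_sql = (
        f"CREATE EXTERNAL TABLE {symlink_table} ({columns_ddl})\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{manifest_location}'"
    )
    return generate_sql, create_sql


generate_sql, create_sql = symlink_statements(
    delta_table="my_schema.my_table_name",
    delta_location="s3://example-bucket/tables/my_table_name/",
    symlink_table="my_schema.my_symlink_table_name",
    columns_ddl="id bigint, name string",
)
```

Because the manifest is a snapshot of the table's data files, it has to be regenerated after the Delta table changes for the symlink table to see new data.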

## Contributing

- See our [Contributing Guide](CONTRIBUTING.md)

## Change Log

- See our [Change Log](CHANGELOG.md)


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/lucasvieirasilva/databricks-aws-utils",
    "name": "databricks-aws-utils",
    "maintainer": "Lucas Vieira",
    "docs_url": null,
    "requires_python": "<=3.11,>=3.8.1",
    "maintainer_email": "lucas.vieira94@outlook.com",
    "keywords": null,
    "author": "Lucas Vieira",
    "author_email": "lucas.vieira94@outlook.com",
    "download_url": "https://files.pythonhosted.org/packages/ca/f4/23762932940095250ab4bafc94dd35cff670ac215365dd3a105df30ad6a3/databricks_aws_utils-1.5.1.tar.gz",
    "platform": null,
    "description": "# Databricks AWS Utils\n\nDatabricks AWS Utils is a library to abstract Databricks integration with AWS Services\n\n## Features\n\n- Convert Delta Table to be consumed by AWS Athena with Schema evolution\n- Run queries against AWS RDS using AWS Secrets Manager to retrieve the connection properties and returns as Spark DataFrame\n\n## Install\n\n`pip install databricks-aws-utils`\n\n## Delta Table to AWS Athena\n\n### Motivation\n\nCurrently, delta tables are only compatible with AWS Athena engine v3, however, even with the compatibility, there are some limitations regarding the schema evolution, where the schema is not fully or correctly synchronized with the AWS Glue catalog, causing problems when querying the table.\n\nTo solve this problem, we created this library to convert the delta table columns to be compatible with the AWS Glue catalog and update the table metadata, allowing the table to be queried correctly by AWS Athena.\n\n### Usage\n\n```python\nfrom databricks_aws_utils.delta_table import DeltaTableUtils\n\n...\n\nDeltaTableUtils(spark, 'my_schema.my_table_name').to_athena_v3()\n```\n\nThe `to_athena_v3` function uses the spark session to capture the current delta schema and update the glue table.\n\n**NOTE**: This feature is only compatible with AWS Athena engine v3, and the Databricks cluster must have access to the AWS Glue catalog.\n\n**NOTE**: This feature is not supported by Databricks Unity Catalog, since it does not allow queries from AWS Athena.\n\n#### Custom IAM Role\n\nIf you need to use a custom IAM Role to update the AWS Glue table, you can pass the role name as a parameter to the `DeltaTableUtils` class.\n\n```python\nfrom databricks_aws_utils.delta_table import DeltaTableUtils\n\n...\n\nDeltaTableUtils(\n    spark,\n    'my_schema.my_table_name',\n    iam_role='my_custom_iam_role'\n).to_athena_v3()\n```\n\n**NOTE**: The Databricks cluster must have permission to assume the custom IAM Role.\n\n### Athena Engine 
v2\n\nAWS Athena engine v2 doesn't support delta tables, so, to query a delta table using AWS Athena engine v2, it's necessary to generate Hive Symlink from the delta table and point to a different table.\n\n```python\nfrom databricks_aws_utils.delta_table import DeltaTableUtils\n\n...\n\nDeltaTableUtils(spark, 'my_schema.my_table_name').to_athena('my_schema', 'my_symlink_table_name')\n```\n\n**NOTE**: The schema name provided in the `to_athena` doesn't need to be the same as the delta table schema.\n\n## Contributing\n\n- See our [Contributing Guide](CONTRIBUTING.md)\n\n## Change Log\n\n- See our [Change Log](CHANGELOG.md)\n\n",
    "bugtrack_url": null,
    "license": "Proprietary",
    "summary": "Databricks AWS Utils",
    "version": "1.5.1",
    "project_urls": {
        "Homepage": "https://github.com/lucasvieirasilva/databricks-aws-utils",
        "Repository": "https://github.com/lucasvieirasilva/databricks-aws-utils"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fd2f47579f1ddbdea4936493dce8cff1bb6a69936ce3f4094a95355d9dd467f8",
                "md5": "7d171fc355615a5b45fde9da1410325a",
                "sha256": "0ae188a49ec5101e2420a1d11d2c69c0eb6bfe02ec21b00131aea6c2496ff627"
            },
            "downloads": -1,
            "filename": "databricks_aws_utils-1.5.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7d171fc355615a5b45fde9da1410325a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<=3.11,>=3.8.1",
            "size": 10314,
            "upload_time": "2024-04-12T09:29:20",
            "upload_time_iso_8601": "2024-04-12T09:29:20.203008Z",
            "url": "https://files.pythonhosted.org/packages/fd/2f/47579f1ddbdea4936493dce8cff1bb6a69936ce3f4094a95355d9dd467f8/databricks_aws_utils-1.5.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "caf423762932940095250ab4bafc94dd35cff670ac215365dd3a105df30ad6a3",
                "md5": "5d7d3798185d6fb2b6f0d3afd88fe5fb",
                "sha256": "b079db39c02dc65d615949f527df9430597b183d8ce1056dc9b12d28b36df9e3"
            },
            "downloads": -1,
            "filename": "databricks_aws_utils-1.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "5d7d3798185d6fb2b6f0d3afd88fe5fb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<=3.11,>=3.8.1",
            "size": 8681,
            "upload_time": "2024-04-12T09:29:24",
            "upload_time_iso_8601": "2024-04-12T09:29:24.675992Z",
            "url": "https://files.pythonhosted.org/packages/ca/f4/23762932940095250ab4bafc94dd35cff670ac215365dd3a105df30ad6a3/databricks_aws_utils-1.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-12 09:29:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lucasvieirasilva",
    "github_project": "databricks-aws-utils",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "databricks-aws-utils"
}
        