databricks-labs-remorph


Name: databricks-labs-remorph
Version: 0.1.6
Summary: SQL code converter and data reconciliation tool for accelerating data onboarding to Databricks from EDW, CDW and other ETL sources.
Upload time: 2024-04-04 12:26:58
Requires Python: >=3.10
Keywords: databricks
Requirements: No requirements were recorded.

Databricks Labs Remorph
---
![Databricks Labs Remorph](docs/remorph-logo.svg)

[![lines of code](https://tokei.rs/b1/github/databrickslabs/remorph)](https://github.com/databrickslabs/remorph)

-----

# Table of Contents

1. [Introduction](#introduction)
   - [Remorph](#remorph)
   - [Transpile](#transpile)
2. [Environment Setup](#environment-setup)
3. [How to Use Transpile](#how-to-use-transpile)
4. [Project Support](#project-support)

----
# Introduction

## Remorph
Remorph is a toolkit for migrating to Databricks. It simplifies the migration process through two functions: Transpile, which translates code, and Reconcile, which reconciles data. Together they are meant to keep a migration project efficient and well-managed through the transition to the Databricks platform.

## Transpile
Transpile is a self-contained SQL parser, transpiler, and validator that interprets a range of SQL inputs and generates syntactically and semantically correct SQL in the Databricks SQL dialect. It automates the migration and translation of SQL scripts to the Databricks SQL format. Currently, Snowflake is the only supported source platform. Transpile builds on the open-source SQLGlot library.

Transpile is written entirely in Python and is backed by a robust test suite. It highlights syntax errors and, depending on configuration, either warns about or raises errors for dialect incompatibilities.

#### Design Flow:
```mermaid
flowchart TD
    A(Transpile CLI) --> |Directory| B[Transpile All Files In Directory];
    A --> |File| C[Transpile Single File] ;
    B --> D[List Files];
    C --> E("Sqlglot(transpile)");
    D --> E
    E --> |Parse Error| F(Failed Queries)
    E --> G{Skip Validations}
    G --> |Yes| H(Save Output)
    G --> |No| I{Validate}
    I --> |Success| H
    I --> |Fail| J(Flag, Capture)
    J --> H
```
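
The design flow above can be sketched in a few lines of Python. This is an illustrative sketch only, not Remorph's actual implementation: `run_transpile`, `list_sql_files`, and `TranspileReport` are hypothetical names, the `transpile` and `validate` callables stand in for the SQLGlot call and the Databricks validation step, and it is assumed that a parse error surfaces as a `ValueError` and that directories are scanned non-recursively for `*.sql` files.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class TranspileReport:
    """Hypothetical summary of one run: what was saved and what was flagged."""
    saved: list[Path] = field(default_factory=list)
    parse_errors: list[Path] = field(default_factory=list)
    validation_failures: list[Path] = field(default_factory=list)


def list_sql_files(input_path: Path) -> list[Path]:
    """Return the single file, or every *.sql file directly inside a directory."""
    if input_path.is_file():
        return [input_path]
    return sorted(p for p in input_path.glob("*.sql") if p.is_file())


def run_transpile(
    input_path: Path,
    output_dir: Path,
    transpile: Callable[[str], str],   # stand-in for the SQLGlot step
    validate: Callable[[str], bool],   # stand-in for the validation step
    skip_validation: bool = True,
) -> TranspileReport:
    report = TranspileReport()
    output_dir.mkdir(parents=True, exist_ok=True)
    for sql_file in list_sql_files(input_path):
        try:
            converted = transpile(sql_file.read_text())
        except ValueError:
            # "Parse Error" branch: record the failed query, nothing is saved.
            report.parse_errors.append(sql_file)
            continue
        if not skip_validation and not validate(converted):
            # "Flag, Capture" branch: the output is still saved afterwards.
            report.validation_failures.append(sql_file)
        target = output_dir / sql_file.name
        target.write_text(converted)
        report.saved.append(target)
    return report
```

Passing the two steps in as callables keeps the control flow testable without a Databricks workspace; a real run would supply the actual transpile and validation functions.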

----

# Environment Setup

1. `Databricks CLI` - Ensure that the Databricks Command-Line Interface (CLI) is installed on your machine. Installation instructions for Linux, macOS, and Windows are available [here](https://docs.databricks.com/en/dev-tools/cli/install.html#install-or-update-the-databricks-cli).

2. `Databricks Connect` - Set up the Databricks workspace configuration file by following the instructions [here](https://docs.databricks.com/en/dev-tools/auth/index.html#databricks-configuration-profiles). Note that Databricks Labs uses `DEFAULT` as the default profile for connecting to Databricks.
   
3. `Python` - Verify that Python 3.10 or later is installed on your machine; Remorph requires it.
   - `Windows` - Install Python from [here](https://www.python.org/downloads/). Your Windows computer will also need a shell environment ([GitBash](https://www.git-scm.com/downloads) or [WSL](https://learn.microsoft.com/en-us/windows/wsl/about)).
   - `MacOS/Unix` - Use [brew](https://formulae.brew.sh/formula/python@3.10) to install Python on macOS/Unix machines.
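
As a quick sanity check for the configuration step above, the following sketch reads a configuration profile with Python's standard `configparser`. It assumes the configuration file is the INI-style `~/.databrickscfg` with keys such as `host`; `load_profile` is a hypothetical helper written for this example and is not part of Remorph or the Databricks CLI.

```python
import configparser
from pathlib import Path


def load_profile(cfg_text: str, profile: str = "DEFAULT") -> dict[str, str]:
    """Parse INI-style configuration text and return the settings of one profile."""
    # Interpolation is disabled so that '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # configparser treats DEFAULT specially: it is never listed as a section.
    if profile != parser.default_section and not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    return dict(parser[profile])


if __name__ == "__main__":
    cfg_path = Path.home() / ".databrickscfg"  # assumed default location
    if cfg_path.exists():
        settings = load_profile(cfg_path.read_text())
        # Print only the key names so that tokens never reach the terminal.
        print("DEFAULT profile keys:", sorted(settings))
    else:
        print(f"{cfg_path} not found")
```
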
#### Installing Databricks CLI on macOS
![macos-databricks-cli-install](docs/macos-databricks-cli-install.gif)

#### Installing Databricks CLI via curl on Windows
![windows-databricks-cli-install](docs/windows-databricks-cli-install.gif)

#### Check Python version on Windows, macOS, and Unix

![check-python-version](docs/check-python-version.gif)
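
The same check can be scripted. A minimal sketch, assuming only the documented minimum of Python 3.10; `meets_minimum` is a hypothetical helper name:

```python
import sys

MINIMUM = (3, 10)  # minimum Python version required by Remorph


def meets_minimum(version=None, minimum=MINIMUM) -> bool:
    """Return True if the (major, minor) version is at least the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old, 3.10 or later is required"
    print(f"Python {found}: {status}")
```
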

----

# How to Use Transpile

## Step 1: Installation

After completing the environment setup, install Remorph with the following command:
```bash
databricks labs install remorph
```

Verify the installation by running the command below. The installation succeeded if the output matches the example screenshot:
```bash
databricks labs remorph transpile --help
```
![transpile-help](docs/transpile-help.png)

## Step 2: Set Up Prerequisite File
1. Transpile takes as input either a directory containing SQL files or a single SQL file.
2. The SQL files should contain the scripts to be migrated to Databricks SQL.

The arguments accepted by Transpile are described below.
- `input-sql [Required]` - The path to the SQL file or directory containing SQL files to be transpiled.
- `source [Required]` - The source platform of the SQL scripts. Currently, only Snowflake is supported.
- `output-folder [Optional]` - The path to the output folder where the transpiled SQL files will be stored. If not specified, the transpiled SQL files will be stored in the same directory as the input SQL file.
- `skip-validation [Optional]` - The default value is True. If set to False, the transpiler validates the transpiled SQL scripts against the Databricks catalog and schema provided by the user.
- `catalog-name [Optional]` - The name of the catalog in Databricks. If not specified, the default catalog `transpiler_test` will be used.
- `schema-name [Optional]` - The name of the schema in Databricks. If not specified, the default schema `convertor_test` will be used.
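
To make the defaults concrete, the sketch below assembles the command line from these arguments. It is an illustration under stated assumptions: `TranspileOptions` and `build_command` are hypothetical helpers, not part of Remorph, and the catalog and schema flags are always passed explicitly with the documented defaults filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranspileOptions:
    input_sql: str                         # required: SQL file or directory
    source: str                            # required: currently only "snowflake"
    output_folder: Optional[str] = None    # optional: omitted when not set
    skip_validation: bool = True           # documented default
    catalog_name: str = "transpiler_test"  # documented default catalog
    schema_name: str = "convertor_test"    # documented default schema


def build_command(opts: TranspileOptions) -> list[str]:
    """Build the argument list for `databricks labs remorph transpile`."""
    cmd = [
        "databricks", "labs", "remorph", "transpile",
        "--input-sql", opts.input_sql,
        "--source", opts.source,
        "--skip-validation", str(opts.skip_validation),
    ]
    if opts.output_folder is not None:
        # Without this flag, output is stored next to the input SQL file.
        cmd += ["--output-folder", opts.output_folder]
    cmd += ["--catalog-name", opts.catalog_name, "--schema-name", opts.schema_name]
    return cmd


if __name__ == "__main__":
    options = TranspileOptions(input_sql="/path/to/queries", source="snowflake")
    print(" ".join(build_command(options)))
```

The resulting list can be handed to `subprocess.run` once the Databricks CLI and Remorph are installed.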

## Step 3: Execution
Run the command below to start the transpile process.
```bash
databricks labs remorph transpile --input-sql <absolute-path> --source <snowflake> --output-folder <absolute-path> --skip-validation <True|False> --catalog-name <catalog name> --schema-name <schema name>
```

----

# Project Support
Please note that all projects in the /databrickslabs GitHub account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the repository. They will be reviewed as time permits, but there are no formal SLAs for support.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "databricks-labs-remorph",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "Databricks",
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/b1/1d/b050c1aa7827842fcbf2e0db2d5fd76602e22518f22762945d8c9202acde/databricks_labs_remorph-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "SQL code converter and data reconcilation tool for accelerating data onboarding to Databricks from EDW, CDW and other ETL sources.",
    "version": "0.1.6",
    "project_urls": {
        "Documentation": "https://github.com/databrickslabs/remorph",
        "Issues": "https://github.com/databrickslabs/remorph/issues",
        "Source": "https://github.com/databrickslabs/remorph"
    },
    "split_keywords": [
        "databricks"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f468e62d3364e9547de535732eb31df9a3cf8cab07f858ba36245d665f0f9b57",
                "md5": "bb576cdb9ee21068915512acf5eff98c",
                "sha256": "c600a3d44e47ea3677813d14306c91889452cf3cc7eb0d279c85e3ae1e9a31ff"
            },
            "downloads": -1,
            "filename": "databricks_labs_remorph-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bb576cdb9ee21068915512acf5eff98c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 53069,
            "upload_time": "2024-04-04T12:26:56",
            "upload_time_iso_8601": "2024-04-04T12:26:56.450499Z",
            "url": "https://files.pythonhosted.org/packages/f4/68/e62d3364e9547de535732eb31df9a3cf8cab07f858ba36245d665f0f9b57/databricks_labs_remorph-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b11db050c1aa7827842fcbf2e0db2d5fd76602e22518f22762945d8c9202acde",
                "md5": "747c8dddfc20c0012608e3786e56dd3c",
                "sha256": "565b0cd7dca26a1133b645884df47d61293d39038291e3579b6e11101c249d07"
            },
            "downloads": -1,
            "filename": "databricks_labs_remorph-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "747c8dddfc20c0012608e3786e56dd3c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 46829,
            "upload_time": "2024-04-04T12:26:58",
            "upload_time_iso_8601": "2024-04-04T12:26:58.224134Z",
            "url": "https://files.pythonhosted.org/packages/b1/1d/b050c1aa7827842fcbf2e0db2d5fd76602e22518f22762945d8c9202acde/databricks_labs_remorph-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-04 12:26:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "databrickslabs",
    "github_project": "remorph",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "databricks-labs-remorph"
}
        