databricks-labs-ucx

| Field           | Value                                 |
|-----------------|---------------------------------------|
| Name            | databricks-labs-ucx                   |
| Version         | 0.22.0                                |
| Summary         | UCX - Unity Catalog Migration Toolkit |
| Upload time     | 2024-04-26 15:20:54                   |
| Requires Python | >=3.10                                |
| Keywords        | databricks unity catalog              |
# Databricks Labs UCX
![UCX by Databricks Labs](docs/logo-no-background.png)

The companion for upgrading to Unity Catalog. After [installation](#install-ucx), make sure to [trigger](#ensure-assessment-run-command) the [assessment workflow](#assessment-workflow),
so that you'll be able to [scope the migration](docs/assessment.md) and execute the [group migration workflow](#group-migration-workflow).
[`<installation_path>/README`](#readme-notebook) contains further instructions and explanations of these workflows.
Then you can execute the [table migration workflow](#table-migration-workflow).
More workflows, such as notebook code migration, are coming in future releases.
UCX exposes a number of command-line utilities accessible via `databricks labs ucx`.

For questions, troubleshooting or bug fixes, please see our [troubleshooting guide](docs/troubleshooting.md) or submit [an issue](https://github.com/databrickslabs/ucx/issues). 
See [contributing instructions](CONTRIBUTING.md) to help improve this project.

[![build](https://github.com/databrickslabs/ucx/actions/workflows/push.yml/badge.svg)](https://github.com/databrickslabs/ucx/actions/workflows/push.yml) [![codecov](https://codecov.io/github/databrickslabs/ucx/graph/badge.svg?token=p0WKAfW5HQ)](https://codecov.io/github/databrickslabs/ucx) [![lines of code](https://tokei.rs/b1/github/databrickslabs/ucx)](https://github.com/databrickslabs/ucx)

<!-- TOC -->
* [Databricks Labs UCX](#databricks-labs-ucx)
* [Installation](#installation)
  * [Authenticate Databricks CLI](#authenticate-databricks-cli)
  * [Install UCX](#install-ucx)
  * [[ADVANCED] Force install over existing UCX](#advanced-force-install-over-existing-ucx)
  * [[ADVANCED] Installing UCX on all workspaces within a Databricks account](#advanced-installing-ucx-on-all-workspaces-within-a-databricks-account)
  * [Upgrading UCX for newer versions](#upgrading-ucx-for-newer-versions)
  * [Uninstall UCX](#uninstall-ucx)
* [Migration process](#migration-process)
* [Workflows](#workflows)
  * [Readme notebook](#readme-notebook)
  * [Assessment workflow](#assessment-workflow)
  * [Group migration workflow](#group-migration-workflow)
  * [Debug notebook](#debug-notebook)
  * [Debug logs](#debug-logs)
  * [Table Migration Workflow](#table-migration-workflow)
    * [Dependency CLI commands](#dependency-cli-commands)
    * [Table Migration Workflow Tasks](#table-migration-workflow-tasks)
    * [Other considerations](#other-considerations)
* [Utility commands](#utility-commands)
  * [`logs` command](#logs-command)
  * [`ensure-assessment-run` command](#ensure-assessment-run-command)
  * [`repair-run` command](#repair-run-command)
  * [`workflows` command](#workflows-command)
  * [`open-remote-config` command](#open-remote-config-command)
  * [`installations` command](#installations-command)
* [Metastore related commands](#metastore-related-commands)
  * [`show-all-metastores` command](#show-all-metastores-command)
  * [`assign-metastore` command](#assign-metastore-command)
* [Table migration commands](#table-migration-commands)
  * [`principal-prefix-access` command](#principal-prefix-access-command)
    * [Access for AWS S3 Buckets](#access-for-aws-s3-buckets)
    * [Access for Azure Storage Accounts](#access-for-azure-storage-accounts)
  * [`create-uber-principal` command](#create-uber-principal-command)
  * [`migrate-credentials` command](#migrate-credentials-command)
  * [`validate-external-locations` command](#validate-external-locations-command)
  * [`migrate-locations` command](#migrate-locations-command)
  * [`create-table-mapping` command](#create-table-mapping-command)
  * [`skip` command](#skip-command)
  * [`revert-migrated-tables` command](#revert-migrated-tables-command)
  * [`create-catalogs-schemas` command](#create-catalogs-schemas-command)
  * [`move` command](#move-command)
  * [`alias` command](#alias-command)
* [Code migration commands](#code-migration-commands)
  * [`migrate-local-code` command](#migrate-local-code-command)
* [Cross-workspace installations](#cross-workspace-installations)
  * [`sync-workspace-info` command](#sync-workspace-info-command)
  * [`manual-workspace-info` command](#manual-workspace-info-command)
  * [`create-account-groups` command](#create-account-groups-command)
  * [`validate-groups-membership` command](#validate-groups-membership-command)
  * [`cluster-remap` command](#cluster-remap-command)
  * [`revert-cluster-remap` command](#revert-cluster-remap-command)
* [Star History](#star-history)
* [Project Support](#project-support)
<!-- TOC -->

# Installation

- Databricks CLI v0.213 or later. See [instructions](#authenticate-databricks-cli). 
- Python 3.10 or later. See [Windows](https://www.python.org/downloads/windows/) instructions.
- Network access to your Databricks Workspace used for the [installation process](#install-ucx).
- Network access to the Internet for [pypi.org](https://pypi.org) and [github.com](https://github.com) from the machine running the installation.
- Databricks Workspace Administrator privileges for the user that runs the installation. Running UCX as a Service Principal is not supported.
- Account level Identity Setup. See instructions for [AWS](https://docs.databricks.com/en/administration-guide/users-groups/best-practices.html), [Azure](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/best-practices), and [GCP](https://docs.gcp.databricks.com/administration-guide/users-groups/best-practices.html).
- Unity Catalog Metastore Created (per region). See instructions for [AWS](https://docs.databricks.com/en/data-governance/unity-catalog/create-metastore.html), [Azure](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/create-metastore), and [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/create-metastore.html).
- If your Databricks Workspace relies on an external Hive Metastore (such as AWS Glue), make sure to read [this guide](docs/external_hms_glue.md).
- The Databricks Workspace has to have network access to [pypi.org](https://pypi.org) to download the `databricks-sdk` and `pyyaml` packages.
- A PRO or Serverless SQL Warehouse to render the [report](docs/assessment.md) for the [assessment workflow](#assessment-workflow).

Once you [install UCX](#install-ucx), you can proceed to the [assessment workflow](#assessment-workflow) to ensure 
the compatibility of your workspace with Unity Catalog.

[[back to top](#databricks-labs-ucx)]

## Authenticate Databricks CLI

We only support installations and upgrades through the [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html), as UCX requires an installation script to be run
to make sure all the necessary and correct configurations are in place. Install the Databricks CLI on macOS:
![macos_install_databricks](docs/macos_1_databrickslabsmac_installdatabricks.gif)

Install Databricks CLI on Windows:
![windows_install_databricks.png](docs/windows_install_databricks.png)

Once the Databricks CLI is installed, authenticate your current machine to a Databricks Workspace:

```commandline
databricks auth login --host WORKSPACE_HOST
```
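
If the login succeeds, a profile is stored in `~/.databrickscfg`. As a quick sanity check, you can verify the credentials from Python with the `databricks-sdk` package that UCX itself depends on; this is a minimal sketch assuming the default profile created above:

```python
from databricks.sdk import WorkspaceClient

# Picks up credentials from the profile created by `databricks auth login`;
# pass profile="<name>" explicitly if you use a named profile.
w = WorkspaceClient()
me = w.current_user.me()
print(f"Authenticated to {w.config.host} as {me.user_name}")
```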

To enable debug logs, add the `--debug` flag to any command.

[[back to top](#databricks-labs-ucx)]

## Install UCX

Install UCX via Databricks CLI:

```commandline
databricks labs install ucx
```

You'll be prompted to select a [configuration profile](https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication) created by the `databricks auth login` command.

Once you install, proceed to the [assessment workflow](#assessment-workflow) to ensure the compatibility of your workspace with UCX.

The `WorkspaceInstaller` class creates a new configuration for Unity Catalog migration in a Databricks workspace.
It guides you through a series of prompts to gather the necessary information, such as selecting an inventory database, choosing
a PRO or SERVERLESS SQL warehouse, specifying a log level and number of threads, and setting up an external Hive Metastore if necessary.
Upon the first installation, you're prompted for a workspace-local [group migration strategy](docs/group_name_conflict.md).
Based on your input, the class creates a new cluster policy with the specified configuration. You can review and confirm the configuration,
which is saved to the workspace and can be opened in a web browser.

The [`WorkspaceInstallation`](src/databricks/labs/ucx/install.py) class manages the installation and uninstallation of UCX in a workspace,
including configuration and exception management. The installation process creates dashboards, the inventory database, and jobs:
it deploys workflows with specific settings, uploads wheels, and creates cluster policies and wheel runners in the workspace.
It creates the job tasks of each kind (dashboard, notebook, and wheel tasks) and can infer errors from job runs and task runs.
Altogether, the class configures the workspace, installs the necessary libraries, and verifies the installation,
making it easier for users to migrate their workspaces to Unity Catalog.

After this, UCX will be installed locally and a number of assets will be deployed in the selected workspace. 
These assets are available under the installation folder, i.e. `/Users/<your user>/.ucx/`.
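
For illustration, you could list the deployed assets with the `databricks-sdk`; this is a sketch that assumes a user installation under `/Users/<your user>/.ucx`:

```python
from databricks.sdk import WorkspaceClient

# List the assets UCX deployed to the installation folder
# (global installations live under /Applications/ucx instead).
w = WorkspaceClient()
me = w.current_user.me()
for obj in w.workspace.list(f"/Users/{me.user_name}/.ucx"):
    print(obj.object_type, obj.path)
```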

You can also install a specific version by specifying it like `@v0.13.2` - `databricks labs install ucx@v0.13.2`.

![macos_install_ucx](docs/macos_2_databrickslabsmac_installucx.gif)

[[back to top](#databricks-labs-ucx)]

## [ADVANCED] Force install over existing UCX
Using the environment variable `UCX_FORCE_INSTALL`, you can force the installation of UCX over an existing installation.
The valid values for the environment variable are 'global' and 'user'.

* Global install: UCX is installed at `/Applications/ucx`
* User install: UCX is installed at `/Users/<user>/.ucx`

If there is an existing global installation of UCX, you can force a user installation over it by setting the environment variable `UCX_FORCE_INSTALL` to 'user'.

At this moment, forcing a global installation over an existing user installation is not supported, as this would require a migration and can break existing installations.


| global | user | expected install location | install_folder      | mode                                            |
|--------|------|---------------------------|---------------------|-------------------------------------------------|
| no     | no   | default                   | `/Applications/ucx` | install                                         |
| yes    | no   | default                   | `/Applications/ucx` | upgrade                                         |
| no     | yes  | default                   | `/Users/X/.ucx`     | upgrade (existing installations must not break) |
| yes    | yes  | default                   | `/Users/X/.ucx`     | upgrade                                         |
| yes    | no   | **USER**                  | `/Users/X/.ucx`     | install (show prompt)                           |
| no     | yes  | **GLOBAL**                | ...                 | migrate                                         |


* `UCX_FORCE_INSTALL=user databricks labs install ucx` - forces a user-level installation (under `/Users/<user>/.ucx`)
* `UCX_FORCE_INSTALL=global databricks labs install ucx` - forces a global installation (under `/Applications/ucx`)


[[back to top](#databricks-labs-ucx)]

## [ADVANCED] Installing UCX on all workspaces within a Databricks account
Setting the environment variable `UCX_FORCE_INSTALL` to 'account' will install UCX on all workspaces within a Databricks account.

* `UCX_FORCE_INSTALL=account databricks labs install ucx`

After the first installation, UCX will prompt the user to confirm whether to install UCX on the remaining workspaces with the same answers. If confirmed, the remaining installations will be completed silently.

This installation mode will automatically select the following options:
* Automatically create and enable HMS lineage init script
* Automatically create a new SQL warehouse for UCX assessment

[[back to top](#databricks-labs-ucx)]

## Upgrading UCX for newer versions

Verify that UCX is installed:

```commandline
databricks labs installed

Name  Description                            Version
ucx   Unity Catalog Migration Toolkit (UCX)  <version>
```

Upgrade UCX via Databricks CLI:

```commandline
databricks labs upgrade ucx
```

The prompts will be similar to those during [installation](#install-ucx).

![macos_upgrade_ucx](docs/macos_3_databrickslabsmac_upgradeucx.gif)

[[back to top](#databricks-labs-ucx)]

## Uninstall UCX

Uninstall UCX via Databricks CLI:

```commandline
databricks labs uninstall ucx
```

The Databricks CLI will ask you to confirm a few options:
- Whether you want to remove all UCX artifacts from the workspace as well. Defaults to no.
- Whether you want to delete the inventory database in `hive_metastore`. Defaults to no.

![macos_uninstall_ucx](docs/macos_4_databrickslabsmac_uninstallucx.gif)

[[back to top](#databricks-labs-ucx)]

# Migration process

At a high level, the migration process starts with the [assessment workflow](#assessment-workflow),
followed by the [group migration](#group-migration-workflow) and [table migration](#table-migration-workflow) workflows,
and is finalized with the [code migration](#code-migration-commands). It can be described as:

```mermaid
flowchart TD
    subgraph workspace-admin
        assessment --> group-migration
        group-migration --> table-migration
        table-migration --> code-migration
        assessment --> create-table-mapping
        create-table-mapping --> table-migration
        create-table-mapping --> code-migration
        validate-external-locations --> table-migration
        table-migration --> revert-migrated-tables
        revert-migrated-tables --> table-migration
    end
    subgraph account-admin
        create-account-groups --> group-migration
        sync-workspace-info --> create-table-mapping
        group-migration --> validate-groups-membership
    end
    subgraph iam-admin
        setup-account-scim --> create-account-groups
        assessment --> create-uber-principal
        create-uber-principal --> table-migration
        assessment --> principal-prefix-access
        principal-prefix-access --> migrate-credentials
        migrate-credentials --> validate-external-locations
        setup-account-scim    
    end
```

[[back to top](#databricks-labs-ucx)]

# Workflows

Part of this application is deployed as [Databricks Workflows](https://docs.databricks.com/en/workflows/index.html). 
You can view the status of deployed workflows via the [`workflows` command](#workflows-command).
Failed workflows can be fixed with the [`repair-run` command](#repair-run-command).

[[back to top](#databricks-labs-ucx)]

## Readme notebook

![readme](docs/readme-notebook.png)

Every installation creates a `README` notebook with a detailed description of all deployed workflows and their tasks,
providing quick links to the relevant workflows and dashboards.

[[back to top](#databricks-labs-ucx)]

## Assessment workflow

The assessment workflow can be triggered using the Databricks UI, or via the [command line](#ensure-assessment-run-command). 

```commandline
databricks labs ucx ensure-assessment-run
```

![ucx_assessment_workflow](docs/ucx_assessment_workflow.png)

Once you finish the assessment, proceed to the [group migration workflow](#group-migration-workflow). 
See the [migration process diagram](#migration-process) to understand the role of the assessment workflow in the migration process.

The assessment workflow is designed to assess the compatibility of various entities in the current workspace with Unity Catalog.
It identifies incompatible entities and provides the information necessary for planning the migration to UC. The tasks in
the assessment workflow can be executed in parallel or sequentially, depending on the dependencies specified in the `@task` decorators.
The output of each task is stored in Delta tables in the `$inventory_database` schema that you specify during [installation](#install-ucx),
and can be used for further analysis and decision-making through the [assessment report](docs/assessment.md).
The assessment workflow can be executed multiple times to ensure that all incompatible entities are identified and accounted
for before starting the migration process. A sketch for querying the collected inventory follows the task list below.

1. `crawl_tables`: This task scans all tables in the Hive Metastore of the current workspace and persists their metadata in a Delta table named `$inventory_database.tables`. This metadata includes information such as the database name, table name, table type, and table location. This task is used for assessing which tables cannot be easily migrated to Unity Catalog.
2. `crawl_grants`: This task scans the Delta table named `$inventory_database.tables` and issues a `SHOW GRANTS` statement for every object to retrieve the permissions assigned to it. The permissions include information such as the principal, action type, and the table it applies to. This task persists the permissions in the Delta table `$inventory_database.grants`.
3. `estimate_table_size_for_migration`: This task scans the Delta table named `$inventory_database.tables` and locates tables that cannot be synced. These tables will have to be cloned in the migration process. The task assesses the size of these tables and creates the `$inventory_database.table_size` table to list these sizes. The table size is a factor in deciding whether to clone these tables.
4. `crawl_mounts`: This task scans the workspace to compile a list of all existing mount points and stores this information in the `$inventory_database.mounts` table. This is crucial for planning the migration.
5. `guess_external_locations`: This task determines the shared path prefixes of all the tables that utilize mount points. The goal is to identify the external locations necessary for a successful migration and store this information in the `$inventory_database.external_locations` table.
6. `assess_jobs`: This task scans through all the jobs and identifies those that are not compatible with UC. The list of all the jobs is stored in the `$inventory_database.jobs` table.
7. `assess_clusters`: This task scans through all the clusters and identifies those that are not compatible with UC. The list of all the clusters is stored in the `$inventory_database.clusters` table.
8. `assess_pipelines`: This task scans through all the pipelines and identifies those that have Azure Service Principals embedded in their configurations. A list of all the pipelines with matching configurations is stored in the `$inventory_database.pipelines` table.
9. `assess_azure_service_principals`: This task scans through all the cluster configurations, cluster policies, job cluster configurations, pipeline configurations, and warehouse configurations, and identifies all the Azure Service Principals that have been given access to Azure storage accounts via Spark configurations referenced in those entities. The list of all such Azure Service Principals is saved in the `$inventory_database.azure_service_principals` table.
10. `assess_global_init_scripts`: This task scans through all the global init scripts and identifies any Azure Service Principal that has been given access to Azure storage accounts via Spark configurations referenced in those scripts.
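
As referenced above, you can query the collected inventory once the assessment finishes. The following is a minimal sketch using the SQL Statement Execution API from the `databricks-sdk`; the warehouse id is a placeholder, `ucx` stands in for the inventory database you chose during installation, and the `object_type` column name is an assumption for illustration:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Count crawled tables by type; substitute your own SQL warehouse id
# and the inventory database chosen during installation.
response = w.statement_execution.execute_statement(
    warehouse_id="<warehouse-id>",
    statement=(
        "SELECT object_type, COUNT(*) AS table_count "
        "FROM hive_metastore.ucx.tables GROUP BY object_type"
    ),
)
rows = response.result.data_array if response.result else []
for row in rows:
    print(row)
```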

![report](docs/assessment-report.png)

After the assessment workflow has been executed, the assessment dashboard will be populated with findings and common recommendations. See [this guide](docs/assessment.md) for more details.

[[back to top](#databricks-labs-ucx)]

## Group migration workflow

You are required to complete the [assessment workflow](#assessment-workflow) before starting the group migration workflow.
See the [migration process diagram](#migration-process) to understand the role of the group migration workflow in the migration process.

See the [detailed design](docs/local-group-migration.md) of this workflow. It helps you to upgrade all Databricks workspace assets:
Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, 
Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries,
SQL Alerts, Token and Password usage permissions that are set on the workspace level, Secret scopes, Notebooks,
Directories, Repos, and Files. 

Once done with the group migration, proceed to [table migration workflow](#table-migration-workflow).

Use [`validate-groups-membership` command](#validate-groups-membership-command) for extra confidence.
If you don't have matching account groups, please run [`create-account-groups` command](#create-account-groups-command).

The group migration workflow is designed to migrate workspace-local groups to account-level groups in the Unity Catalog (UC) environment. It ensures that all the necessary groups are available in the workspace with the correct permissions, and removes any unnecessary groups and permissions. The tasks in the group migration workflow depend on the output of the assessment workflow and can be executed in sequence to ensure a successful migration. The output of each task is stored in Delta tables in the `$inventory_database` schema, which can be used for further analysis and decision-making. The group migration workflow can be executed multiple times to ensure that all the groups are migrated successfully and that all the necessary permissions are assigned. A sketch for inspecting the renamed groups follows the task list below.

1. `crawl_groups`: This task scans all groups for the local group migration scope.
2. `rename_workspace_local_groups`: This task renames workspace local groups by adding a `ucx-renamed-` prefix. This step is taken to avoid conflicts with account-level groups that may have the same name as workspace-local groups.
3. `reflect_account_groups_on_workspace`: This task adds matching account groups to this workspace. The matching account-level groups must already exist for this step to be successful. This step is necessary to ensure that the account-level groups are available in the workspace for assigning permissions.
4. `apply_permissions_to_account_groups`: This task assigns the full set of permissions of the original group to the account-level one. It covers local workspace-local permissions for all entities, including Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries, SQL Alerts, Token and Password usage permissions, Secret Scopes, Notebooks, Directories, Repos, Files. This step is necessary to ensure that the account-level groups have the necessary permissions to manage the entities in the workspace.
5. `validate_groups_permissions`: This task validates that all the crawled permissions are applied correctly to the destination groups.
6. `delete_backup_groups`: This task removes all workspace-level backup groups, along with their permissions. This should only be executed after confirming that the workspace-local migration worked successfully for all the groups involved. This step is necessary to clean up the workspace and remove any unnecessary groups and permissions.
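
As referenced above, here is a small sketch for inspecting the workspace-local groups that the workflow has renamed, assuming the default `ucx-renamed-` prefix (configurable at installation):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Workspace-local groups renamed by `rename_workspace_local_groups`
# carry the `ucx-renamed-` prefix by default.
renamed = [
    g.display_name
    for g in w.groups.list(attributes="displayName")
    if g.display_name and g.display_name.startswith("ucx-renamed-")
]
print("\n".join(renamed))
```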

[[back to top](#databricks-labs-ucx)]

## Debug notebook

![debug](docs/debug-notebook.png)

Every [installation](#installation) creates a debug notebook that initializes UCX as a library,
so that you can interactively debug issues and prototype missing features.

[[back to top](#databricks-labs-ucx)]

## Debug logs

![debug](docs/debug-logs.png)

Every workflow run stores debug logs in the `logs` folder of the [installation](#installation).
For tasks shorter than 10 minutes, the logs appear after the task finishes, whereas longer-running tasks
flush the logs every 10 minutes.

To enable debug logs of the [command-line interface](#authenticate-databricks-cli),
add the `--debug` flag to any command.

[[back to top](#databricks-labs-ucx)]

## Table Migration Workflow

The table migration workflow comprises multiple workflows and tasks designed to migrate tables from the Hive Metastore to the Unity Catalog. The subsequent diagram illustrates the workflow's tasks and their dependency CLI commands. 
```mermaid
flowchart TB
    subgraph CLI
      sync-workspace-info[sync-workspace-info] --> create_table_mapping[create-table-mapping]
      create_table_mapping[create-table-mapping] --> create_catalogs_schemas[create-catalogs-schemas]
      create_uber_principal[create-uber-principal]
      principal_prefix_access[principal-prefix-access] --> migrate_credentials[migrate-credentials]
      migrate_credentials --> migrate_locations[migrate-locations]
      migrate_locations --> create_catalogs_schemas
    end
    
    create_uber_principal --> workflow
    create_table_mapping --> workflow
    create_catalogs_schemas --> workflow
    
    subgraph workflow[Table Migration Workflows]
      subgraph mt_workflow[workflow: migrate-tables]
        dbfs_root_delta_mt_task[migrate_dbfs_root_delta_tables]
        external_tables_sync_mt_task[migrate_external_tables_sync]
        view_mt_task[roadmap: migrate_views]
        dbfs_root_delta_mt_task --> view_mt_task
        external_tables_sync_mt_task --> view_mt_task
      end
      
      subgraph mt_ctas_wf[roadmap workflow: migrate-tables-ctas]
        ctas_mt_task[migrate_tables_ctas] --> view_mt_task_ctas[roadmap: migrate_views]
      end
  
      subgraph mt_serde_inplace_wf[roadmap workflow: migrate-external-hiveserde-tables-in-place-experimental]
        serde_inplace_mt_task[migrate_external_hiveserde_tables_in_place_experimental] --> view_mt_task_inplace[roadmap: migrate_views]
      end
  
      subgraph mt_in_mounts_wf[roadmap workflow: migrate-tables-in-mounts-experimental]
        scan_tables_in_mounts_experimental_task[scan_tables_in_mounts_experimental] --> migrate_tables_in_mounts_experimental[migrate_tables_in_mounts_experimental]
      end
    end
    
    classDef roadmap stroke:Green,stroke-width:2px,color:Green,stroke-dasharray: 8 3
    class view_mt_task roadmap;
    class mt_ctas_wf,ctas_mt_task,view_mt_task_ctas roadmap;
    class mt_serde_inplace_wf,serde_inplace_mt_task,view_mt_task_inplace roadmap;
    class mt_in_mounts_wf,scan_tables_in_mounts_experimental_task,migrate_tables_in_mounts_experimental roadmap;
```
More details can be found in the [design of table migration](docs/table_upgrade.md).

### Dependency CLI commands
- [`create-table-mapping`](#create-table-mapping-command) - Create `mapping.csv`, which the workflow uses to identify the target catalog, schema, and table for each HMS table to be migrated. You should review and update the mapping file before proceeding with the migration workflow.
- [`principal-prefix-access`](#principal-prefix-access-command) - Identify all the storage locations used in the workspace and the corresponding Azure Service Principals or IAM roles. It outputs `azure_storage_account_info.csv` or `uc_roles_access.csv`, which is later used by the [`migrate-credentials`](#migrate-credentials-command) command to create UC storage credentials. The CSV can be edited to control which IAM roles or Azure Service Principals are used to create UC storage credentials.
- [`migrate-credentials`](#migrate-credentials-command) - Create UC storage credentials based on the Azure Service Principals or IAM roles (`azure_storage_account_info.csv` or `uc_roles_access.csv`) identified by the [`principal-prefix-access`](#principal-prefix-access-command) command.
- [`migrate-locations`](#migrate-locations-command) - Create missing external locations in the Unity Catalog.
- [`create-catalogs-schemas`](#create-catalogs-schemas-command) - Create missing catalogs and schemas in the Unity Catalog. The candidate catalogs and schemas are based on `mapping.csv`.
- [`create-uber-principal`](#create-uber-principal-command) - Create an uber principal with access to all storage used in the workspace. This principal is used by the workflow job clusters to migrate all the tables in the workspace.

[`create-table-mapping`](#create-table-mapping-command) and [`create-uber-principal`](#create-uber-principal-command) are required to run the workflow. The other commands are optional, as long as the UC storage credentials, external locations, catalogs, and schemas needed for a successful migration already exist.

To control the scope of the table migration, consider a combination of editing `mapping.csv` and using the [`skip`](#skip-command) and [`revert-migrated-tables`](#revert-migrated-tables-command) commands.

See more details in [Table migration commands](#table-migration-commands).

### Table Migration Workflow Tasks
- `migrate_dbfs_root_delta_tables` - Migrate Delta tables from the DBFS root using deep clone, along with any legacy table ACLs.
- `migrate_external_tables_sync` - Migrate external tables using the [`SYNC`](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-sync.html) command, along with any legacy table ACLs.
- The following workflows/tasks are on the roadmap and under development:
  - Migrate views
  - Migrate tables using CTAS
  - Optionally and experimentally, migrate ParquetHiveSerDe, OrcSerde, AvroSerDe, LazySimpleSerDe, JsonSerDe, and OpenCSVSerde tables in place.
  - Experimentally, migrate Delta and Parquet data found in DBFS mounts but not registered as Hive Metastore tables into UC tables.

### Other considerations
- You may need to run the workflow multiple times to ensure all the tables are migrated successfully in phases.
- If your Delta tables in the DBFS root have a large number of files, consider:
  - Setting higher min and max workers for auto-scaling when asked during the UCX installation. More nodes/cores in the cluster mean more concurrency for calling the cloud storage API to copy files when deep-cloning the Delta tables.
  - Setting a higher "Parallelism for migrating dbfs root delta tables with deep clone" (default 200) when asked during the UCX installation. This controls the number of Spark tasks/partitions created for the deep clone.
- Consider creating an instance pool and setting its id when asked during the UCX installation. This instance pool will be put into the cluster policy used by all UCX workflow job clusters.
- You may also manually edit the job cluster configuration per job or per task after the workflows are deployed.

[[back to top](#databricks-labs-ucx)]

# Utility commands

## `logs` command

```text
$ databricks labs ucx logs [--workflow WORKFLOW_NAME] [--debug]
```

This command displays the logs of the last run of the specified workflow. If no workflow is specified, it displays
the logs of the most recently run workflow. This command is useful for developers and administrators who want to
check the logs of the last run of a workflow and ensure that it was executed as expected. It can also be used for
debugging purposes when a workflow is not behaving as expected. By default, only `INFO`, `WARNING`, and `ERROR` logs
are displayed. To display `DEBUG` logs, use the `--debug` flag.

[[back to top](#databricks-labs-ucx)] 

## `ensure-assessment-run` command

```commandline
databricks labs ucx ensure-assessment-run
```

This command ensures that the [assessment workflow](#assessment-workflow) was run on a workspace.
It will block until the job finishes.
Failed workflows can be fixed with the [`repair-run` command](#repair-run-command). Workflows and their status can be 
listed with the [`workflows` command](#workflows-command).

[[back to top](#databricks-labs-ucx)]

## `repair-run` command

```commandline
databricks labs ucx repair-run --step WORKFLOW_NAME
```

This command repairs a failed [UCX Workflow](#workflows). This command is useful for developers and administrators who 
want to repair a failed job. It can also be used to debug issues related to job failures. This operation can also be 
done via [user interface](https://docs.databricks.com/en/workflows/jobs/repair-job-failures.html). Workflows and their 
status can be listed with the [`workflows` command](#workflows-command).

[[back to top](#databricks-labs-ucx)]

## `workflows` command

See the [migration process diagram](#migration-process) to understand the role of each workflow in the migration process.

```text
$ databricks labs ucx workflows
Step                                  State    Started
assessment                            RUNNING  1 hour 2 minutes ago
099-destroy-schema                    UNKNOWN  <never run>
migrate-groups                        UNKNOWN  <never run>
remove-workspace-local-backup-groups  UNKNOWN  <never run>
validate-groups-permissions           UNKNOWN  <never run>
```

This command displays the [deployed workflows](#workflows) and their state in the current workspace. It fetches the latest 
job status from the workspace and prints it in a table format. This command is useful for developers and administrators 
who want to check the status of UCX workflows and ensure that they have been executed as expected. It can also be used 
for debugging purposes when a workflow is not behaving as expected. Failed workflows can be fixed with 
the [`repair-run` command](#repair-run-command).
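
If you prefer to inspect the deployed jobs programmatically, here is a sketch using the `databricks-sdk`; matching workflows by job name is a heuristic for illustration, not how UCX itself tracks them:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Look for jobs whose names resemble the UCX workflow names listed above.
workflow_names = ("assessment", "migrate-groups", "validate-groups-permissions")
for job in w.jobs.list():
    name = job.settings.name if job.settings and job.settings.name else ""
    if any(wf in name for wf in workflow_names):
        print(job.job_id, name)
```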

[[back to top](#databricks-labs-ucx)]

## `open-remote-config` command

```commandline
databricks labs ucx open-remote-config
```

This command opens the remote configuration file in the default web browser. It generates a link to the configuration file 
and opens it using the `webbrowser.open()` method. This command is useful for developers and administrators who want to view or 
edit the remote configuration file without having to manually navigate to it in the workspace. It can also be used to quickly 
access the configuration file from the command line. Here's the description of the configuration properties (a sketch for fetching the file programmatically follows the list):

  * `inventory_database`: A string representing the name of the inventory database.
  * `workspace_group_regex`: An optional string representing the regular expression to match workspace group names.
  * `workspace_group_replace`: An optional string to replace the matched group names with.
  * `account_group_regex`: An optional string representing the regular expression to match account group names.
  * `group_match_by_external_id`: A boolean value indicating whether to match groups by their external IDs.
  * `include_group_names`: An optional list of strings representing the names of groups to include for migration.
  * `renamed_group_prefix`: An optional string representing the prefix to add to renamed group names.
  * `instance_pool_id`: An optional string representing the ID of the instance pool.
  * `warehouse_id`: An optional string representing the ID of the warehouse.
  * `connect`: An optional `Config` object representing the configuration for connecting to the warehouse.
  * `num_threads`: An optional integer representing the number of threads to use for migration.
  * `database_to_catalog_mapping`: An optional dictionary mapping source database names to target catalog names.
  * `default_catalog`: An optional string representing the default catalog name.
  * `log_level`: An optional string representing the log level.
  * `workspace_start_path`: A string representing the starting path for notebooks and directories crawler in the workspace.
  * `instance_profile`: An optional string representing the name of the instance profile.
  * `spark_conf`: An optional dictionary of Spark configuration properties.
  * `override_clusters`: An optional dictionary mapping job cluster names to existing cluster IDs.
  * `policy_id`: An optional string representing the ID of the cluster policy.
  * `is_terraform_used`: A boolean value indicating whether some workspace resources are managed by Terraform.
  * `include_databases`: An optional list of strings representing the names of databases to include for migration.
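
As mentioned above, the same file can also be fetched and parsed programmatically. This sketch assumes a user installation and that the remote configuration file is named `config.yml` in the installation folder:

```python
import yaml  # pyyaml is already a UCX dependency

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
me = w.current_user.me()
# Download the remote configuration and parse it as YAML.
with w.workspace.download(f"/Users/{me.user_name}/.ucx/config.yml") as f:
    config = yaml.safe_load(f)
print(config.get("inventory_database"))
```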

[[back to top](#databricks-labs-ucx)]

## `installations` command

```text
$ databricks labs ucx installations 
...
13:49:16  INFO [d.labs.ucx] Fetching installations...
13:49:17  INFO [d.l.blueprint.parallel][finding_ucx_installations_5] finding ucx installations 10/88, rps: 22.838/sec
13:49:17  INFO [d.l.blueprint.parallel][finding_ucx_installations_9] finding ucx installations 20/88, rps: 35.002/sec
13:49:17  INFO [d.l.blueprint.parallel][finding_ucx_installations_2] finding ucx installations 30/88, rps: 51.556/sec
13:49:18  INFO [d.l.blueprint.parallel][finding_ucx_installations_9] finding ucx installations 40/88, rps: 56.272/sec
13:49:18  INFO [d.l.blueprint.parallel][finding_ucx_installations_19] finding ucx installations 50/88, rps: 67.382/sec
...
Path                                      Database  Warehouse
/Users/serge.smertin@databricks.com/.ucx  ucx       675eaf1ff976aa98
```

This command displays the [installations](#installation) by different users on the same workspace. It fetches all
the installations where the `ucx` package is installed and prints their details. This command is useful
for administrators who want to see which users have installed `ucx` and where. It can also be used to debug issues
related to multiple installations of `ucx` on the same workspace.

[[back to top](#databricks-labs-ucx)]

# Metastore related commands

These commands are used to assign a Unity Catalog metastore to a workspace. The metastore assignment is a pre-requisite
for any further migration steps.

[[back to top](#databricks-labs-ucx)]

## `show-all-metastores` command

```text
databricks labs ucx show-all-metastores [--workspace-id <workspace-id>]
```

This command lists all the metastores available to be assigned to a workspace. If no workspace is specified, it lists
all the metastores available in the account. This command is useful when there are multiple metastores available within
a region and you want to see which ones are available for assignment.

[[back to top](#databricks-labs-ucx)]

## `assign-metastore` command

```text
databricks labs ucx assign-metastore --workspace-id <workspace-id> [--metastore-id <metastore-id>]
```

This command assigns a metastore to the workspace with the given `workspace-id`. If there is only a single metastore in the workspace
region, it will be automatically assigned to the workspace. If there are multiple metastores available, you need to specify
the id of the metastore you want to assign to the workspace.
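
Under the hood, these commands talk to the account-level Unity Catalog APIs. A minimal sketch of the equivalent listing with the `databricks-sdk`, where `ACCOUNT` is a hypothetical profile name from `~/.databrickscfg`:

```python
from databricks.sdk import AccountClient

# Uses an account-level profile from ~/.databrickscfg.
a = AccountClient(profile="ACCOUNT")
for m in a.metastores.list():
    print(m.metastore_id, m.name, m.region)
```

[[back to top](#databricks-labs-ucx)]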

# Table migration commands

These commands are a vital part of the [table migration workflow](#table-migration-workflow) and require
the [assessment workflow](#assessment-workflow) and
[group migration workflow](#group-migration-workflow) to be completed.
See the [migration process diagram](#migration-process) to understand the role of the table migration commands in
the migration process.

The first step is to run the [`principal-prefix-access` command](#principal-prefix-access-command) to identify all 
the storage accounts used by tables in the workspace and their permissions on each storage account.

If you don't have any storage credentials and external locations configured, you'll need to run
the [`migrate-credentials` command](#migrate-credentials-command) to migrate the service principals
and the [`migrate-locations` command](#migrate-locations-command) to create the external locations.
If some of the external locations already exist, you should run
the [`validate-external-locations` command](#validate-external-locations-command).
You'll need to create the [uber principal](#create-uber-principal-command) with _**access to all storage**_ used by tables in
the workspace, so that you can migrate all the tables. If you already have such a principal, you can skip this step.

Ask your Databricks Account admin to run the [`sync-workspace-info` command](#sync-workspace-info-command) to sync the
workspace information with the UCX installations. Once the workspace information is synced, you can run the 
[`create-table-mapping` command](#create-table-mapping-command) to align your tables with the Unity Catalog,
[create catalogs and schemas](#create-catalogs-schemas-command) and start the migration. During multiple runs of
the table migration workflow, you can use the [`revert-migrated-tables` command](#revert-migrated-tables-command) to 
revert the tables that were migrated in the previous run. You can also skip the tables that you don't want to migrate 
using the [`skip` command](#skip-command).

Once you're done with the table migration, proceed to the [code migration](#code-migration-commands).

[[back to top](#databricks-labs-ucx)]

## `principal-prefix-access` command

```text
databricks labs ucx principal-prefix-access [--subscription-id <Azure Subscription ID>] [--aws-profile <AWS CLI profile>]
```

This command depends on results from the [assessment workflow](#assessment-workflow) and requires [AWS CLI](#access-for-aws-s3-buckets) 
or [Azure CLI](#access-for-azure-storage-accounts) to be installed and authenticated for the given machine. This command
identifies all the storage accounts used by tables in the workspace and their permissions on each storage account. 
Once you're done running this command, proceed to the [`migrate-credentials` command](#migrate-credentials-command).

[[back to top](#databricks-labs-ucx)]

### Access for AWS S3 Buckets

```commandline
databricks labs ucx principal-prefix-access --aws-profile test-profile
```

Use this command to identify all instance profiles in the workspace and map their access to S3 buckets.
It also captures the IAM roles that have the UC ARN listed, and maps their access to S3 buckets.
This requires the `aws` CLI to be installed and configured.

Once done, proceed to the [`migrate-credentials` command](#migrate-credentials-command).

[[back to top](#databricks-labs-ucx)]

### Access for Azure Storage Accounts

```commandline
databricks labs ucx principal-prefix-access --subscription-id test-subscription-id
```

Use this command to identify all storage accounts used by tables, the relevant Azure Service Principals, and their permissions
on each storage account. This requires the Azure CLI to be installed and configured via `az login`.

Once done, proceed to the [`migrate-credentials` command](#migrate-credentials-command).

[[back to top](#databricks-labs-ucx)]

## `create-uber-principal` command

```text
databricks labs ucx create-uber-principal [--subscription-id X]
```

**Requires Cloud IAM admin privileges.** Once the [`assessment` workflow](#assessment-workflow) is complete, run
this command to create a service principal with _**read-only access to all storage**_ used by tables in this
workspace, and to configure the [UCX Cluster Policy](#installation) with its details. Once migration is complete, this
service principal should be unprovisioned. On Azure, it creates a principal with a `Storage Blob Data Reader` role
assignment on every storage account, using Azure Resource Manager APIs.

This command is one of prerequisites for the [table migration workflow](#table-migration-workflow).

[[back to top](#databricks-labs-ucx)]

## `migrate-credentials` command

```commandline
databricks labs ucx migrate-credentials
```

For Azure, this command migrates Azure Service Principals that have `Storage Blob Data Contributor`,
`Storage Blob Data Reader`, or `Storage Blob Data Owner` roles on ADLS Gen2 locations used in
Databricks to UC storage credentials. The mapping of Azure Service Principals to locations is listed
in `/Users/{user_name}/.ucx/azure_storage_account_info.csv`, which is generated
by the [`principal-prefix-access` command](#principal-prefix-access-command).
Please review the file and delete the Service Principals you do not want to be migrated.
The command will only migrate the Service Principals that have a client secret stored in a Databricks secret.

Once you're done with this command, run the [`validate-external-locations` command](#validate-external-locations-command).

[[back to top](#databricks-labs-ucx)]

## `validate-external-locations` command

```text
databricks labs ucx validate-external-locations 
```

Once the [`assessment` workflow](#assessment-workflow) has finished successfully and [storage credentials](#migrate-credentials-command) are configured,
run this command to validate and report the missing Unity Catalog external locations that need to be created.

This command validates and provides the mapping of external tables to external locations, also as Terraform configurations.

Once you're done with this command, proceed to the [`migrate-locations` command](#migrate-locations-command).

[[back to top](#databricks-labs-ucx)]

## `migrate-locations` command

```text
databricks labs ucx migrate-locations 
```

Once the [`assessment` workflow](#assessment-workflow) has finished successfully and [storage credentials](#migrate-credentials-command) are configured,
run this command to create the Unity Catalog external locations. The candidate locations to be created are extracted from the `guess_external_locations`
task in the assessment job. You can run the [`validate-external-locations` command](#validate-external-locations-command) to check the candidate locations.

Once you're done with this command, proceed to the [`create-table-mapping` command](#create-table-mapping-command).

[[back to top](#databricks-labs-ucx)]

## `create-table-mapping` command

```text
databricks labs ucx create-table-mapping 
```

Once the [`assessment` workflow](#assessment-workflow) has finished successfully and the
[workspace info is synchronized](#sync-workspace-info-command), run this command to create the initial
table mapping for review, in CSV format, in the Databricks Workspace:

```text
workspace_name,catalog_name,src_schema,dst_schema,src_table,dst_table
labs-azure,labs_azure,default,default,ucx_tybzs,ucx_tybzs
```

You are supposed to review this mapping and adjust it if necessary. The file is in CSV format so that you can edit it
more easily in your favorite spreadsheet application.

Once you're done with this command, [create catalogs and schemas](#create-catalogs-schemas-command). During 
multiple runs of the table migration workflow, you can use the [`revert-migrated-tables` command](#revert-migrated-tables-command)
to revert the tables that were migrated in the previous run. You can also skip the tables that you don't want to migrate
using the [`skip` command](#skip-command).

This command is one of prerequisites for the [table migration workflow](#table-migration-workflow).

Once you're done with table migration, proceed to the [code migration](#code-migration-commands).

[[back to top](#databricks-labs-ucx)]

## `skip` command

```text
databricks labs ucx skip --schema X [--table Y]  
```

You can run this command at any point after the [`create-table-mapping` command](#create-table-mapping-command) has been executed.

This command allows users to skip certain schemas or tables during the [table migration](#table-migration-workflow) process.
The command takes `--schema` and optionally `--table` flags to specify the schema and table to skip. If no `--table` flag 
is provided, all tables in the specified HMS database are skipped.
This command is useful to temporarily disable migration on a particular schema or table.

Once you're done with table migration, proceed to the [code migration](#code-migration-commands).

[[back to top](#databricks-labs-ucx)]

## `revert-migrated-tables` command

```text
databricks labs ucx revert-migrated-tables --schema X --table Y [--delete-managed]  
```

You can run this command at any point after the [`create-table-mapping` command](#create-table-mapping-command) has been executed.

This command removes the `upgraded_from` property on a migrated table for re-migration in the [table migration](#table-migration-workflow) process. 
This command is useful for developers and administrators who want to revert the migration of a table. It can also be used 
to debug issues related to table migration.

Go back to the [`create-table-mapping` command](#create-table-mapping-command) after you're done with this command.

[[back to top](#databricks-labs-ucx)]

## `create-catalogs-schemas` command

```text
databricks labs ucx create-catalogs-schemas
```
After the [`create-table-mapping` command](#create-table-mapping-command) has been executed, you can run this command to create the required UC catalogs and schemas.
It is supposed to be run before migrating tables to UC with the [table migration workflow](#table-migration-workflow).
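
For reference, what this command automates boils down to Unity Catalog API calls along the lines of the following sketch; the catalog and schema names are hypothetical, whereas the command derives them from `mapping.csv`:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Create a target catalog and a schema inside it by hand;
# the names below are placeholders for the ones in mapping.csv.
w.catalogs.create(name="labs_azure")
w.schemas.create(name="finance", catalog_name="labs_azure")
```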

[[back to top](#databricks-labs-ucx)]

## `move` command

```text
databricks labs ucx move --from-catalog A --from-schema B --from-table C --to-catalog D --to-schema E  
```

This command moves one or more UC tables from one schema to another after
the [table migration](#table-migration-workflow) process. This is useful for developers and administrators who want
to adjust their catalog structure after the tables have been upgraded.

You will be prompted whether to drop tables/views after they are moved to the new schema. This only applies to `MANAGED` tables and views.

This command moves different table types differently:
- `MANAGED` tables are deep-cloned to the new schema.
- `EXTERNAL` tables are dropped from the original schema, then created in the target schema using the same location.
This is because Unity Catalog does not support multiple tables with overlapping paths.
- `VIEW`s are recreated using the same view definition.

This command supports moving multiple tables at once, by specifying `*` as the table name.

[[back to top](#databricks-labs-ucx)]

## `alias` command

```text
databricks labs ucx alias --from-catalog A --from-schema B --from-table C --to-catalog D --to-schema E  
```

This command aliases one or more UC tables from one schema to another, in the same or a different catalog.
It uses a `WorkspaceClient` object with `from` and `to` parameters and aliases the tables using
the `TableMove` class. This command is useful for developers and administrators who want to create an alias for a table.
It can also be used to debug issues related to table aliasing.

[[back to top](#databricks-labs-ucx)]

# Code migration commands

See the [migration process diagram](#migration-process) to understand the role of the code migration commands in the migration process.

After you're done with the [table migration](#table-migration-workflow), you can proceed to the code migration.

Once you're done with the code migration, you can run the [`cluster-remap` command](#cluster-remap-command) to remap the
clusters to be UC compatible.

[[back to top](#databricks-labs-ucx)]

## `migrate-local-code` command

```text
databricks labs ucx migrate-local-code
```

**(Experimental)** Once the [table migration](#table-migration-workflow) is complete, you can run this command to
migrate all Python and SQL files in the current working directory. This command is highly experimental: at the moment
it only supports Python and SQL files, and it discards code comments and formatting during
the automated transformation process.

[[back to top](#databricks-labs-ucx)]

# Cross-workspace installations

When installing UCX across multiple workspaces, administrators need to keep UCX configurations in sync.
UCX will prompt you to select an account profile that has been defined in `~/.databrickscfg`. If you don't have one,
authenticate your machine with:

* `databricks auth login --host https://accounts.cloud.databricks.com/` (AWS)
* `databricks auth login --host https://accounts.azuredatabricks.net/` (Azure)

Ask your Databricks Account admin to run the [`sync-workspace-info` command](#sync-workspace-info-command) to sync the
workspace information with the UCX installations. Once the workspace information is synced, you can run the 
[`create-table-mapping` command](#create-table-mapping-command) to align your tables with the Unity Catalog.

[[back to top](#databricks-labs-ucx)]

## `sync-workspace-info` command

```text
databricks labs ucx sync-workspace-info 
14:07:07  INFO [databricks.sdk] Using Azure CLI authentication with AAD tokens
14:07:07  INFO [d.labs.ucx] Account ID: ...
14:07:10  INFO [d.l.blueprint.parallel][finding_ucx_installations_16] finding ucx installations 10/88, rps: 16.415/sec
14:07:10  INFO [d.l.blueprint.parallel][finding_ucx_installations_0] finding ucx installations 20/88, rps: 32.110/sec
14:07:11  INFO [d.l.blueprint.parallel][finding_ucx_installations_18] finding ucx installations 30/88, rps: 39.786/sec
...
```

**Requires Databricks Account Administrator privileges.** This command uploads the workspace config to all workspaces 
in the account where `ucx` is installed. This command is necessary to create an immutable default catalog mapping for
[table migration](#table-migration-workflow) process and is the prerequisite 
for [`create-table-mapping` command](#create-table-mapping-command).

If you cannot get account administrator privileges in a reasonable time, you can take the risk and
run the [`manual-workspace-info` command](#manual-workspace-info-command) to enter Databricks Workspace IDs and
Databricks Workspace names manually.

[[back to top](#databricks-labs-ucx)]

## `manual-workspace-info` command

```text
$ databricks labs ucx manual-workspace-info
14:20:36  WARN [d.l.ucx.account] You are strongly recommended to run "databricks labs ucx sync-workspace-info" by account admin,
 ... otherwise there is a significant risk of inconsistencies between different workspaces. This command will overwrite all UCX 
 ... installations on this given workspace. Result may be consistent only within https://adb-987654321.10.azuredatabricks.net
Workspace name for 987654321 (default: workspace-987654321): labs-workspace
Next workspace id (default: stop): 12345
Workspace name for 12345 (default: workspace-12345): other-workspace
Next workspace id (default: stop): 
14:21:19  INFO [d.l.blueprint.parallel][finding_ucx_installations_11] finding ucx installations 10/89, rps: 24.577/sec
14:21:19  INFO [d.l.blueprint.parallel][finding_ucx_installations_15] finding ucx installations 20/89, rps: 48.305/sec
...
14:21:20  INFO [d.l.ucx.account] Synchronised workspace id mapping for installations on current workspace
```

This command is only supposed to be run if the [`sync-workspace-info` command](#sync-workspace-info-command) cannot be 
run. It prompts the user to enter the required information manually and creates the workspace info. This command is 
useful for workspace administrators who are unable to use the `sync-workspace-info` command, because they are not 
Databricks Account Administrators. It can also be used to manually create the workspace info in a new workspace.

[[back to top](#databricks-labs-ucx)]

## `create-account-groups` command

```text
$ databricks labs ucx create-account-groups [--workspace-ids 123,456,789]
```

**Requires Databricks Account Administrator privileges.** This command crawls all workspaces specified via the
`--workspace-ids` flag and creates an account-level group whenever a workspace-local group is not yet present in
the account. If the `--workspace-ids` flag is not specified, UCX will create account groups for all workspaces
configured in the account.

The following scenarios are supported for a group X:
- X exists in workspaces A, B, and C with the same members everywhere: it will be created once in the account.
- X exists in workspaces A and B, but not in C: it will be created in the account.
- X exists in workspaces A, B, and C, with the same members in A and B but different members in C: X and C_X will be created in the account.

This command is useful for setups that don't have SCIM provisioning in place.

Once you're done with this command, proceed to the [group migration workflow](#group-migration-workflow).

[[back to top](#databricks-labs-ucx)]

## `validate-groups-membership` command

```text
$ databricks labs ucx validate-groups-membership
...
14:30:36  INFO [d.l.u.workspace_access.groups] Found 483 account groups
14:30:36  INFO [d.l.u.workspace_access.groups] No group listing provided, all matching groups will be migrated
14:30:36  INFO [d.l.u.workspace_access.groups] There are no groups with different membership between account and workspace
Workspace Group Name  Members Count  Account Group Name  Members Count  Difference
```

This command validates whether the groups at the account level and the workspace level have different memberships.
It is useful for administrators who want to ensure that the groups have the correct membership. It can also be
used to debug issues related to group membership. See the [group migration design](docs/local-group-migration.md) and 
the [group migration workflow](#group-migration-workflow) for more details.

Valid group membership is important to ensure users have the correct access after legacy table ACLs are migrated in the [table migration workflow](#table-migration-workflow).
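
As a rough illustration of what this validation does, the sketch below compares member sets of same-named groups at the workspace and account levels using the Databricks SDK. It assumes configured workspace and account credentials and that the SCIM API returns members when requested via the `attributes` parameter; it is not the actual UCX implementation:

```python
# Hedged sketch: flag same-named groups whose members differ between workspace and account.
# Assumes workspace and account credentials are configured for the Databricks SDK.
from databricks.sdk import AccountClient, WorkspaceClient

w = WorkspaceClient()
a = AccountClient()

def members_by_name(groups):
    # map group display name -> set of member ids (members may be empty)
    return {g.display_name: {m.value for m in (g.members or [])} for g in groups}

ws_groups = members_by_name(w.groups.list(attributes="displayName,members"))
acc_groups = members_by_name(a.groups.list(attributes="displayName,members"))

for name in sorted(ws_groups.keys() & acc_groups.keys()):
    if ws_groups[name] != acc_groups[name]:
        diff = ws_groups[name] ^ acc_groups[name]
        print(f"{name}: workspace={len(ws_groups[name])} account={len(acc_groups[name])} differing={len(diff)}")
```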

[[back to top](#databricks-labs-ucx)]

## `cluster-remap` command

```text
$ databricks labs ucx cluster-remap 
21:29:38  INFO [d.labs.ucx] Remapping the Clusters to UC
Cluster Name                                            Cluster Id                                        
Field Eng Shared UC LTS Cluster                         0601-182128-dcbte59m                              
Shared Autoscaling Americas cluster                     0329-145545-rugby794
```
```text
Please provide the cluster id's as comma separated value from the above list (default: <ALL>): 
```

Once you're done with the [code migration](#code-migration-commands), you can run this command to remap clusters to be UC-enabled.

The command lists all clusters with their IDs and prompts for a comma-separated list of the cluster IDs that should be 
remapped; by default, all clusters are selected. The chosen clusters are then updated to be UC-enabled. A backup of each 
existing cluster configuration is stored as a JSON file under `backup/clusters/<cluster_id>.json` inside the installation 
folder, so that the remapping can be reverted (see the sketch below).
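
The sketch below illustrates the backup-then-edit idea under simplifying assumptions (a fixed-size cluster, a hard-coded cluster ID from the listing above, and `USER_ISOLATION` as the target access mode); it is not the actual UCX code:

```python
# Hedged sketch of the remap-and-backup step described above; not the actual UCX code.
import json
from pathlib import Path

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()
backup_dir = Path("backup/clusters")
backup_dir.mkdir(parents=True, exist_ok=True)

for cluster_id in ["0601-182128-dcbte59m"]:  # ids selected at the prompt
    details = w.clusters.get(cluster_id)
    # keep a restorable copy of the current configuration
    (backup_dir / f"{cluster_id}.json").write_text(json.dumps(details.as_dict(), indent=2))
    w.clusters.edit(
        cluster_id=cluster_id,
        cluster_name=details.cluster_name,
        spark_version=details.spark_version,
        num_workers=details.num_workers,
        data_security_mode=DataSecurityMode.USER_ISOLATION,  # a UC-enabled access mode
    )
```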

You can revert the cluster remapping using the [`revert-cluster-remap` command](#revert-cluster-remap-command).

[[back to top](#databricks-labs-ucx)]

## `revert-cluster-remap` command

```text
$ databricks labs ucx revert-cluster-remap
21:31:29  INFO [d.labs.ucx] Reverting the Remapping of the Clusters from UC
21:31:33  INFO [d.labs.ucx] 0301-055912-4ske39iq
21:31:33  INFO [d.labs.ucx] 0306-121015-v1llqff6
Please provide the cluster id's as comma separated value from the above list (default: <ALL>):
```

To revert a cluster remap performed with the [`cluster-remap` command](#cluster-remap-command), run this command to restore 
the original, non-UC configurations. It prompts for the list of clusters to revert and, by default, reverts all clusters 
present in the backup folder, iterating through the backed-up configurations and re-applying each one (see the sketch below).
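
A minimal sketch of the revert idea, assuming the backup layout described above and only a handful of saved fields; again, this is an illustration rather than the actual UCX code:

```python
# Hedged sketch of the revert step described above; not the actual UCX code.
import json
from pathlib import Path

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()

for backup_file in Path("backup/clusters").glob("*.json"):  # assumed backup layout
    saved = json.loads(backup_file.read_text())
    # re-apply the backed-up configuration, including its original access mode
    w.clusters.edit(
        cluster_id=saved["cluster_id"],
        cluster_name=saved["cluster_name"],
        spark_version=saved["spark_version"],
        num_workers=saved.get("num_workers", 0),
        data_security_mode=DataSecurityMode(saved["data_security_mode"]),
    )
```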

[[back to top](#databricks-labs-ucx)]

# Star History

[![Star History Chart](https://api.star-history.com/svg?repos=databrickslabs/ucx&type=Date)](https://star-history.com/#databrickslabs/ucx)

[[back to top](#databricks-labs-ucx)]

# Project Support
Please note that all projects in the databrickslabs GitHub account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs).  They are provided AS-IS, and we do not make any guarantees of any kind.  Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo.  They will be reviewed as time permits, but there are no formal SLAs for support.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "databricks-labs-ucx",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Serge Smertin <serge.smertin@databricks.com>, Liran Bareket <liran.bareket@databricks.com>, Marcin Wojtyczka <marcin.wojtyczka@databricks.com>, Ziyuan Qin <ziyuan.qin@databricks.com>, William Conti <william.conti@databricks.com>, Hari Selvarajan <hari.selvarajan@databricks.com>, Vuong Nguyen <vuong.nguyen@databricks.com>",
    "keywords": "Databricks, Unity Catalog",
    "author": null,
    "author_email": "Serge Smertin <serge.smertin@databricks.com>",
    "download_url": "https://files.pythonhosted.org/packages/03/ab/ff5dde4aa9e15ab63dbd3428b3e35b6d69821d235282951d5751b131922d/databricks_labs_ucx-0.22.0.tar.gz",
    "platform": null,
    "description": "Databricks Labs UCX\n===\n![UCX by Databricks Labs](docs/logo-no-background.png)\n\nThe companion for upgrading to Unity Catalog. After [installation](#install-ucx), ensure to [trigger](#ensure-assessment-run-command) the [assessment workflow](#assessment-workflow), \nso that you'll be able to [scope the migration](docs/assessment.md) and execute the [group migration workflow](#group-migration-workflow). \n[`<installation_path>/README`](#readme-notebook) contains further instructions and explanations of these workflows. \nThen you can execute [table migration workflow](#table-migration-workflow).\nMore workflows, like notebook code migration is coming in the future releases. \nUCX exposes a number of command line utilities accessible via `databricks labs ucx`.\n\nFor questions, troubleshooting or bug fixes, please see our [troubleshooting guide](docs/troubleshooting.md) or submit [an issue](https://github.com/databrickslabs/ucx/issues). \nSee [contributing instructions](CONTRIBUTING.md) to help improve this project.\n\n[![build](https://github.com/databrickslabs/ucx/actions/workflows/push.yml/badge.svg)](https://github.com/databrickslabs/ucx/actions/workflows/push.yml) [![codecov](https://codecov.io/github/databrickslabs/ucx/graph/badge.svg?token=p0WKAfW5HQ)](https://codecov.io/github/databrickslabs/ucx)  [![lines of code](https://tokei.rs/b1/github/databrickslabs/ucx)]([https://codecov.io/github/databrickslabs/ucx](https://github.com/databrickslabs/ucx))\n\n<!-- TOC -->\n* [Databricks Labs UCX](#databricks-labs-ucx)\n* [Installation](#installation)\n  * [Authenticate Databricks CLI](#authenticate-databricks-cli)\n  * [Install UCX](#install-ucx)\n  * [[ADVANCED] Force install over existing UCX](#advanced-force-install-over-existing-ucx)\n  * [[ADVANCED] Installing UCX on all workspaces within a Databricks account](#advanced-installing-ucx-on-all-workspaces-within-a-databricks-account)\n  * [Upgrading UCX for newer versions](#upgrading-ucx-for-newer-versions)\n  * [Uninstall UCX](#uninstall-ucx)\n* [Migration process](#migration-process)\n* [Workflows](#workflows)\n  * [Readme notebook](#readme-notebook)\n  * [Assessment workflow](#assessment-workflow)\n  * [Group migration workflow](#group-migration-workflow)\n  * [Debug notebook](#debug-notebook)\n  * [Debug logs](#debug-logs)\n  * [Table Migration Workflow](#table-migration-workflow)\n    * [Dependency CLI commands](#dependency-cli-commands)\n    * [Table Migration Workflow Tasks](#table-migration-workflow-tasks)\n    * [Other considerations](#other-considerations)\n* [Utility commands](#utility-commands)\n  * [`logs` command](#logs-command)\n  * [`ensure-assessment-run` command](#ensure-assessment-run-command)\n  * [`repair-run` command](#repair-run-command)\n  * [`workflows` command](#workflows-command)\n  * [`open-remote-config` command](#open-remote-config-command)\n  * [`installations` command](#installations-command)\n* [Metastore related commands](#metastore-related-commands)\n  * [`show-all-metastores` command](#show-all-metastores-command)\n  * [`assign-metastore` command](#assign-metastore-command)\n* [Table migration commands](#table-migration-commands)\n  * [`principal-prefix-access` command](#principal-prefix-access-command)\n    * [Access for AWS S3 Buckets](#access-for-aws-s3-buckets)\n    * [Access for Azure Storage Accounts](#access-for-azure-storage-accounts)\n  * [`create-uber-principal` command](#create-uber-principal-command)\n  * [`migrate-credentials` command](#migrate-credentials-command)\n  * 
[`validate-external-locations` command](#validate-external-locations-command)\n  * [`migrate-locations` command](#migrate-locations-command)\n  * [`create-table-mapping` command](#create-table-mapping-command)\n  * [`skip` command](#skip-command)\n  * [`revert-migrated-tables` command](#revert-migrated-tables-command)\n  * [`create-catalogs-schemas` command](#create-catalogs-schemas-command)\n  * [`move` command](#move-command)\n  * [`alias` command](#alias-command)\n* [Code migration commands](#code-migration-commands)\n  * [`migrate-local-code` command](#migrate-local-code-command)\n* [Cross-workspace installations](#cross-workspace-installations)\n  * [`sync-workspace-info` command](#sync-workspace-info-command)\n  * [`manual-workspace-info` command](#manual-workspace-info-command)\n  * [`create-account-groups` command](#create-account-groups-command)\n  * [`validate-groups-membership` command](#validate-groups-membership-command)\n  * [`cluster-remap` command](#cluster-remap-command)\n  * [`revert-cluster-remap` command](#revert-cluster-remap-command)\n* [Star History](#star-history)\n* [Project Support](#project-support)\n<!-- TOC -->\n\n# Installation\n\n- Databricks CLI v0.213 or later. See [instructions](#authenticate-databricks-cli). \n- Python 3.10 or later. See [Windows](https://www.python.org/downloads/windows/) instructions.\n- Network access to your Databricks Workspace used for the [installation process](#install-ucx).\n- Network access to the Internet for [pypi.org](https://pypi.org) and [github.com](https://github.com) from machine running the installation.\n- Databricks Workspace Administrator privileges for the user, that runs the installation. Running UCX as a Service Principal is not supported.\n- Account level Identity Setup. See instructions for [AWS](https://docs.databricks.com/en/administration-guide/users-groups/best-practices.html), [Azure](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/best-practices), and [GCP](https://docs.gcp.databricks.com/administration-guide/users-groups/best-practices.html).\n- Unity Catalog Metastore Created (per region). See instructions for [AWS](https://docs.databricks.com/en/data-governance/unity-catalog/create-metastore.html), [Azure](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/create-metastore), and [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/create-metastore.html).\n- If your Databricks Workspace relies on an external Hive Metastore (such as AWS Glue), make sure to read [this guide](docs/external_hms_glue.md).\n- Databricks Workspace has to have network access to [pypi.org](https://pypi.org) to download `databricks-sdk` and `pyyaml` packages.\n- A PRO or Serverless SQL Warehouse to render the [report](docs/assessment.md) for the [assessment workflow](#assessment-workflow).\n\nOnce you [install UCX](#install-ucx), you can proceed to the [assessment workflow](#assessment-workflow) to ensure \nthe compatibility of your workspace with Unity Catalog.\n\n[[back to top](#databricks-labs-ucx)]\n\n## Authenticate Databricks CLI\n\nWe only support installations and upgrades through [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html), as UCX requires an installation script run \nto make sure all the necessary and correct configurations are in place. 
Install Databricks CLI on macOS:\n![macos_install_databricks](docs/macos_1_databrickslabsmac_installdatabricks.gif)\n\nInstall Databricks CLI on Windows:\n![windows_install_databricks.png](docs/windows_install_databricks.png)\n\nOnce you install Databricks CLI, authenticate your current machine to a Databricks Workspace:\n\n```commandline\ndatabricks auth login --host WORKSPACE_HOST\n```\n\nTo enable debug logs, simply add `--debug` flag to any command.\n\n[[back to top](#databricks-labs-ucx)]\n\n## Install UCX\n\nInstall UCX via Databricks CLI:\n\n```commandline\ndatabricks labs install ucx\n```\n\nYou'll be prompted to select a [configuration profile](https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication) created by `databricks auth login` command.\n\nOnce you install, proceed to the [assessment workflow](#assessment-workflow) to ensure the compatibility of your workspace with UCX.\n\nThe `WorkspaceInstaller` class is used to create a new configuration for Unity Catalog migration in a Databricks workspace. \nIt guides the user through a series of prompts to gather necessary information, such as selecting an inventory database, choosing \na PRO or SERVERLESS SQL warehouse, specifying a log level and number of threads, and setting up an external Hive Metastore if necessary. \nUpon the first installation, you're prompted for a workspace local [group migration strategy](docs/group_name_conflict.md). \nBased on user input, the class creates a new cluster policy with the specified configuration. The user can review and confirm the configuration, \nwhich is saved to the workspace and can be opened in a web browser.\n\nThe [`WorkspaceInstallation`](src/databricks/labs/ucx/install.py) manages the installation and uninstallation of UCX in a workspace. It handles \nthe configuration and exception management during the installation process. The installation process creates dashboards, databases, and jobs. \nIt also includes the creation of a database with given configuration and the deployment of workflows with specific settings. The installation \nprocess can handle exceptions and infer errors from job runs and task runs. The workspace installation uploads wheels, creates cluster policies, \nand wheel runners to the workspace. It can also handle the creation of job tasks for a given task, such as job dashboard tasks, job notebook tasks, \nand job wheel tasks. The class handles the installation of UCX, including configuring the workspace, installing necessary libraries, and verifying \nthe installation, making it easier for users to migrate their workspaces to UCX.\n\nAfter this, UCX will be installed locally and a number of assets will be deployed in the selected workspace. \nThese assets are available under the installation folder, i.e. 
`/Users/<your user>/.ucx/`.\n\nYou can also install a specific version by specifying it like `@v0.13.2` - `databricks labs install ucx@v0.13.2`.\n\n![macos_install_ucx](docs/macos_2_databrickslabsmac_installucx.gif)\n\n[[back to top](#databricks-labs-ucx)]\n\n## [ADVANCED] Force install over existing UCX\nUsing an environment variable `UCX_FORCE_INSTALL` you can force the installation of UCX over an existing installation.\nThe values for the environment variable are 'global' and 'user'.\n\nGlobal Install: When UCX is installed at '/Applications/ucx'\nUser Install: When UCX is installed at '/Users/<user>/.ucx'\n\nIf there is an existing global installation of UCX, you can force a user installation of UCX over the existing installation by setting the environment variable `UCX_FORCE_INSTALL` to 'global'.\n\nAt this moment there is no global override over a user installation of UCX. As this requires migration and can break existing installations.\n\n\n| global | user | expected install location | install_folder      | mode                                            |\n|--------|------|---------------------------|---------------------|-------------------------------------------------|\n| no     | no   | default                   | `/Applications/ucx` | install                                         |\n| yes    | no   | default                   | `/Applications/ucx` | upgrade                                         |\n| no     | yes  | default                   | `/Users/X/.ucx`     | upgrade (existing installations must not break) |\n| yes    | yes  | default                   | `/Users/X/.ucx`     | upgrade                                         |\n| yes    | no   | **USER**                  | `/Users/X/.ucx`     | install (show prompt)                           |\n| no     | yes  | **GLOBAL**                | ...                 | migrate                                         |\n\n\n* `UCX_FORCE_INSTALL=user databricks labs install ucx` - will force the installation to be for user only\n* `UCX_FORCE_INSTALL=global databricks labs install ucx` - will force the installation to be for root only\n\n\n[[back to top](#databricks-labs-ucx)]\n\n## [ADVANCED] Installing UCX on all workspaces within a Databricks account\nSetting the environment variable `UCX_FORCE_INSTALL` to 'account' will install UCX on all workspaces within a Databricks account.\n\n* `UCX_FORCE_INSTALL=account databricks labs install ucx`\n\nAfter the first installation, UCX will prompt the user to confirm whether to install UCX on the remaining workspaces with the same answers. 
If confirmed, the remaining installations will be completed silently.\n\nThis installation mode will automatically select the following options:\n* Automatically create and enable HMS lineage init script\n* Automatically create a new SQL warehouse for UCX assessment\n\n[[back to top](#databricks-labs-ucx)]\n\n## Upgrading UCX for newer versions\n\nVerify that UCX is installed\n\n```commandline\ndatabricks labs installed\n\nName  Description                            Version\nucx   Unity Catalog Migration Toolkit (UCX)  <version>\n```\n\nUpgrade UCX via Databricks CLI:\n\n```commandline\ndatabricks labs upgrade ucx\n```\n\nThe prompts will be similar to [Installation](#install-ucx)\n\n![macos_upgrade_ucx](docs/macos_3_databrickslabsmac_upgradeucx.gif)\n\n[[back to top](#databricks-labs-ucx)]\n\n## Uninstall UCX\n\nUninstall UCX via Databricks CLI:\n\n```commandline\ndatabricks labs uninstall ucx\n```\n\nDatabricks CLI will confirm a few options:\n- Whether you want to remove all ucx artefacts from the workspace as well. Defaults to no.\n- Whether you want to delete the inventory database in `hive_metastore`. Defaults to no.\n\n![macos_uninstall_ucx](docs/macos_4_databrickslabsmac_uninstallucx.gif)\n\n[[back to top](#databricks-labs-ucx)]\n\n# Migration process\n\nOn the high level, the steps in migration process start with the [assessment workflow](#assessment-workflow), \nfollowed by [group migration](#group-migration-workflow), [table migration workflow](#table-migration-workflow), \nfinalised with the [code migration](#code-migration-commands). It can be described as:\n\n```mermaid\nflowchart TD\n    subgraph workspace-admin\n        assessment --> group-migration\n        group-migration --> table-migration\n        table-migration --> code-migration\n        assessment --> create-table-mapping\n        create-table-mapping --> table-migration\n        create-table-mapping --> code-migration\n        validate-external-locations --> table-migration\n        table-migration --> revert-migrated-tables\n        revert-migrated-tables --> table-migration\n    end\n    subgraph account-admin\n        create-account-groups --> group-migration\n        sync-workspace-info --> create-table-mapping\n        group-migration --> validate-groups-membership\n    end\n    subgraph iam-admin\n        setup-account-scim --> create-account-groups\n        assessment --> create-uber-principal\n        create-uber-principal --> table-migration\n        assessment --> principal-prefix-access\n        principal-prefix-access --> migrate-credentials\n        migrate-credentials --> validate-external-locations\n        setup-account-scim    \n    end\n```\n\n[[back to top](#databricks-labs-ucx)]\n\n# Workflows\n\nPart of this application is deployed as [Databricks Workflows](https://docs.databricks.com/en/workflows/index.html). \nYou can view the status of deployed workflows via the [`workflows` command](#workflows-command).\nFailed workflows can be fixed with the [`repair-run` command](#repair-run-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n## Readme notebook\n\n![readme](docs/readme-notebook.png)\n\nEvery installation creates a `README` notebook with a detailed description of all deployed workflows and their tasks,\nproviding quick links to the relevant workflows and dashboards.\n\n[[back to top](#databricks-labs-ucx)]\n\n## Assessment workflow\n\nThe assessment workflow can be triggered using the Databricks UI, or via the [command line](#ensure-assessment-run-command). 
\n\n```commandline\ndatabricks labs ucx ensure-assessment-run\n```\n\n![ucx_assessment_workflow](docs/ucx_assessment_workflow.png)\n\nOnce you finish the assessment, proceed to the [group migration workflow](#group-migration-workflow). \nSee the [migration process diagram](#migration-process) to understand the role of the assessment workflow in the migration process.\n\nThe assessment workflow is designed to assess the compatibility of various entities in the current workspace with Unity Catalog. \nIt identifies incompatible entities and provides information necessary for planning the migration to UC. The tasks in \nthe assessment workflow can be executed in parallel or sequentially, depending on the dependencies specified in the `@task` decorators.\nThe output of each task is stored in Delta tables in the `$inventory_database` schema, that you specify during [installation](#install-ucx), \nwhich can be used for further analysis and decision-making through the [assessment report](docs/assessment.md). \nThe assessment workflow can be executed multiple times to ensure that all incompatible entities are identified and accounted \nfor before starting the migration process.\n\n1. `crawl_tables`: This task scans all tables in the Hive Metastore of the current workspace and persists their metadata in a Delta table named `$inventory_database.tables`. This metadata includes information such as the database name, table name, table type, and table location. This task is used for assessing which tables cannot be easily migrated to Unity Catalog.\n2. `crawl_grants`: This task scans the Delta table named `$inventory_database.tables` and issues a `SHOW GRANTS` statement for every object to retrieve the permissions assigned to it. The permissions include information such as the principal, action type, and the table it applies to. This task persists the permissions in the Delta table `$inventory_database.grants`.\n3. `estimate_table_size_for_migration`: This task scans the Delta table named `$inventory_database.tables` and locates tables that cannot be synced. These tables will have to be cloned in the migration process. The task assesses the size of these tables and creates the `$inventory_database.table_size` table to list these sizes. The table size is a factor in deciding whether to clone these tables.\n4. `crawl_mounts`: This task scans the workspace to compile a list of all existing mount points and stores this information in the `$inventory.mounts` table. This is crucial for planning the migration.\n5. `guess_external_locations`: This task determines the shared path prefixes of all the tables that utilize mount points. The goal is to identify the external locations necessary for a successful migration and store this information in the `$inventory.external_locations` table.\n6. `assess_jobs`: This task scans through all the jobs and identifies those that are not compatible with UC. The list of all the jobs is stored in the `$inventory.jobs` table.\n7. `assess_clusters`: This task scans through all the clusters and identifies those that are not compatible with UC. The list of all the clusters is stored in the `$inventory.clusters` table.\n8. `assess_pipelines`: This task scans through all the Pipelines and identifies those pipelines that have Azure Service Principals embedded in their configurations. A list of all the pipelines with matching configurations is stored in the `$inventory.pipelines` table.\n9. 
`assess_azure_service_principals`: This task scans through all the clusters configurations, cluster policies, job cluster configurations, Pipeline configurations, and Warehouse configuration and identifies all the Azure Service Principals who have been given access to the Azure storage accounts via spark configurations referred in those entities. The list of all the Azure Service Principals referred in those configurations is saved in the `$inventory.azure_service_principals` table.\n10. `assess_global_init_scripts`: This task scans through all the global init scripts and identifies if there is an Azure Service Principal who has been given access to the Azure storage accounts via spark configurations referred in those scripts.\n\n![report](docs/assessment-report.png)\n\nAfter UCX assessment workflow is executed, the assessment dashboard will be populated with findings and common recommendations. See [this guide](docs/assessment.md) for more details.\n\n[[back to top](#databricks-labs-ucx)]\n\n## Group migration workflow\n\nYou are required to complete the [assessment workflow](#assessment-workflow) before starting the group migration workflow.\nSee the [migration process diagram](#migration-process) to understand the role of the group migration workflow in the migration process.\n\nSee the [detailed design](docs/local-group-migration.md) of this workflow. It helps you to upgrade all Databricks workspace assets:\nLegacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, \nDatabricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries,\nSQL Alerts, Token and Password usage permissions that are set on the workspace level, Secret scopes, Notebooks,\nDirectories, Repos, and Files. \n\nOnce done with the group migration, proceed to [table migration workflow](#table-migration-workflow).\n\nUse [`validate-groups-membership` command](#validate-groups-membership-command) for extra confidence.\nIf you don't have matching account groups, please run [`create-account-groups` command](#create-account-groups-command).\n\nThe group migration workflow is designed to migrate workspace-local groups to account-level groups in the Unity Catalog (UC) environment. It ensures that all the necessary groups are available in the workspace with the correct permissions, and removes any unnecessary groups and permissions. The tasks in the group migration workflow depend on the output of the assessment workflow and can be executed in sequence to ensure a successful migration. The output of each task is stored in Delta tables in the `$inventory_database` schema, which can be used for further analysis and decision-making. The group migration workflow can be executed multiple times to ensure that all the groups are migrated successfully and that all the necessary permissions are assigned.\n\n1. `crawl_groups`: This task scans all groups for the local group migration scope.\n2. `rename_workspace_local_groups`: This task renames workspace local groups by adding a `ucx-renamed-` prefix. This step is taken to avoid conflicts with account-level groups that may have the same name as workspace-local groups.\n3. `reflect_account_groups_on_workspace`: This task adds matching account groups to this workspace. The matching account level group(s) must preexist(s) for this step to be successful. This step is necessary to ensure that the account-level groups are available in the workspace for assigning permissions.\n4. 
`apply_permissions_to_account_groups`: This task assigns the full set of permissions of the original group to the account-level one. It covers local workspace-local permissions for all entities, including Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries, SQL Alerts, Token and Password usage permissions, Secret Scopes, Notebooks, Directories, Repos, Files. This step is necessary to ensure that the account-level groups have the necessary permissions to manage the entities in the workspace.\n5. `validate_groups_permissions`: This task validates that all the crawled permissions are applied correctly to the destination groups.\n6. `delete_backup_groups`: This task removes all workspace-level backup groups, along with their permissions. This should only be executed after confirming that the workspace-local migration worked successfully for all the groups involved. This step is necessary to clean up the workspace and remove any unnecessary groups and permissions.\n\n[[back to top](#databricks-labs-ucx)]\n\n## Debug notebook\n\n![debug](docs/debug-notebook.png)\n\nEvery [installation](#installation) creates a debug notebook, that initializes UCX as a library,\nso that you can implement missing features and \n\n[[back to top](#databricks-labs-ucx)]\n\n## Debug logs\n\n![debug](docs/debug-logs.png)\n\nEvery workflow run stores debug logs in the `logs` folder of the [installation](#installation). \nFor tasks shorter than 10 minutes, they appear after task finish, whereas longer-running tasks \nflush the logs every 10 minutes.\n\nTo enable debug logs of [command-line interface](#authenticate-databricks-cli), \nsimply add `--debug` flag to any command.\n\n[[back to top](#databricks-labs-ucx)]\n\n## Table Migration Workflow\n\nThe table migration workflow comprises multiple workflows and tasks designed to migrate tables from the Hive Metastore to the Unity Catalog. The subsequent diagram illustrates the workflow's tasks and their dependency CLI commands. 
\n```mermaid\nflowchart TB\n    subgraph CLI\n      sync-workspace-info[sync-workspace-info] --> create_table_mapping[create-table-mapping]\n      create_table_mapping[create-table-mapping] --> create_catalogs_schemas[create-catalogs-schemas]\n      create_uber_principal[create-uber-principal]\n      principal_prefix_access[principal-prefix-access] --> migrate_credentials[migrate-credentials]\n      migrate_credentials --> migrate_locations[migrate-locations]\n      migrate_locations --> create_catalogs_schemas\n    end\n    \n    create_uber_principal --> workflow\n    create_table_mapping --> workflow\n    create_catalogs_schemas --> workflow\n    \n    subgraph workflow[Table Migration Workflows]\n      subgraph mt_workflow[workflow: migrate-tables]\n        dbfs_root_delta_mt_task[migrate_dbfs_root_delta_tables]\n        external_tables_sync_mt_task[migrate_external_tables_sync]\n        view_mt_task[roadmap: migrate_views]\n        dbfs_root_delta_mt_task --> view_mt_task\n        external_tables_sync_mt_task --> view_mt_task\n      end\n      \n      subgraph mt_ctas_wf[roadmap workflow: migrate-tables-ctas]\n        ctas_mt_task[migrate_tables_ctas] --> view_mt_task_ctas[roadmap: migrate_views]\n      end\n  \n      subgraph mt_serde_inplace_wf[roadmap workflow: migrate-external-hiveserde-tables-in-place-experimental]\n        serde_inplace_mt_task[migrate_external_hiveserde_tables_in_place_experimental] --> view_mt_task_inplace[roadmap: migrate_views]\n      end\n  \n      subgraph mt_in_mounts_wf[roadmap workflow: migrate-tables-in-mounts-experimental]\n        scan_tables_in_mounts_experimental_task[scan_tables_in_mounts_experimental] --> \n        migrate_tables_in_mounts_experimental[migrate_tables_in_mounts_experimental]\n      end\n    end\n    \n    classDef roadmap stroke:Green,stroke-width:2px,color:Green,stroke-dasharray: 8 3\n    class view_mt_task roadmap;\n    class mt_ctas_wf,ctas_mt_task,view_mt_task_ctas roadmap;\n    class mt_serde_inplace_wf,serde_inplace_mt_task,view_mt_task_inplace roadmap;\n    class mt_in_mounts_wf,scan_tables_in_mounts_experimental_task,migrate_tables_in_mounts_experimental roadmap;\n```\nMore details can be found in the [design of table migration](docs/table_upgrade.md)\n\n### Dependency CLI commands\n- [`create-table-mapping`](#create-table-mapping-command) - Create `mapping.csv` which will be used by the workflow to identify the targets catalog, schema, table of the HMS table to be migrated. User should review and update the mapping file accordingly before proceeding with the migration workflow.\n- [`principal-prefix-access`](#principal-prefix-access-command) - Identify all the storages used in the workspace and corresponding Azure Service Principal or IAM role that are used in the workspace. It outputs `azure_storage_account_info.csv` or `uc_roles_access.csv` which will be later used by [`migrate-credentials`](#migrate-credentials-command) command to create UC storage credentials. The csv can be edited to control which IAM roles or Azure Service Principals should be used to create UC storage credentials later.\n- [`migrate-credentials`](#migrate-credentials-command) - Create UC storage credentials based on the Azure Service Principal or IAM role (`azure_storage_account_info.csv` or `uc_roles_access.csv`) identified by [`principal-prefix-access`](#principal-prefix-access-command) command.\n- [`migrate-locations`](#migrate-locations-command) - Create missing external locations in the Unity Catalog. 
\n- [`create-catalogs-schemas`](#create-catalogs-schemas-command) - Create missing catalogs and schemas in the Unity Catalog. The candidate catalogs and schemas is based on `mapping.csv`\n- [`create-uber-principal`](#create-uber-principal-command) - Create an Uber Principal with access to all storages used in the workspace. This principal will be used by the workflow job cluster to migrate all the tables in the workspace.\n\n[`create-table-mapping`](#create-table-mapping-command) and [`create-uber-principal`](#create-uber-principal-command) are required to run the workflow. While other commands are optional, as long as the UC storage credentials, external locations, catalogs and schemas needed for successful migration are created.\n\nTo control the scope of the table migration, consider utilizing a combination of editing `mapping.csv`, employing [`skip`](#skip-command) command, and [`revert-migrated-tables`](#revert-migrated-tables-command) command.\n\nSee more details in [Table migration commands](#table-migration-commands)\n\n### Table Migration Workflow Tasks\n- `migrate_dbfs_root_delta_tables` - Migrate delta tables from the DBFS root using deep clone, along with legacy table ACL migrated if any.\n- `migrate_external_tables_sync` - Migrate external tables using [`SYNC`](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-sync.html) command, along with legacy table ACL migrated if any.\n- Following workflows/tasks are on the roadmap and being developed:\n  - Migrate view\n  - Migrate tables using CTAS\n  - Optionally and experimentally in place migrate ParquetHiveSerDe, OrcSerde, AvroSerDe, LazySimpleSerDe, JsonSerDe, OpenCSVSerde tables.\n  - Experimentally migrate Delta and Parquet data found in dbfs mount but not registered as Hive Metastore table into UC tables.\n\n### Other considerations\n- You may need to run the workflow multiple times to ensure all the tables are migrated successfully in phases.\n- If your delta tables in DBFS root have large number of files, consider:\n  - Setting higher Min and Max workers for auto-scale when being asked during the UCX installation. More nodes/cores in the cluster means more concurrency for calling cloud storage api to copy files when deep cloning the delta tables.\n  - Setting higher \"Parallelism for migrating dbfs root delta tables with deep clone\" (default 200) when being asked during the UCX installation. This controls the number of Spark task/partition to be created for deep clone.\n- Consider create an instance pool, and set the instance pool id when being asked during the UCX installation. This instance pool will be put into the cluster policy used by all UCX workflows job clusters.\n- You may also manually edit the job cluster configration per job or per task after the workflows are deployed.\n\n[[back to top](#databricks-labs-ucx)]\n\n# Utility commands\n\n## `logs` command\n\n```text\n$ databricks labs ucx logs [--workflow WORKFLOW_NAME] [--debug]\n```\n\nThis command displays the logs of the last run of the specified workflow. If no workflow is specified, it displays \nthe logs of the workflow that was run the last. This command is useful for developers and administrators who want to \ncheck the logs of the last run of a workflow and ensure that it was executed as expected. It can also be used for \ndebugging purposes when a workflow is not behaving as expected. By default, only `INFO`, `WARNING`, and `ERROR` logs\nare displayed. 
To display `DEBUG` logs, use the `--debug` flag.\n\n[[back to top](#databricks-labs-ucx)] \n\n## `ensure-assessment-run` command\n\n```commandline\ndatabricks labs ucx ensure-assessment-run\n```\n\nThis command ensures that the [assessment workflow](#assessment-workflow) was run on a workspace. \nThis command will block until job finishes.\nFailed workflows can be fixed with the [`repair-run` command](#repair-run-command). Workflows and their status can be \nlisted with the [`workflows` command](#workflows-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `repair-run` command\n\n```commandline\ndatabricks labs ucx repair-run --step WORKFLOW_NAME\n```\n\nThis command repairs a failed [UCX Workflow](#workflows). This command is useful for developers and administrators who \nwant to repair a failed job. It can also be used to debug issues related to job failures. This operation can also be \ndone via [user interface](https://docs.databricks.com/en/workflows/jobs/repair-job-failures.html). Workflows and their \nstatus can be listed with the [`workflows` command](#workflows-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `workflows` command\n\nSee the [migration process diagram](#migration-process) to understand the role of each workflow in the migration process.\n\n```text\n$ databricks labs ucx workflows\nStep                                  State    Started\nassessment                            RUNNING  1 hour 2 minutes ago\n099-destroy-schema                    UNKNOWN  <never run>\nmigrate-groups                        UNKNOWN  <never run>\nremove-workspace-local-backup-groups  UNKNOWN  <never run>\nvalidate-groups-permissions           UNKNOWN  <never run>\n```\n\nThis command displays the [deployed workflows](#workflows) and their state in the current workspace. It fetches the latest \njob status from the workspace and prints it in a table format. This command is useful for developers and administrators \nwho want to check the status of UCX workflows and ensure that they have been executed as expected. It can also be used \nfor debugging purposes when a workflow is not behaving as expected. Failed workflows can be fixed with \nthe [`repair-run` command](#repair-run-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `open-remote-config` command\n\n```commandline\ndatabricks labs ucx open-remote-config\n```\n\nThis command opens the remote configuration file in the default web browser. It generates a link to the configuration file \nand opens it using the `webbrowser.open()` method. This command is useful for developers and administrators who want to view or \nedit the remote configuration file without having to manually navigate to it in the workspace. It can also be used to quickly \naccess the configuration file from the command line. 
Here's the description of configuration properties:\n\n  * `inventory_database`: A string representing the name of the inventory database.\n  * `workspace_group_regex`: An optional string representing the regular expression to match workspace group names.\n  * `workspace_group_replace`: An optional string to replace the matched group names with.\n  * `account_group_regex`: An optional string representing the regular expression to match account group names.\n  * `group_match_by_external_id`: A boolean value indicating whether to match groups by their external IDs.\n  * `include_group_names`: An optional list of strings representing the names of groups to include for migration.\n  * `renamed_group_prefix`: An optional string representing the prefix to add to renamed group names.\n  * `instance_pool_id`: An optional string representing the ID of the instance pool.\n  * `warehouse_id`: An optional string representing the ID of the warehouse.\n  * `connect`: An optional `Config` object representing the configuration for connecting to the warehouse.\n  * `num_threads`: An optional integer representing the number of threads to use for migration.\n  * `database_to_catalog_mapping`: An optional dictionary mapping source database names to target catalog names.\n  * `default_catalog`: An optional string representing the default catalog name.\n  * `log_level`: An optional string representing the log level.\n  * `workspace_start_path`: A string representing the starting path for notebooks and directories crawler in the workspace.\n  * `instance_profile`: An optional string representing the name of the instance profile.\n  * `spark_conf`: An optional dictionary of Spark configuration properties.\n  * `override_clusters`: An optional dictionary mapping job cluster names to existing cluster IDs.\n  * `policy_id`: An optional string representing the ID of the cluster policy.\n  * `is_terraform_used`: A boolean value indicating whether some workspace resources are managed by Terraform.\n  * `include_databases`: An optional list of strings representing the names of databases to include for migration.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `installations` command\n\n```text\n$ databricks labs ucx installations \n...\n13:49:16  INFO [d.labs.ucx] Fetching installations...\n13:49:17  INFO [d.l.blueprint.parallel][finding_ucx_installations_5] finding ucx installations 10/88, rps: 22.838/sec\n13:49:17  INFO [d.l.blueprint.parallel][finding_ucx_installations_9] finding ucx installations 20/88, rps: 35.002/sec\n13:49:17  INFO [d.l.blueprint.parallel][finding_ucx_installations_2] finding ucx installations 30/88, rps: 51.556/sec\n13:49:18  INFO [d.l.blueprint.parallel][finding_ucx_installations_9] finding ucx installations 40/88, rps: 56.272/sec\n13:49:18  INFO [d.l.blueprint.parallel][finding_ucx_installations_19] finding ucx installations 50/88, rps: 67.382/sec\n...\nPath                                      Database  Warehouse\n/Users/serge.smertin@databricks.com/.ucx  ucx       675eaf1ff976aa98\n```\n\nThis command displays the [installations](#installation) by different users on the same workspace. It fetches all \nthe installations where the `ucx` package is installed and prints their details in JSON format. This command is useful \nfor administrators who want to see which users have installed `ucx` and where. 
It can also be used to debug issues \nrelated to multiple installations of `ucx` on the same workspace.\n\n[[back to top](#databricks-labs-ucx)]\n\n# Metastore related commands\n\nThese commands are used to assign a Unity Catalog metastore to a workspace. The metastore assignment is a pre-requisite\nfor any further migration steps.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `show-all-metastores` command\n\n```text\ndatabricks labs ucx show-all-metastores [--workspace-id <workspace-id>]\n```\n\nThis command lists all the metastores available to be assigned to a workspace. If no workspace is specified, it lists\nall the metastores available in the account. This command is useful when there are multiple metastores available within\na region and you want to see which ones are available for assignment.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `assign-metastore` command\n\n```text\ndatabricks labs ucx assign-metastore --workspace-id <workspace-id> [--metastore-id <metastore-id>]\n```\n\nThis command assigns a metastore to a workspace with `workspace-id`. If there is only a single metastore in the workspace\nregion, it will be automatically assigned to the workspace. If there are multiple metastores available, you need to specify\nthe metastore id of the metastore you want to assign to the workspace.\n\n# Table migration commands\n\nThese commands are vital part of [table migration workflow](#table-migration-workflow) process and require \nthe [assessment workflow](#assessment-workflow) and \n[group migration workflow](#group-migration-workflow) to be completed. \nSee the [migration process diagram](#migration-process) to understand the role of the table migration commands in \nthe migration process. \n\nThe first step is to run the [`principal-prefix-access` command](#principal-prefix-access-command) to identify all \nthe storage accounts used by tables in the workspace and their permissions on each storage account.\n\nIf you don't have any storage credentials and external locations configured,  you'll need to run \nthe [`migrate-credentials` command](#migrate-credentials-command) to migrate the service principals \nand [`migrate-locations` command](#migrate-locations-command) to create the external locations. \nIf some of the external locations already exist, you should run \nthe [`validate-external-locations` command](#validate-external-locations-command).\nYou'll need to create the [uber principal](#create-uber-principal-command) with the _**access to all storage**_ used to tables in \nthe workspace, so that you can migrate all the tables. If you already have the principal, you can skip this step.\n\nAsk your Databricks Account admin to run the [`sync-workspace-info` command](#sync-workspace-info-command) to sync the\nworkspace information with the UCX installations. Once the workspace information is synced, you can run the \n[`create-table-mapping` command](#create-table-mapping-command) to align your tables with the Unity Catalog,\n[create catalogs and schemas](#create-catalogs-schemas-command) and start the migration. During multiple runs of\nthe table migration workflow, you can use the [`revert-migrated-tables` command](#revert-migrated-tables-command) to \nrevert the tables that were migrated in the previous run. 
You can also skip the tables that you don't want to migrate \nusing the [`skip` command](#skip-command).\n\nOnce you're done with the table migration, proceed to the [code migration](#code-migration-commands).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `principal-prefix-access` command\n\n```text\ndatabricks labs ucx principal-prefix-access [--subscription-id <Azure Subscription ID>] [--aws-profile <AWS CLI profile>]\n```\n\nThis command depends on results from the [assessment workflow](#assessment-workflow) and requires [AWS CLI](#access-for-aws-s3-buckets) \nor [Azure CLI](#access-for-azure-storage-accounts) to be installed and authenticated for the given machine. This command\nidentifies all the storage accounts used by tables in the workspace and their permissions on each storage account. \nOnce you're done running this command, proceed to the [`migrate-credentials` command](#migrate-credentials-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n### Access for AWS S3 Buckets\n\n```commandline\ndatabricks labs ucx principal-prefix-access --aws-profile test-profile\n```\n\nUse to identify all instance profiles in the workspace, and map their access to S3 buckets. \nAlso captures the IAM roles which has UC arn listed, and map their access to S3 buckets \nThis requires `aws` CLI to be installed and configured. \n\nOnce done, proceed to the [`migrate-credentials` command](#migrate-credentials-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n### Access for Azure Storage Accounts\n\n```commandline\ndatabricks labs ucx principal-prefix-access --subscription-id test-subscription-id\n```\n\nUse to identify all storage account used by tables, identify the relevant Azure service principals and their permissions \non each storage account. This requires Azure CLI to be installed and configured via `az login`. \n\nOnce done, proceed to the [`migrate-credentials` command](#migrate-credentials-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `create-uber-principal` command\n\n```text\ndatabricks labs ucx create-uber-principal [--subscription-id X]\n```\n\n**Requires Cloud IAM admin privileges.** Once the [`assessment` workflow](#assessment-workflow) complete, you should run \nthis command to creates a service principal with the _**read-only access to all storage**_ used by tables in this \nworkspace and configure the [UCX Cluster Policy](#installation) with the details of it. Once migration is complete, this\nservice principal should be unprovisioned. On Azure, it creates a principal with `Storage Blob Data Reader` role \nassignment on every storage account using Azure Resource Manager APIs.\n\nThis command is one of prerequisites for the [table migration workflow](#table-migration-workflow).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `migrate-credentials` command\n\n```commandline\ndatabricks labs ucx migrate-credentials\n```\n\nFor Azure, this command migrate Azure Service Principals, which have `Storage Blob Data Contributor`,\n`Storage Blob Data Reader`, `Storage Blob Data Owner` roles on ADLS Gen2 locations that are being used in\nDatabricks, to UC storage credentials. The Azure Service Principals to location mapping are listed \nin `/Users/{user_name}/.ucx/azure_storage_account_info.csv` which is generated \nby [`principal-prefix-access` command](#principal-prefix-access-command). 
\nPlease review the file and delete the Service Principals you do not want to be migrated.\nThe command will only migrate the Service Principals that have client secret stored in Databricks Secret.\n\nOnce you're done with this command, run [`validate-external-locations` command](#validate-external-locations-command) after this one.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `validate-external-locations` command\n\n```text\ndatabricks labs ucx validate-external-locations \n```\n\nOnce the [`assessment` workflow](#assessment-workflow) finished successfully, [storage credentials](#migrate-credentials-command) are configured, \nrun this command to validate and report the missing Unity Catalog external locations to be created. \n\nThis command validates and provides mapping to external tables to external locations, also as Terraform configurations.\n\nOnce you're done with this command, proceed to the [`migrate-locations` command](#migrate-locations-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `migrate-locations` command\n\n```text\ndatabricks labs ucx migrate-locations \n```\n\nOnce the [`assessment` workflow](#assessment-workflow) finished successfully, and [storage credentials](#migrate-credentials-command) are configured,\nrun this command to have Unity Catalog external locations created. The candidate locations to be created are extracted from guess_external_locations\ntask in the assessment job. You can run [`validate-external-locations` command](#validate-external-locations-command) to check the candidate locations.\n\nOnce you're done with this command, proceed to the [`create-table-mapping` command](#create-table-mapping-command).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `create-table-mapping` command\n\n```text\ndatabricks labs ucx create-table-mapping \n```\n\nOnce the [`assessment` workflow](#assessment-workflow) finished successfully \n[workspace info is synchronized](#sync-workspace-info-command), run this command to create the initial \ntable mapping for review in CSV format in the Databricks Workspace:\n\n```text\nworkspace_name,catalog_name,src_schema,dst_schema,src_table,dst_table\nlabs-azure,labs_azure,default,default,ucx_tybzs,ucx_tybzs\n```\n\nYou are supposed to review this mapping and adjust it if necessary. This file is in CSV format, so that you can edit it \neasier in your favorite spreadsheet application.\n\nOnce you're done with this command, [create catalogs and schemas](#create-catalogs-schemas-command). During \nmultiple runs of the table migration workflow, you can use the [`revert-migrated-tables` command](#revert-migrated-tables-command)\nto revert the tables that were migrated in the previous run. You can also skip the tables that you don't want to migrate\nusing the [`skip` command](#skip-command).\n\nThis command is one of prerequisites for the [table migration workflow](#table-migration-workflow).\n\nOnce you're done with table migration, proceed to the [code migration](#code-migration-commands).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `skip` command\n\n```text\ndatabricks labs ucx skip --schema X [--table Y]  \n```\n\nAnywhere after [`create-table-mapping` command](#create-table-mapping-command) is executed, you can run this command.\n\nThis command allows users to skip certain schemas or tables during the [table migration](#table-migration-workflow) process.\nThe command takes `--schema` and optionally `--table` flags to specify the schema and table to skip. 
If no `--table` flag \nis provided, all tables in the specified HMS database are skipped.\nThis command is useful to temporarily disable migration on a particular schema or table.\n\nOnce you're done with table migration, proceed to the [code migration](#code-migration-commands).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `revert-migrated-tables` command\n\n```text\ndatabricks labs ucx revert-migrated-tables --schema X --table Y [--delete-managed]  \n```\n\nAnywhere after [`create-table-mapping` command](#create-table-mapping-command) is executed, you can run this command.\n\nThis command removes the `upgraded_from` property on a migrated table for re-migration in the [table migration](#table-migration-workflow) process. \nThis command is useful for developers and administrators who want to revert the migration of a table. It can also be used \nto debug issues related to table migration.\n\nGo back to the [`create-table-mapping` command](#create-table-mapping-command) after you're done with this command.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `create-catalogs-schemas` command\n\n```text\ndatabricks labs ucx create-catalogs-schemas\n```\nAfter [`create-table-mapping` command](#create-table-mapping-command) is executed, you can run this command to have the required UC catalogs and schemas created.\nThis command is supposed to be run before migrating tables to UC using [table migration workflow](#table-migration-workflow).\n\n[[back to top](#databricks-labs-ucx)]\n\n## `move` command\n\n```text\ndatabricks labs ucx move --from-catalog A --from-schema B --from-table C --to-catalog D --to-schema E  \n```\n\nThis command moves a UC table/tables from one schema to another schema after\nthe [table migration](#table-migration-workflow) process. This is useful for developers and administrators who want \nto adjust their catalog structure after tables upgrade.\n\nUsers will be prompted whether tables/view are dropped after moving to new schema. This only applies to `MANAGED` tables and views.\n\nThis command moves different table types differently:\n- `MANAGED` tables are deep-cloned to the new schema. \n- `EXTERNAL` tables are dropped from the original schema, then created in the target schema using the same location. \nThis is due to Unity Catalog not supporting multiple tables with overlapping paths\n- `VIEW` are recreated using the same view definition.\n\nThis command supports moving multiple tables at once, by specifying `*` as the table name.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `alias` command\n\n```text\ndatabricks labs ucx alias --from-catalog A --from-schema B --from-table C --to-catalog D --to-schema E  \n```\n\nThis command aliases a UC table/tables from one schema to another schema in the same or different catalog. \nIt takes a `WorkspaceClient` object and `from` and `to` parameters as parameters and aliases the tables using \nthe `TableMove` class. This command is useful for developers and administrators who want to create an alias for a table. 
\nIt can also be used to debug issues related to table aliasing.\n\n[[back to top](#databricks-labs-ucx)]\n\n# Code migration commands\n\nSee the [migration process diagram](#migration-process) to understand the role of the code migration commands in the migration process.\n\nAfter you're done with the [table migration](#table-migration-workflow), you can proceed to the code migration.\n\nOnce you're done with the code migration, you can run the [`cluster-remap` command](#cluster-remap-command) to remap the\nclusters to be UC compatible.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `migrate-local-code` command\n\n```text\ndatabricks labs ucx migrate-local-code\n```\n\n**(Experimental)** Once [table migration](#table-migration-workflow) is complete, you can run this command to \nmigrate all python and SQL files in the current working directory. This command is highly experimental and\nat the moment only supports Python and SQL files and discards code comments and formatting during \nthe automated transformation process.\n\n[[back to top](#databricks-labs-ucx)]\n\n# Cross-workspace installations\n\nWhen installing UCX across multiple workspaces, administrators need to keep UCX configurations in sync.\nUCX will prompt you to select an account profile that has been defined in `~/.databrickscfg`. If you don't have one,\nauthenticate your machine with:\n\n* `databricks auth login --host https://accounts.cloud.databricks.com/` (AWS)\n* `databricks auth login --host https://accounts.azuredatabricks.net/` (Azure)\n\nAsk your Databricks Account admin to run the [`sync-workspace-info` command](#sync-workspace-info-command) to sync the\nworkspace information with the UCX installations. Once the workspace information is synced, you can run the \n[`create-table-mapping` command](#create-table-mapping-command) to align your tables with the Unity Catalog.\n\n[[back to top](#databricks-labs-ucx)]\n\n## `sync-workspace-info` command\n\n```text\ndatabricks labs ucx sync-workspace-info \n14:07:07  INFO [databricks.sdk] Using Azure CLI authentication with AAD tokens\n14:07:07  INFO [d.labs.ucx] Account ID: ...\n14:07:10  INFO [d.l.blueprint.parallel][finding_ucx_installations_16] finding ucx installations 10/88, rps: 16.415/sec\n14:07:10  INFO [d.l.blueprint.parallel][finding_ucx_installations_0] finding ucx installations 20/88, rps: 32.110/sec\n14:07:11  INFO [d.l.blueprint.parallel][finding_ucx_installations_18] finding ucx installations 30/88, rps: 39.786/sec\n...\n```\n\n**Requires Databricks Account Administrator privileges.** This command uploads the workspace config to all workspaces \nin the account where `ucx` is installed. This command is necessary to create an immutable default catalog mapping for\n[table migration](#table-migration-workflow) process and is the prerequisite \nfor [`create-table-mapping` command](#create-table-mapping-command).\n\nIf you cannot get account administrator privileges in reasonable time, you can take the risk and \nrun [`manual-workspace-info` command](#manual-workspace-info-command) to enter Databricks Workspace IDs and Databricks \nWorkspace names. \n\n[[back to top](#databricks-labs-ucx)]\n\n## `manual-workspace-info` command\n\n```text\n$ databricks labs ucx manual-workspace-info\n14:20:36  WARN [d.l.ucx.account] You are strongly recommended to run \"databricks labs ucx sync-workspace-info\" by account admin,\n ... otherwise there is a significant risk of inconsistencies between different workspaces. This command will overwrite all UCX \n ... 
Ask your Databricks Account admin to run the [`sync-workspace-info` command](#sync-workspace-info-command) to sync the
workspace information with the UCX installations. Once the workspace information is synced, you can run the
[`create-table-mapping` command](#create-table-mapping-command) to align your tables with the Unity Catalog.

[[back to top](#databricks-labs-ucx)]

## `sync-workspace-info` command

```text
databricks labs ucx sync-workspace-info
14:07:07  INFO [databricks.sdk] Using Azure CLI authentication with AAD tokens
14:07:07  INFO [d.labs.ucx] Account ID: ...
14:07:10  INFO [d.l.blueprint.parallel][finding_ucx_installations_16] finding ucx installations 10/88, rps: 16.415/sec
14:07:10  INFO [d.l.blueprint.parallel][finding_ucx_installations_0] finding ucx installations 20/88, rps: 32.110/sec
14:07:11  INFO [d.l.blueprint.parallel][finding_ucx_installations_18] finding ucx installations 30/88, rps: 39.786/sec
...
```

**Requires Databricks Account Administrator privileges.** This command uploads the workspace config to all workspaces
in the account where `ucx` is installed. It is necessary to create an immutable default catalog mapping for the
[table migration](#table-migration-workflow) process and is the prerequisite
for the [`create-table-mapping` command](#create-table-mapping-command).

If you cannot get account administrator privileges in a reasonable time, you can take the risk and
run the [`manual-workspace-info` command](#manual-workspace-info-command) to enter Databricks Workspace IDs and
Databricks Workspace names.

[[back to top](#databricks-labs-ucx)]

## `manual-workspace-info` command

```text
$ databricks labs ucx manual-workspace-info
14:20:36  WARN [d.l.ucx.account] You are strongly recommended to run "databricks labs ucx sync-workspace-info" by account admin,
 ... otherwise there is a significant risk of inconsistencies between different workspaces. This command will overwrite all UCX
 ... installations on this given workspace. Result may be consistent only within https://adb-987654321.10.azuredatabricks.net
Workspace name for 987654321 (default: workspace-987654321): labs-workspace
Next workspace id (default: stop): 12345
Workspace name for 12345 (default: workspace-12345): other-workspace
Next workspace id (default: stop):
14:21:19  INFO [d.l.blueprint.parallel][finding_ucx_installations_11] finding ucx installations 10/89, rps: 24.577/sec
14:21:19  INFO [d.l.blueprint.parallel][finding_ucx_installations_15] finding ucx installations 20/89, rps: 48.305/sec
...
14:21:20  INFO [d.l.ucx.account] Synchronised workspace id mapping for installations on current workspace
```

Run this command only if the [`sync-workspace-info` command](#sync-workspace-info-command) cannot be run. It prompts the
user to enter the required information manually and creates the workspace info (as in the transcript above, where the
numeric Azure workspace ID also appears in the workspace URL, e.g. `adb-987654321.10.azuredatabricks.net`). This command
is useful for workspace administrators who are unable to use the `sync-workspace-info` command because they are not
Databricks Account Administrators. It can also be used to manually create the workspace info in a new workspace.

[[back to top](#databricks-labs-ucx)]

## `create-account-groups` command

```text
$ databricks labs ucx create-account-groups [--workspace-ids 123,456,789]
```

**Requires Databricks Account Administrator privileges.** This command creates account-level groups if a workspace-local
group is not present in the account. It crawls all workspaces configured in the `--workspace-ids` flag, then creates
account-level groups for any workspace-local group that is not present in the account. If the `--workspace-ids` flag is
not specified, UCX will create account groups for all workspaces configured in the account.

The following scenarios are supported for a group X (see the illustration below):
- X exists in workspaces A, B, and C with the same members everywhere: X is created in the account.
- X exists in workspaces A and B but not in C: X is created in the account.
- X exists in workspaces A, B, and C, with the same members in A and B but different members in C: both X and C_X are created in the account.

This command is useful for setups that don't have SCIM provisioning in place.
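To make the scenarios above concrete, here is how a hypothetical workspace-local group `data-eng` would map to account groups (the group and workspace names are illustrative):

```text
data-eng in A, B, C, identical members     -> account group: data-eng
data-eng in A and B only                   -> account group: data-eng
data-eng in A, B, C; C's members differ    -> account groups: data-eng and C_data-eng
```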
Once you're done with this command, proceed to the [group migration workflow](#group-migration-workflow).

[[back to top](#databricks-labs-ucx)]

## `validate-groups-membership` command

```text
$ databricks labs ucx validate-groups-membership
...
14:30:36  INFO [d.l.u.workspace_access.groups] Found 483 account groups
14:30:36  INFO [d.l.u.workspace_access.groups] No group listing provided, all matching groups will be migrated
14:30:36  INFO [d.l.u.workspace_access.groups] There are no groups with different membership between account and workspace
Workspace Group Name  Members Count  Account Group Name  Members Count  Difference
```

This command validates whether the groups at the account level and the workspace level have different membership.
It is useful for administrators who want to ensure that the groups have the correct membership, and it can also be
used to debug issues related to group membership. See [group migration](docs/local-group-migration.md) and the
[group migration workflow](#group-migration-workflow) for more details.

Valid group membership is important to ensure users have the correct access after the legacy table ACLs are migrated in the [table migration workflow](#table-migration-workflow).

[[back to top](#databricks-labs-ucx)]

## `cluster-remap` command

```text
$ databricks labs ucx cluster-remap
21:29:38  INFO [d.labs.ucx] Remapping the Clusters to UC
Cluster Name                                            Cluster Id
Field Eng Shared UC LTS Cluster                         0601-182128-dcbte59m
Shared Autoscaling Americas cluster                     0329-145545-rugby794
```
```text
Please provide the cluster id's as comma separated value from the above list (default: <ALL>):
```

Once you're done with the [code migration](#code-migration-commands), you can run this command to remap the clusters to be UC enabled.

This command remaps clusters to UC-enabled ones. When run, it lists all clusters with their IDs and prompts for a
comma-separated list of the cluster IDs that should be remapped; by default it takes all cluster IDs. Once the cluster
IDs are provided, it updates those clusters to be UC enabled. A backup of each existing cluster config is stored as a
JSON file in the backup folder inside the installation location (`backup/clusters/cluster_id.json`), which makes it
possible to revert the cluster remapping.

You can revert the cluster remapping using the [`revert-cluster-remap` command](#revert-cluster-remap-command).

[[back to top](#databricks-labs-ucx)]

## `revert-cluster-remap` command

```text
$ databricks labs ucx revert-cluster-remap
21:31:29  INFO [d.labs.ucx] Reverting the Remapping of the Clusters from UC
21:31:33  INFO [d.labs.ucx] 0301-055912-4ske39iq
21:31:33  INFO [d.labs.ucx] 0306-121015-v1llqff6
Please provide the cluster id's as comma separated value from the above list (default: <ALL>):
```

If you want to revert a cluster remap done with the [`cluster-remap` command](#cluster-remap-command), use this command
to restore the original, non-UC cluster configuration. It iterates through the list of clusters in the backup folder
and reverts their configurations to the original ones. It also prompts for the list of clusters that should be
reverted; by default, it reverts all the clusters present in the backup folder.

[[back to top](#databricks-labs-ucx)]

# Star History

[![Star History Chart](https://api.star-history.com/svg?repos=databrickslabs/ucx&type=Date)](https://star-history.com/#databrickslabs/ucx)

[[back to top](#databricks-labs-ucx)]

# Project Support
Please note that all projects in the databrickslabs GitHub account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
    "bugtrack_url": null,
    "license": null,
    "summary": "UCX - Unity Catalog Migration Toolkit",
    "version": "0.22.0",
    "project_urls": {
        "Issues": "https://github.com/databricks/ucx/issues",
        "Source": "https://github.com/databricks/ucx"
    },
    "split_keywords": [
        "databricks",
        " unity catalog"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3419dcaefc6d67528e70c29d49547162c52e7bf2a45f4c83ac8a72c38a10a3a0",
                "md5": "284458cd0153d9a41b777d5eb2ebaf12",
                "sha256": "86bc2cc7ceadf75195098098714887168bc59430e7a8a4e089b31ab5a1c3a01e"
            },
            "downloads": -1,
            "filename": "databricks_labs_ucx-0.22.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "284458cd0153d9a41b777d5eb2ebaf12",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 262360,
            "upload_time": "2024-04-26T15:20:52",
            "upload_time_iso_8601": "2024-04-26T15:20:52.791331Z",
            "url": "https://files.pythonhosted.org/packages/34/19/dcaefc6d67528e70c29d49547162c52e7bf2a45f4c83ac8a72c38a10a3a0/databricks_labs_ucx-0.22.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "03abff5dde4aa9e15ab63dbd3428b3e35b6d69821d235282951d5751b131922d",
                "md5": "18f1504302d0e29db9933d1c50699d98",
                "sha256": "bc79b9b0c2aa216442f34dd018a29f46b608acb975be1f5fb4f4ba151c180a98"
            },
            "downloads": -1,
            "filename": "databricks_labs_ucx-0.22.0.tar.gz",
            "has_sig": false,
            "md5_digest": "18f1504302d0e29db9933d1c50699d98",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 211895,
            "upload_time": "2024-04-26T15:20:54",
            "upload_time_iso_8601": "2024-04-26T15:20:54.913082Z",
            "url": "https://files.pythonhosted.org/packages/03/ab/ff5dde4aa9e15ab63dbd3428b3e35b6d69821d235282951d5751b131922d/databricks_labs_ucx-0.22.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-26 15:20:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "databricks",
    "github_project": "ucx",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "databricks-labs-ucx"
}
        
Elapsed time: 0.24920s