databricks-labs-blueprint 0.9.3 (PyPI): Common libraries for Databricks Labs. Uploaded 2024-11-14 13:33:23; requires Python >=3.10; keywords: databricks.

<!-- FOR CONTRIBUTORS: Edit this file in Visual Studio Code with the recommended extensions, so that we update the table of contents automatically -->
Databricks Labs Blueprint
===

[![python](https://img.shields.io/badge/python-3.10,%203.11,%203.12-green)](https://github.com/databrickslabs/blueprint/actions/workflows/push.yml)
[![codecov](https://codecov.io/github/databrickslabs/blueprint/graph/badge.svg?token=x1JSVddfZa)](https://codecov.io/github/databrickslabs/blueprint) [![lines of code](https://tokei.rs/b1/github/databrickslabs/blueprint)](https://github.com/databrickslabs/blueprint)


Baseline for Databricks Labs projects written in Python. Sources are validated with `mypy` and `pylint`. See [Contributing instructions](CONTRIBUTING.md) if you would like to improve this project.

<!-- TOC -->
* [Databricks Labs Blueprint](#databricks-labs-blueprint)
* [Installation](#installation)
* [Batteries Included](#batteries-included)
  * [Python-native `pathlib.Path`-like interfaces](#python-native-pathlibpath-like-interfaces)
    * [Working With User Home Folders](#working-with-user-home-folders)
    * [Relative File Paths](#relative-file-paths)
    * [Browser URLs for Workspace Paths](#browser-urls-for-workspace-paths)
    * [`read/write_text()`, `read/write_bytes()`, and `glob()` Methods](#readwrite_text-readwrite_bytes-and-glob-methods)
    * [Moving Files](#moving-files)
    * [Working With Notebook Sources](#working-with-notebook-sources)
  * [Basic Terminal User Interface (TUI) Primitives](#basic-terminal-user-interface-tui-primitives)
    * [Simple Text Questions](#simple-text-questions)
    * [Confirming Actions](#confirming-actions)
    * [Single Choice from List](#single-choice-from-list)
    * [Single Choice from Dictionary](#single-choice-from-dictionary)
    * [Multiple Choices from Dictionary](#multiple-choices-from-dictionary)
    * [Unit Testing Prompts](#unit-testing-prompts)
  * [Nicer Logging Formatter](#nicer-logging-formatter)
    * [Rendering on Dark Background](#rendering-on-dark-background)
    * [Rendering in Databricks Notebooks](#rendering-in-databricks-notebooks)
    * [Integration With Your App](#integration-with-your-app)
    * [Integration with `console_script` Entrypoints](#integration-with-console_script-entrypoints)
  * [Parallel Task Execution](#parallel-task-execution)
    * [Collecting Results](#collecting-results)
    * [Collecting Errors from Background Tasks](#collecting-errors-from-background-tasks)
    * [Strict Failures from Background Tasks](#strict-failures-from-background-tasks)
  * [Application and Installation State](#application-and-installation-state)
    * [Install Folder](#install-folder)
    * [Detecting Current Installation](#detecting-current-installation)
    * [Detecting Installations From All Users](#detecting-installations-from-all-users)
    * [Saving `@dataclass` configuration](#saving-dataclass-configuration)
    * [Saving CSV files](#saving-csv-files)
    * [Loading `@dataclass` configuration](#loading-dataclass-configuration)
    * [Brute-forcing `SerdeError` with `as_dict()` and `from_dict()`](#brute-forcing-serdeerror-with-as_dict-and-from_dict)
    * [Configuration Format Evolution](#configuration-format-evolution)
    * [Uploading Untyped Files](#uploading-untyped-files)
    * [Listing All Files in the Install Folder](#listing-all-files-in-the-install-folder)
    * [Unit Testing Installation State](#unit-testing-installation-state)
    * [Assert Rewriting with PyTest](#assert-rewriting-with-pytest)
  * [Application State Migrations](#application-state-migrations)
  * [Building Wheels](#building-wheels)
    * [Released Version Detection](#released-version-detection)
    * [Unreleased Version Detection](#unreleased-version-detection)
    * [Application Name Detection](#application-name-detection)
    * [Using `ProductInfo` with integration tests](#using-productinfo-with-integration-tests)
    * [Publishing Wheels to Databricks Workspace](#publishing-wheels-to-databricks-workspace)
    * [Publishing upstream dependencies to workspaces without Public Internet access](#publishing-upstream-dependencies-to-workspaces-without-public-internet-access)
  * [Databricks CLI's `databricks labs ...` Router](#databricks-clis-databricks-labs--router)
    * [Account-level Commands](#account-level-commands)
    * [Commands with interactive prompts](#commands-with-interactive-prompts)
    * [Integration with Databricks Connect](#integration-with-databricks-connect)
    * [Starting New Projects](#starting-new-projects)
* [Notable Downstream Projects](#notable-downstream-projects)
* [Project Support](#project-support)
<!-- TOC -->

# Installation

You can install this project via `pip`:

```
pip install databricks-labs-blueprint
```

# Batteries Included

This library contains a proven set of building blocks, tested in production through [UCX](https://github.com/databrickslabs/ucx) and other projects.

## Python-native `pathlib.Path`-like interfaces

This library exposes subclasses of the [`pathlib`](https://docs.python.org/3/library/pathlib.html) classes from Python's standard
library that work with Databricks Workspace paths. They offer a more Pythonic way to work with workspace paths than
plain `str` values, are designed as drop-in replacements for `pathlib.Path`, and add functionality specific to
Databricks Workspace paths.

[[back to top](#databricks-labs-blueprint)]

### Working With User Home Folders

This code initializes a client to interact with a Databricks workspace, creates 
a relative workspace path (`~/some-folder/foo/bar/baz`), verifies the path is not absolute, and then demonstrates 
that converting this relative path to an absolute path is not implemented and raises an error. Subsequently, 
it expands the relative path to the user's home directory and creates the specified directory if it does not 
already exist.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import BadRequest
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
assert not wsp.is_absolute()

try:
    wsp.absolute()  # not implemented for relative workspace paths
except NotImplementedError:
    pass

with_user = wsp.expanduser()
with_user.mkdir()

user_name = ws.current_user.me().user_name
wsp_check = WorkspacePath(ws, f"/Users/{user_name}/{name}/foo/bar/baz")
assert wsp_check.is_dir()

try:
    wsp_check.parent.rmdir()  # raises BadRequest without recursive=True
except BadRequest:
    wsp_check.parent.rmdir(recursive=True)

assert not wsp_check.exists()
```

[[back to top](#databricks-labs-blueprint)]

### Relative File Paths

This code expands the `~` symbol to the full path of the user's home directory, computes the relative path from this 
home directory to the previously created directory (`~/some-folder/foo/bar/baz`), and verifies it matches the expected 
relative path (`some-folder/foo/bar/baz`). It then confirms that the expanded path is absolute, checks that 
calling `absolute()` on this path returns the path itself, and converts the path to a FUSE-compatible path 
format (`/Workspace/username@example.com/some-folder/foo/bar/baz`).

```python
from pathlib import Path
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()

home = WorkspacePath(ws, "~").expanduser()
relative_name = with_user.relative_to(home)
assert relative_name.as_posix() == f"{name}/foo/bar/baz"

assert with_user.is_absolute()
assert with_user.absolute() == with_user
assert with_user.as_fuse() == Path("/Workspace") / with_user.as_posix()
```
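
The path arithmetic used above follows ordinary `pathlib` semantics, so it can be tried without a workspace. The sketch below uses only `PurePosixPath` from the standard library with a made-up user name. It illustrates `relative_to()` and a detail worth keeping in mind when prefixing paths (joining onto an absolute right-hand side discards the left-hand side); it does not exercise `WorkspacePath` itself.

```python
from pathlib import PurePosixPath

# Hypothetical home folder and nested path, mirroring the example above.
home = PurePosixPath("/Users/username@example.com")
nested = home / "some-folder" / "foo" / "bar" / "baz"

# relative_to() strips the common prefix.
relative = nested.relative_to(home)
print(relative.as_posix())

# Joining with an absolute string discards the left-hand side,
# so the leading slash has to be stripped to keep a prefix.
discarded = PurePosixPath("/Workspace") / nested.as_posix()
prefixed = PurePosixPath("/Workspace") / nested.as_posix().lstrip("/")
print(discarded.as_posix())
print(prefixed.as_posix())
```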

[[back to top](#databricks-labs-blueprint)]

### Browser URLs for Workspace Paths

The `as_uri()` method returns a browser-accessible URI for the workspace path. This example retrieves the current user's username
from the Databricks workspace client, constructs the expected URI for the previously created directory
(`~/some-folder/foo/bar/baz`) from the host URL and the percent-encoded username, and then verifies that the URI
generated by the `with_user` path object matches it:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()

user_name = ws.current_user.me().user_name
browser_uri = f'{ws.config.host}#workspace/Users/{user_name.replace("@", "%40")}/{name}/foo/bar/baz'

assert with_user.as_uri() == browser_uri
```
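
The manual `replace("@", "%40")` in the example is a special case of percent-encoding. As a minimal sketch, assuming a made-up host and user name, the same URI shape can be built with `urllib.parse.quote` from the standard library:

```python
from urllib.parse import quote

host = "https://example.cloud.databricks.com"  # hypothetical workspace host
user_name = "username@example.com"             # hypothetical user
path = f"/Users/{user_name}/some-folder/foo/bar/baz"

# quote() keeps "/" by default and percent-encodes "@" as "%40".
browser_uri = f"{host}#workspace{quote(path)}"
print(browser_uri)
```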

[[back to top](#databricks-labs-blueprint)]

### `read/write_text()`, `read/write_bytes()`, and `glob()` Methods

This code creates a `WorkspacePath` object for the path `~/some-folder/a/b/c`, expands it to the full user path, 
and creates the directory along with any necessary parent directories. It then creates a file named `hello.txt` within 
this directory, writes "Hello, World!" to it, and verifies the content. The code lists all `.txt` files in the directory 
and ensures there is exactly one file, which is `hello.txt`. Finally, it deletes `hello.txt` and confirms that the file 
no longer exists.

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/a/b/c")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)

hello_txt = with_user / "hello.txt"
hello_txt.write_text("Hello, World!")
assert hello_txt.read_text() == "Hello, World!"

files = list(with_user.glob("**/*.txt"))
assert len(files) == 1
assert hello_txt == files[0]
assert files[0].name == "hello.txt"

with_user.joinpath("hello.txt").unlink()

assert not hello_txt.exists()
```

The `write_bytes()` and `read_bytes()` methods work the same way for binary content:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()

wsp = WorkspacePath(ws, f"~/{name}")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)

hello_bin = with_user.joinpath("hello.bin")
hello_bin.write_bytes(b"Hello, World!")

assert hello_bin.read_bytes() == b"Hello, World!"

with_user.joinpath("hello.bin").unlink()

assert not hello_bin.exists()
```

[[back to top](#databricks-labs-blueprint)]

### Moving Files

This code creates a `WorkspacePath` object for the path `~/some-folder`, expands it to the full user path, and creates
the directory along with any necessary parent directories. It then creates a file named `hello.txt` within this directory
and writes "Hello, World!" to it. The code then renames the file to `hello2.txt`, verifies that `hello.txt` no longer exists,
and checks that the content of `hello2.txt` is "Hello, World!".

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()

wsp = WorkspacePath(ws, f"~/{name}")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)

hello_txt = with_user / "hello.txt"
hello_txt.write_text("Hello, World!")

hello_txt.replace(with_user / "hello2.txt")

assert not hello_txt.exists()
assert (with_user / "hello2.txt").read_text() == "Hello, World!"
```

[[back to top](#databricks-labs-blueprint)]

### Working With Notebook Sources

This code initializes a Databricks `WorkspaceClient`, creates a `WorkspacePath` object for the path `~/some-folder`, and
defines two items within this folder: a text file (`a.txt`) and a Python notebook (`b`). It creates the notebook with
the specified content (through a `make_notebook` helper that is not defined in this snippet) and writes "Hello, World!"
to the text file. The code then retrieves all files in the folder, asserts there are exactly two files, and verifies the
suffix and content of each file. Specifically, it checks that `a.txt` has a `.txt` suffix and `b` has a `.py` suffix,
with the notebook containing the expected code.

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

ws = WorkspaceClient()

folder = WorkspacePath(ws, "~/some-folder")

txt_file = folder / "a.txt"
py_notebook = folder / "b"  # notebooks have no file extension

make_notebook(path=py_notebook, content="display(spark.range(10))")
txt_file.write_text("Hello, World!")

files = {_.name: _ for _ in folder.glob("**/*")}
assert len(files) == 2

assert files["a.txt"].suffix == ".txt"
assert files["b"].suffix == ".py"  # suffix is determined from ObjectInfo
assert files["b"].read_text() == "# Databricks notebook source\ndisplay(spark.range(10))"
```

[[back to top](#databricks-labs-blueprint)]

## Basic Terminal User Interface (TUI) Primitives

Command-line apps need testable interactivity, which `Prompts` from `databricks.labs.blueprint.tui` provides. Here is an example of it in action:

![ucx install](docs/ucx-install.gif)

It is also integrated with our [command router](#commands-with-interactive-prompts). 

[[back to top](#databricks-labs-blueprint)]

### Simple Text Questions

Use `prompts.question()` as a more capable alternative to the `input()` builtin:

```python
from databricks.labs.blueprint.tui import Prompts

prompts = Prompts()
answer = prompts.question('Enter a year', default='2024', valid_number=True)
print(answer)
```

![question](docs/prompts-question.gif)

Optional arguments are:

* `default` (str) - value to use if the user didn't input anything
* `max_attempts` (int, default 10) - number of attempts allowed before an exception is raised for invalid or empty input
* `valid_number` (bool) - input has to be a valid number
* `valid_regex` (str) - input has to match the given regular expression
* `validate` - function that takes a string and returns a boolean, like `lambda x: 'awesome' in x`, to further validate the input
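
To make the interplay of these arguments concrete, here is a minimal standard-library sketch of a re-prompt loop. The `ask` function, its `read` parameter, and the raised `ValueError` are hypothetical and chosen for illustration; this is not the library's implementation, and it assumes `valid_regex` is a pattern string.

```python
import re
from typing import Callable, Optional


def ask(
    text: str,
    *,
    read: Callable[[str], str] = input,
    default: Optional[str] = None,
    max_attempts: int = 10,
    valid_regex: Optional[str] = None,
    validate: Optional[Callable[[str], bool]] = None,
) -> str:
    """Hypothetical re-prompt loop; not the library's implementation."""
    pattern = re.compile(valid_regex) if valid_regex else None
    for _ in range(max_attempts):
        answer = read(f"{text}: ").strip()
        if not answer and default is not None:
            return default  # empty input falls back to the default
        if not answer:
            continue  # empty input without a default uses up an attempt
        if pattern and not pattern.match(answer):
            continue
        if validate and not validate(answer):
            continue
        return answer
    raise ValueError(f"cannot get an answer within {max_attempts} attempts")


# Scripted answers stand in for a user typing at the terminal.
scripted = iter(["", "twenty", "2024"])
year = ask("Enter a year", read=lambda _: next(scripted), valid_regex=r"^\d+$")
print(year)
```

Passing the reader as a parameter is what keeps such a loop testable without a real terminal.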

[[back to top](#databricks-labs-blueprint)]

### Confirming Actions

Use `prompts.confirm()` to guard any optional or destructive actions of your app:

```python
if prompts.confirm('Destroy database?'):
    print('DESTROYING DATABASE')
```

![confirm](docs/prompts-confirm.gif)

[[back to top](#databricks-labs-blueprint)]

### Single Choice from List

Use `prompts.choice()` to select a value from a list:

```python
answer = prompts.choice('Select a language', ['Python', 'Rust', 'Go', 'Java'])
print(answer)
```

![choice](docs/prompts-choice.gif)

[[back to top](#databricks-labs-blueprint)]

### Single Choice from Dictionary

Use `prompts.choice_from_dict()` to select a value from a dictionary by showing users the sorted dictionary keys:

```python
answer = prompts.choice_from_dict('Select a locale', {
    'Українська': 'ua',
    'English': 'en'
})
print(f'Locale is: {answer}')
```

![choice from dict](docs/prompts-choice-from-dict.gif)

[[back to top](#databricks-labs-blueprint)]

### Multiple Choices from Dictionary

Use `prompts.multiple_choice_from_dict()` to select multiple items from a dictionary:

```python
answer = prompts.multiple_choice_from_dict(
    'What projects are written in Python? Select [DONE] when ready.', {
    'Databricks Labs UCX': 'ucx',
    'Databricks SDK for Python': 'sdk-py',
    'Databricks SDK for Go': 'sdk-go',
    'Databricks CLI': 'cli',
})
print(f'Answer is: {answer}')
```

![multiple choice](docs/prompts-choice-from-dict.gif)

[[back to top](#databricks-labs-blueprint)]

### Unit Testing Prompts

Use `MockPrompts` with regular expressions as keys and values as answers. The longest key takes precedence.

```python
from databricks.labs.blueprint.tui import MockPrompts

def test_ask_for_int():
    prompts = MockPrompts({r".*": ""})
    res = prompts.question("Number of threads", default="8", valid_number=True)
    assert "8" == res
```
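
The "longest key takes precedence" rule can be pictured with a small standard-library sketch. The `pick_answer` function below is hypothetical and only illustrates one plausible way to resolve overlapping patterns; the actual matching inside `MockPrompts` may differ.

```python
import re


def pick_answer(patterns: dict[str, str], question: str) -> str:
    """Hypothetical lookup: try longer patterns first, return the first match."""
    for pattern in sorted(patterns, key=len, reverse=True):
        if re.search(pattern, question):
            return patterns[pattern]
    raise ValueError(f"no mocked answer for: {question}")


answers = {r".*": "fallback", r"Number of threads.*": "16"}
print(pick_answer(answers, "Number of threads"))
print(pick_answer(answers, "Destroy database?"))
```

With this ordering, a catch-all `.*` entry can serve as a default while more specific patterns override it.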

[[back to top](#databricks-labs-blueprint)]

## Nicer Logging Formatter

There's a basic logging configuration available for [Python SDK](https://github.com/databricks/databricks-sdk-py?tab=readme-ov-file#logging), but the default output is not pretty and is relatively inconvenient to read. Here's how to make output from Python's standard logging facility more enjoyable to read:

```python
from databricks.labs.blueprint.logger import install_logger

install_logger()

import logging
logging.root.setLevel("DEBUG") # use only for development or demo purposes

logger = logging.getLogger("name.of.your.module")
logger.debug("This is a debug message")
logger.info("This is an info message")
logger.warning("This is a warning message")
logger.error("This is an error message", exc_info=KeyError(123))
logger.critical("This is a critical message")
```

Here are the assumptions made by this formatter:

 * Most likely you're forwarding your logs to a file already, so this log formatter is mainly for visual consumption.
 * The average app or Databricks Job most likely finishes running within a day or two, so we display only hours, minutes, and seconds from the timestamp.
 * We gray out debug messages and highlight all other messages. Errors and fatal messages are additionally painted red.
 * We shorten the name of the logger to a readable chunk only, so as not to clutter the space. Real-world apps have deeply nested folder structures and filenames like `src/databricks/labs/ucx/migration/something.py`, which translate into fully-qualified Python module names like `databricks.labs.ucx.migration.something`. These end up in the `__name__` [top-level code environment](https://docs.python.org/3/library/__main__.html#what-is-the-top-level-code-environment) special variable, which you idiomatically use with logging as `logging.getLogger(__name__)`. This log formatter shortens the full module path to a more readable `d.l.u.migration.something`, which is easier to consume from a terminal screen or a notebook.
 * We only show the name of the thread if it's other than `MainThread`, because the overwhelming majority of Python applications are single-threaded.
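
The name shortening and the time-only timestamp can be sketched with the standard `logging` module. The rule below (abbreviate every component except the last two) is an assumption inferred from the examples in this document, and `shorten` and `ShortNameFormatter` are hypothetical names, not the library's own formatter.

```python
import logging


def shorten(name: str, keep: int = 2) -> str:
    """Abbreviate all but the last `keep` components to their first letter."""
    parts = name.split(".")
    head, tail = parts[:-keep], parts[-keep:]
    return ".".join([part[:1] for part in head] + tail)


class ShortNameFormatter(logging.Formatter):
    """Hypothetical formatter: time-only timestamps and shortened logger names."""

    def format(self, record: logging.LogRecord) -> str:
        # Work on a copy so other handlers still see the full logger name.
        copy = logging.makeLogRecord(record.__dict__)
        copy.name = shorten(copy.name)
        return super().format(copy)


formatter = ShortNameFormatter(
    "%(asctime)s %(levelname)5s [%(name)s] %(message)s", datefmt="%H:%M:%S"
)

print(shorten("databricks.labs.ucx.migration.something"))
print(shorten("databricks.labs.blueprint.parallel"))
print(shorten("dist.logger"))
```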

[[back to top](#databricks-labs-blueprint)]

### Rendering on Dark Background

Here's what the output looks like on dark terminal backgrounds, including those from GitHub Actions:

![logger dark](docs/logger-dark.png)

[[back to top](#databricks-labs-blueprint)]

### Rendering in Databricks Notebooks

And here's how things will appear when executed from Databricks Runtime as part of a notebook or a workflow:

![logger white](docs/notebook-logger.png)

[[back to top](#databricks-labs-blueprint)]

### Integration With Your App

Just place the following code in your wheel's top-most `__init__.py` file:

```python
from databricks.labs.blueprint.logger import install_logger

install_logger(level="INFO")
```

And place this idiomatic snippet at the top of each module:

```python
# ... insert this into the top of your file
from databricks.labs.blueprint.entrypoint import get_logger

logger = get_logger(__file__)
# ... top of the file insert end
```

... and you'll be able to benefit from the readable console stderr formatting everywhere.

Whenever you need to turn on debug logging, just invoke `logging.root.setLevel("DEBUG")` (even in a notebook).

[[back to top](#databricks-labs-blueprint)]

### Integration with `console_script` Entrypoints

When you invoke Python as an entry point to your wheel (also known as `console_scripts`), [`__name__` top-level code environment](https://docs.python.org/3/library/__main__.html#what-is-the-top-level-code-environment) would always be equal to `__main__`. But you really want to get the logger to be named after your Python module and not just `__main__` (see [rendering in Databricks notebooks](#rendering-in-databricks-notebooks)).

If you create a `dist/logger.py` file with the following contents:

```python
from databricks.labs.blueprint.entrypoint import get_logger, run_main

logger = get_logger(__file__)

def main(first_arg, second_arg, *other):
    logger.info(f'First arg is: {first_arg}')
    logger.info(f'Second arg is: {second_arg}')
    logger.info(f'Everything else is: {other}')
    logger.debug('... and this message is only shown when you are debugging from PyCharm IDE')

if __name__ == '__main__':
    run_main(main)
```

... and invoke it with `python dist/logger.py Hello world, my name is Serge`, you should get back the following output.

```
13:46:42  INFO [dist.logger] First arg is: Hello
13:46:42  INFO [dist.logger] Second arg is: world,
13:46:42  INFO [dist.logger] Everything else is: ('my', 'name', 'is', 'Serge')
```

Everything is made easy thanks to the `run_main(fn)` helper.

[[back to top](#databricks-labs-blueprint)]

## Parallel Task Execution

Python's global interpreter lock (GIL) prevents threads from running compute-intensive Python code in parallel, but IO-intensive tasks, like calling Databricks APIs through Databricks SDK for Python, release the GIL while they wait. Performing multiple API calls in parallel is quite a common task, yet it is difficult to get multi-threading right. `concurrent.futures.ThreadPoolExecutor` is great, but sometimes we want something even more high-level. This library helps you navigate the most common road bumps.

[[back to top](#databricks-labs-blueprint)]

### Collecting Results

This library helps you filter out empty results from background tasks, so that the downstream code is generally simpler. We also handle thread pool naming, so that the name of the list of tasks properly gets into log messages. After all background tasks have completed their execution, we log something like `Finished 'task group name' tasks: 50% results available (2/4). Took 0:00:00.000604`.

```python
from databricks.labs.blueprint.entrypoint import get_logger
from databricks.labs.blueprint.parallel import Threads

logger = get_logger(__file__)

def not_really_but_fine():
    logger.info("did something, but returned None")

def doing_something():
    logger.info("doing something important")
    return f'result from {doing_something.__name__}'

logger.root.setLevel('DEBUG')
tasks = [not_really_but_fine, not_really_but_fine, doing_something, doing_something]
results, errors = Threads.gather("task group name", tasks)

assert ['result from doing_something', 'result from doing_something'] == results
assert [] == errors
```

This will log the following messages:

```
14:20:15 DEBUG [d.l.blueprint.parallel] Starting 4 tasks in 20 threads
14:20:15  INFO [dist.logger][task_group_name_0] did something, but returned None
14:20:15  INFO [dist.logger][task_group_name_1] did something, but returned None
14:20:15  INFO [dist.logger][task_group_name_1] doing something important
14:20:15  INFO [dist.logger][task_group_name_1] doing something important
14:20:15  INFO [d.l.blueprint.parallel][task_group_name_1] task group name 4/4, rps: 7905.138/sec
14:20:15  INFO [d.l.blueprint.parallel] Finished 'task group name' tasks: 50% results available (2/4). Took 0:00:00.000604
```

[[back to top](#databricks-labs-blueprint)]

### Collecting Errors from Background Tasks

Inspired by the Go language's idiomatic error handling approach, this library allows collecting errors from all of the background tasks and handling them separately. For all other cases, we recommend using [strict failures](#strict-failures-from-background-tasks).

```python
from databricks.sdk.errors import NotFound
from databricks.labs.blueprint.parallel import Threads

def works():
    return True

def fails():
    raise NotFound("something is not right")

tasks = [works, fails, works, fails, works, fails, works, fails]
results, errors = Threads.gather("doing some work", tasks)

assert [True, True, True, True] == results
assert 4 == len(errors)
```

This will log the following messages:

```
14:08:31 ERROR [d.l.blueprint.parallel][doing_some_work_0] doing some work task failed: something is not right: ...
...
14:08:31 ERROR [d.l.blueprint.parallel][doing_some_work_3] doing some work task failed: something is not right: ...
14:08:31 ERROR [d.l.blueprint.parallel] More than half 'doing some work' tasks failed: 50% results available (4/8). Took 0:00:00.001011
```
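
For intuition, the results-and-errors pattern can be approximated with `concurrent.futures` alone. The `gather` function below is a hypothetical stand-in written for this illustration (including its `num_threads` parameter and the use of `LookupError`); it is not how `Threads.gather` is implemented, and it omits the progress logging shown above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def gather(name: str, tasks: list[Callable[[], object]], num_threads: int = 4):
    """Hypothetical collector: returns (non-empty results, raised errors)."""
    results, errors = [], []
    prefix = name.replace(" ", "_")  # thread names show up in log records
    with ThreadPoolExecutor(max_workers=num_threads, thread_name_prefix=prefix) as pool:
        futures = [pool.submit(task) for task in tasks]
        for future in futures:
            error = future.exception()  # blocks until the task finishes
            if error is not None:
                errors.append(error)
                continue
            result = future.result()
            if result is not None:  # empty results are dropped
                results.append(result)
    return results, errors


def works():
    return True


def fails():
    raise LookupError("something is not right")


def returns_nothing():
    return None


results, errors = gather("doing some work", [works, fails, returns_nothing, works])
print(results, [str(error) for error in errors])
```

Returning errors as values lets the caller decide whether a partial result is acceptable.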

[[back to top](#databricks-labs-blueprint)]

### Strict Failures from Background Tasks

Use `Threads.strict(...)` to raise `ManyError` with the summary of all failed tasks:

```python
from databricks.sdk.errors import NotFound
from databricks.labs.blueprint.parallel import Threads

def works():
    return True

def fails():
    raise NotFound("something is not right")

tasks = [works, fails, works, fails, works, fails, works, fails]
results = Threads.strict("doing some work", tasks)

# this line won't get executed
assert [True, True, True, True] == results
```

This will log the following messages:

```
...
14:11:46 ERROR [d.l.blueprint.parallel] More than half 'doing some work' tasks failed: 50% results available (4/8). Took 0:00:00.001098
...
databricks.labs.blueprint.parallel.ManyError: Detected 4 failures: NotFound: something is not right
```
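
A strict variant can be sketched the same way: run everything, then raise a single aggregate exception if anything failed. `AggregateError` and `strict` below are hypothetical names for this illustration; the library raises its own `ManyError`, and its message format is only imitated here.

```python
from concurrent.futures import ThreadPoolExecutor


class AggregateError(RuntimeError):
    """Hypothetical aggregate error; the library raises its own `ManyError`."""

    def __init__(self, errors):
        self.errors = list(errors)
        # De-duplicate identical messages while keeping first-seen order.
        unique = dict.fromkeys(f"{type(e).__name__}: {e}" for e in self.errors)
        super().__init__(f"Detected {len(self.errors)} failures: {', '.join(unique)}")


def strict(tasks):
    """Run all tasks; raise one aggregate error if any of them failed."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(task) for task in tasks]
    # Leaving the `with` block waits for every future to finish.
    errors = [f.exception() for f in futures if f.exception() is not None]
    if errors:
        raise AggregateError(errors)
    return [f.result() for f in futures if f.result() is not None]


def works():
    return True


def fails():
    raise LookupError("something is not right")


try:
    strict([works, fails, works, fails])
except AggregateError as err:
    message = str(err)
    print(message)
```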

[[back to top](#databricks-labs-blueprint)]

## Application and Installation State

There always needs to be a location where you put application code, artifacts, and configuration.
The `Installation` class manages the `~/.{product}` folder on WorkspaceFS to track [typed files](#saving-dataclass-configuration).
It provides methods for serializing and deserializing objects of a specific type, as well as managing the [storage location](#install-folder)
for those objects. The class includes methods for loading and saving objects, uploading and downloading
files, and managing the installation folder.

The `Installation` class can be helpful for unit testing by allowing you to mock the file system and control
the behavior of the [`load`](#loading-dataclass-configuration) and [`save`](#saving-dataclass-configuration) methods. 
See [unit testing](#unit-testing-installation-state) for more details.

[[back to top](#databricks-labs-blueprint)]

### Install Folder

The `install_folder()` method returns the path to the installation folder on WorkspaceFS. The installation folder
is used to store typed files that are managed by the `Installation` class. [Publishing wheels](#publishing-wheels-to-databricks-workspace)
updates the `version.json` file in the install folder.

When integration testing, you may want to have a [random installation folder](#using-productinfo-with-integration-tests) for each test execution.

If an `install_folder` argument is provided to the constructor of the `Installation` class, it will be used
as the installation folder. Otherwise, the installation folder will be determined based on the current user's
username. Specifically, the installation folder will be `/Users/{user_name}/.{product}`, where `{user_name}`
is the username of the current user and `{product}` is the [name of the product](#application-name-detection)
 associated with the installation. Here is an example of how you can use the `install_folder` method:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

# Create an Installation object for the "blueprint" product
install = Installation(WorkspaceClient(), "blueprint")

# Print the path to the installation folder
print(install.install_folder())
# Output: /Users/{user_name}/.blueprint
```

In this example, the `Installation` object is created for the "blueprint" product. The `install_folder` method
is then called to print the path to the installation folder. The output will be `/Users/{user_name}/.blueprint`,
where `{user_name}` is the username of the current user.

You can also provide an `install_folder` argument to the constructor to specify a custom installation folder.
Here is an example of how you can do this:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

# Create an Installation object for the "blueprint" product with a custom installation folder
install = Installation(WorkspaceClient(), "blueprint", install_folder="/my/custom/folder")

# Print the path to the installation folder
print(install.install_folder())
# Output: /my/custom/folder
```

In this example, the `Installation` object is created for the "blueprint" product with a custom installation
folder of `/my/custom/folder`. The `install_folder` method is then called to print the path to the installation
folder. The output will be `/my/custom/folder`.
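
The resolution rule described in this section boils down to a simple fallback. The helper below is a hypothetical restatement of that rule with made-up names, not code from the `Installation` class:

```python
from typing import Optional


def resolve_install_folder(user_name: str, product: str, install_folder: Optional[str] = None) -> str:
    """Hypothetical helper: explicit folder wins, else the user's home folder."""
    if install_folder:
        return install_folder
    return f"/Users/{user_name}/.{product}"


print(resolve_install_folder("foo@example.com", "blueprint"))
print(resolve_install_folder("foo@example.com", "blueprint", "/my/custom/folder"))
```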

[[back to top](#databricks-labs-blueprint)]

### Detecting Current Installation

`Installation.current(ws, product)` returns the `Installation` object for the given product in the current workspace.

If the installation is not found, a `NotFound` error is raised. If the `assume_user` argument is `True`, the method
assumes that the installation is in the user's home directory and returns it if found. If `False`, the method
only returns an installation that is in the `/Applications` directory.

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

ws = WorkspaceClient()

# current user installation
installation = Installation.assume_user_home(ws, "blueprint")
assert "/Users/foo/.blueprint" == installation.install_folder()
assert not installation.is_global()

# workspace global installation
installation = Installation.current(ws, "blueprint")
assert "/Applications/blueprint" == installation.install_folder()
assert installation.is_global()
```

[[back to top](#databricks-labs-blueprint)]

### Detecting Installations From All Users

`Installation.existing(ws, product)` returns a collection of all existing installations for the given product in the current workspace.

This method searches for installations in the root `/Applications` directory and the home directories of all users in the workspace.
Let's say, users `foo@example.com` and `bar@example.com` installed `blueprint` product in their home folders. The following
code will print `/Workspace/bar@example.com/.blueprint` and `/Workspace/foo@example.com/.blueprint`:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

ws = WorkspaceClient()

global_install = Installation.assume_global(ws, 'blueprint')
global_install.upload("some.bin", b"...")

user_install = Installation.assume_user_home(ws, 'blueprint')
user_install.upload("some.bin", b"...")

for blueprint in Installation.existing(ws, "blueprint"):
  print(blueprint.install_folder())
```

[[back to top](#databricks-labs-blueprint)]

### Saving `@dataclass` configuration

The `save(obj)` method saves a dataclass instance to a file on WorkspaceFS. If no `filename` is provided,
the filename is derived from the name of the object's class. Any missing parent directories are created automatically.
If the object has a `__version__` attribute, the method adds a `version` field to the serialized object
with the value of the `__version__` attribute. See [configuration format evolution](#configuration-format-evolution)
for more details. `save(obj)` works with JSON and YAML configurations without the need to supply the `filename` keyword
argument. When you need to save [CSV files](#saving-csv-files), the `filename` argument is required. If you need to
upload arbitrary and untyped files, use the [`upload()` method](#uploading-untyped-files).

Here is an example of how you can use the `save` method:

```python
from dataclasses import dataclass

from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

install = Installation(WorkspaceClient(), "blueprint")

@dataclass
class MyClass:
    field1: str
    field2: str

obj = MyClass('value1', 'value2')
install.save(obj)

# Verify that the object was saved correctly
loaded_obj = install.load(MyClass)
assert loaded_obj == obj
```

In this example, the `Installation` object is created for the "blueprint" product. A dataclass object of type
`MyClass` is then created and saved to a file using the `save` method. The object is then loaded from the file
using the [`load` method](#loading-dataclass-configuration) and compared to the original object to verify that 
it was saved correctly.
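
The naming and versioning rules described above can be illustrated without a workspace. The following is a minimal sketch under stated assumptions, not the library's implementation: `guess_filename` and `to_payload` are hypothetical helpers, and the kebab-case file name is assumed from the `some-config.json` convention this README uses for auto-detected file names.

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class SomeConfig:
    # No type annotation, so this is a class attribute and not a dataclass field.
    __version__ = 2

    inventory_database: str


def guess_filename(type_ref: type) -> str:
    """Hypothetical helper: turn `SomeConfig` into `some-config.json`."""
    kebab = re.sub(r"(?<!^)(?=[A-Z])", "-", type_ref.__name__).lower()
    return f"{kebab}.json"


def to_payload(obj) -> dict:
    """Hypothetical helper: dataclass fields plus a `version` field, if declared."""
    raw = asdict(obj)
    version = getattr(type(obj), "__version__", None)
    if version is not None:
        raw["version"] = version
    return raw


filename = guess_filename(SomeConfig)
payload = to_payload(SomeConfig(inventory_database="some_blueprint"))
```

Here `filename` and `payload` are what would end up as the file name and the serialized content, under the assumptions above.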

[[back to top](#databricks-labs-blueprint)]

### Saving CSV files

You may need to upload a CSV file to Databricks Workspace, so that it's easier to edit from the Databricks Workspace UI
or with tools like Google Sheets or Microsoft Excel. If non-technical humans don't need to edit application state,
use [dataclasses](#saving-dataclass-configuration) for configuration. CSV files currently don't support
[format evolution](#configuration-format-evolution).

The following example saves a `workspaces.csv` file with two records and a header:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.provisioning import Workspace
from databricks.labs.blueprint.installation import Installation

installation = Installation(WorkspaceClient(), "blueprint")

installation.save([
  Workspace(workspace_id=1234, workspace_name="first"),
  Workspace(workspace_id=1235, workspace_name="second"),
], filename="workspaces.csv")

# ~ $ databricks workspace export /Users/foo@example.com/.blueprint/workspaces.csv
# ... workspace_id,workspace_name
# ... 1234,first
# ... 1235,second
```
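
To make the resulting file layout concrete, here is a minimal standard-library sketch of turning a list of dataclass records into CSV text with a header. It is not the library's serializer: `to_csv` is a hypothetical helper, and the two-field `Workspace` class is a stand-in for `databricks.sdk.service.provisioning.Workspace`, which has many more fields.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Workspace:  # stand-in for the SDK class, reduced to two fields
    workspace_id: int
    workspace_name: str


def to_csv(records: list) -> str:
    """Hypothetical helper: one header row, then one row per record."""
    header = [field.name for field in fields(records[0])]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


content = to_csv([
    Workspace(workspace_id=1234, workspace_name="first"),
    Workspace(workspace_id=1235, workspace_name="second"),
])
```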

[[back to top](#databricks-labs-blueprint)]

### Loading `@dataclass` configuration

The `load(type_ref[, filename])` method loads an object of type `type_ref` from a file on WorkspaceFS. If no `filename` is
provided, the `__file__` attribute of `type_ref` is used as the filename. If `type_ref` has no such attribute, the library
derives the filename from the class name.

```python
from dataclasses import dataclass
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

@dataclass
class SomeConfig:  # <-- auto-detected filename is `some-config.json`
    version: str

ws = WorkspaceClient()
installation = Installation.current(ws, "blueprint")
cfg = installation.load(SomeConfig)

installation.save(SomeConfig("0.1.2"))
installation.assert_file_written("some-config.json", {"version": "0.1.2"})
```

[[back to top](#databricks-labs-blueprint)]

### Brute-forcing `SerdeError` with `as_dict()` and `from_dict()`

In rare circumstances, when you cannot use [@dataclass](#loading-dataclass-configuration) or you get a `SerdeError` that you cannot explain, you can implement the `from_dict(cls, raw: dict) -> 'T'` and `as_dict(self) -> dict` methods on the class:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

class SomePolicy:
    def __init__(self, a, b):
        self._a = a
        self._b = b

    def as_dict(self) -> dict:
        return {"a": self._a, "b": self._b}

    @classmethod
    def from_dict(cls, raw: dict):
        return cls(raw.get("a"), raw.get("b"))

    def __eq__(self, o):
        assert isinstance(o, SomePolicy)
        return self._a == o._a and self._b == o._b

policy = SomePolicy(1, 2)
installation = Installation.current(WorkspaceClient(), "blueprint")
installation.save(policy, filename="backups/policy-123.json")
load = installation.load(SomePolicy, filename="backups/policy-123.json")

assert load == policy
```

[[back to top](#databricks-labs-blueprint)]

### Configuration Format Evolution

As time progresses, your application evolves, and so does its configuration file format. This library provides
a common utility to seamlessly evolve the configuration file format across versions, with callbacks that convert
older versions to newer ones. If you need to migrate configuration or database state of the entire application,
use [application state migrations](#application-state-migrations).

If the type has a `__version__` attribute, the `load()` method checks that the version of the object in the file
matches the expected version. If the versions do not match, the method attempts to migrate the object to
the expected version using a method named `v{actual_version}_migrate` on the `type_ref` class. If the migration
is successful, the method returns the migrated object. If the migration is not successful, the method
raises an `IllegalState` exception. Suppose the `/Users/foo@example.com/.blueprint/config.yml` file contains
only `initial: 999`, because it was written by an older installation of the `blueprint` product:

```python
from dataclasses import dataclass
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

@dataclass
class EvolvedConfig:
    __file__ = "config.yml"
    __version__ = 3

    initial: int
    added_in_v1: int
    added_in_v2: int

    @staticmethod
    def v1_migrate(raw: dict) -> dict:
        raw["added_in_v1"] = 111
        raw["version"] = 2
        return raw

    @staticmethod
    def v2_migrate(raw: dict) -> dict:
        raw["added_in_v2"] = 222
        raw["version"] = 3
        return raw

installation = Installation.current(WorkspaceClient(), "blueprint")
cfg = installation.load(EvolvedConfig)

assert 999 == cfg.initial
assert 111 == cfg.added_in_v1  # <-- added by v1_migrate()
assert 222 == cfg.added_in_v2  # <-- added by v2_migrate()
```
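
The chain of callbacks can be pictured as a simple loop. The sketch below is an illustration only, not the library's code: `migrate` is a hypothetical helper, `RuntimeError` stands in for `IllegalState`, and treating a missing `version` field as version 1 is an assumption that matches the example above.

```python
class EvolvedConfig:
    """Plain stand-in for the dataclass above; only the migration hooks matter."""

    __version__ = 3

    @staticmethod
    def v1_migrate(raw: dict) -> dict:
        raw["added_in_v1"] = 111
        raw["version"] = 2
        return raw

    @staticmethod
    def v2_migrate(raw: dict) -> dict:
        raw["added_in_v2"] = 222
        raw["version"] = 3
        return raw


def migrate(type_ref: type, raw: dict) -> dict:
    """Hypothetical helper: apply v{N}_migrate callbacks until versions match."""
    expected = type_ref.__version__
    migrated = dict(raw)  # work on a copy, leave the input untouched
    actual = migrated.get("version", 1)  # assumption: no version means version 1
    while actual != expected:
        callback = getattr(type_ref, f"v{actual}_migrate", None)
        if callback is None:
            raise RuntimeError(f"cannot migrate from version {actual} to {expected}")
        migrated = callback(migrated)
        if migrated.get("version") == actual:
            raise RuntimeError(f"v{actual}_migrate did not change the version")
        actual = migrated.get("version")
    return migrated


old_file = {"initial": 999}
new_file = migrate(EvolvedConfig, old_file)
```

Each callback is responsible for bumping `version` itself, which is why the loop stops with an error when a callback leaves it unchanged.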

[[back to top](#databricks-labs-blueprint)]

### Uploading Untyped Files

The `upload(filename, raw_bytes)` and `upload_dbfs(filename, raw_bytes)` methods upload raw bytes to a file on
WorkspaceFS (or DBFS) with the given `filename`, creating any missing directories where required. These methods
are meant for files that are not typed, i.e., they do not use the [`@dataclass` decorator](#saving-dataclass-configuration).

```python
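from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

ws = WorkspaceClient()
installation = Installation(ws, "blueprint")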

target = installation.upload("wheels/foo.whl", b"abc")
assert "/Users/foo/.blueprint/wheels/foo.whl" == target
```

The most common example is a [wheel](#building-wheels), which is already integrated with the `Installation` framework.

[[back to top](#databricks-labs-blueprint)]

### Listing All Files in the Install Folder

You can use the `files()` method to recursively list all files in the [install folder](#install-folder).

[[back to top](#databricks-labs-blueprint)]

### Unit Testing Installation State

You can create a `MockInstallation` object and use it to override the default installation folder and the contents
of the files in that folder. This allows you to test the behavior of your code in different scenarios, such as when a file
is not found or when the contents of a file do not match the expected format.

For example, suppose you have the following `WorkspaceConfig` class that is serialized into `config.yml` on your workspace:

```python
from dataclasses import dataclass
from databricks.sdk.core import Config

@dataclass
class WorkspaceConfig:
  __file__ = "config.yml"
  __version__ = 2

  inventory_database: str
  connect: Config | None = None
  workspace_group_regex: str | None = None
  include_group_names: list[str] | None = None
  num_threads: int | None = 10
  database_to_catalog_mapping: dict[str, str] | None = None
  log_level: str | None = "INFO"
  workspace_start_path: str = "/"
```

Here's the only code necessary to verify that specific content got written:

```python
from databricks.labs.blueprint.installation import MockInstallation

installation = MockInstallation()

installation.save(WorkspaceConfig(inventory_database="some_blueprint"))

installation.assert_file_written("config.yml", {
  "version": 2,
  "inventory_database": "some_blueprint",
  "log_level": "INFO",
  "num_threads": 10,
  "workspace_start_path": "/",
})
```

This method is far superior to directly comparing raw bytes content via a mock:

```python
ws.workspace.upload.assert_called_with(
  "/Users/foo/.blueprint/config.yml",
  yaml.dump(
    {
      "version": 2,
      "num_threads": 10,
      "inventory_database": "some_blueprint",
      "include_group_names": ["foo", "bar"],
      "workspace_start_path": "/",
      "log_level": "INFO",
    }
  ).encode("utf8"),
  format=ImportFormat.AUTO,
  overwrite=True,
)
```

And it's even better if you use PyTest, where we have even [deeper integration](#assert-rewriting-with-pytest).

[[back to top](#databricks-labs-blueprint)]

### Assert Rewriting with PyTest

If you are using [PyTest](https://docs.pytest.org/), then add this to your `conftest.py`, so that
the assertions are more readable:

```python
import pytest

pytest.register_assert_rewrite('databricks.labs.blueprint.installation')
```

![pytest asserts](docs/pytest-installation-asserts.png)

[[back to top](#databricks-labs-blueprint)]

## Application State Migrations

As time goes by, your applications evolve as well, requiring the addition of new columns to database schemas, 
changes of the database state, or some migrations of configured workflows. This utility allows you to do seamless 
upgrades from version X to version Z through version Y. Idiomatic usage in your deployment automation is as follows:

```python
from ... import Config
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.upgrades import Upgrades
from databricks.labs.blueprint.wheels import ProductInfo

product_info = ProductInfo.from_class(Config)
ws = WorkspaceClient(product=product_info.product_name(), product_version=product_info.version())
installation = product_info.current_installation(ws)
config = installation.load(Config)
upgrades = Upgrades(product_info, installation)
upgrades.apply(ws)
```

The upgrade process loads the version of [the product](#application-name-detection) that is about to be installed from the `__about__.py` file that
declares the [`__version__` variable](#released-version-detection). This version is compared with the version currently installed on
the Databricks Workspace, which is loaded from the `version.json` file in the [installation folder](#install-folder). This file is kept
up-to-date automatically if you use [`databricks.labs.blueprint.wheels.WheelsV2`](#publishing-wheels-to-databricks-workspace).

If those versions are different, the process looks for the `upgrades` folder next to the `__about__.py` file and
computes which upgrades need to be rolled out. Every upgrade script in that directory has to
start with a valid SemVer identifier, followed by an alphanumeric description of the change,
like `v0.0.1_add_service.py`. Each script has to expose a function that takes [`Installation`](#installation) and
`WorkspaceClient` arguments to perform the relevant upgrades. Here's an example:

```python
from ... import Config

import dataclasses
import logging
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.installation import Installation

upgrade_logger = logging.getLogger(__name__)

def upgrade(installation: Installation, ws: WorkspaceClient):
    upgrade_logger.info("creating new automated service user for the installation")
    config = installation.load(Config)
    service_principal = ws.service_principals.create(display_name='blueprint-service')
    new_config = dataclasses.replace(config, application_id=service_principal.application_id)
    installation.save(new_config)
```

To prevent the same upgrade script from being applied twice, the library tracks applied scripts in the
`applied-upgrades.json` file in the installation directory. At the moment, there's no `downgrade(installation, ws)`,
but it can easily be added in future versions of this library.
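
The selection of upgrade scripts can be sketched with the standard library alone. This is an illustration under assumptions, not the library's logic: `pending_upgrades` is a hypothetical helper, it only understands plain `X.Y.Z` versions, and it assumes that a script is due when its version is newer than the installed version and not newer than the version being installed.

```python
import re

# Matches names like `v0.0.1_add_service.py`
SCRIPT = re.compile(r"^v(\d+)\.(\d+)\.(\d+)_(\w+)\.py$")


def parse(version: str) -> tuple:
    """Turn `0.1.0` into `(0, 1, 0)` so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pending_upgrades(script_names, installed, target, already_applied):
    """Hypothetical helper: scripts to run, ordered from oldest to newest."""
    lower, upper = parse(installed), parse(target)
    pending = []
    for name in script_names:
        match = SCRIPT.match(name)
        if not match:
            continue  # not an upgrade script
        version = tuple(int(group) for group in match.groups()[:3])
        if lower < version <= upper and name not in already_applied:
            pending.append((version, name))
    return [name for _, name in sorted(pending)]


scripts = [
    "v0.1.0_new_table.py",
    "v0.0.1_add_service.py",
    "v0.0.3_rename_column.py",
    "v0.2.0_not_yet.py",
    "README.md",
]
to_run = pending_upgrades(scripts, installed="0.0.1", target="0.1.0", already_applied=set())
rerun = pending_upgrades(scripts, "0.0.1", "0.1.0", already_applied={"v0.0.3_rename_column.py"})
```

The `already_applied` set plays the role that `applied-upgrades.json` plays in the text above.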

[[back to top](#databricks-labs-blueprint)]

## Building Wheels

We recommend deploying applications as wheels, which are part of the [application installation](#application-and-installation-state). But versioning, testing, and deploying those is often a tedious process.

### Released Version Detection

When you deploy your Python app as a wheel, each deployment has to have a different version. This library automatically detects the `__about__.py` file anywhere in the project root and reads the `__version__` variable from it. We support the [SemVer](https://semver.org/) versioning scheme. [Publishing wheels](#publishing-wheels-to-databricks-workspace) updates the `version.json` file in the [install folder](#install-folder).

```python
import logging
from databricks.labs.blueprint.wheels import ProductInfo

logger = logging.getLogger(__name__)

product_info = ProductInfo(__file__)
version = product_info.released_version()
logger.info(f'Version is: {version}')
```

[[back to top](#databricks-labs-blueprint)]

### Unreleased Version Detection

When you develop your wheel and iterate on testing it, you often need to upload a file with a different name each time you build it. We use the `git describe --tags` command to fetch the latest SemVer-compatible tag (e.g. `v0.0.2`) and append the number of commits and a timestamp to it. For example, if the released version is `v0.0.1`, then the unreleased version would be something like `0.0.2+120240105144650`. We verify that this version is compatible with both SemVer and [PEP 440](https://peps.python.org/pep-0440/). [Publishing wheels](#publishing-wheels-to-databricks-workspace) updates the `version.json` file in the [install folder](#install-folder).

```python
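import logging
from databricks.labs.blueprint.wheels import ProductInfo

logger = logging.getLogger(__name__)
product_info = ProductInfo(__file__)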

version = product_info.unreleased_version()
is_git = product_info.is_git_checkout()
is_unreleased = product_info.is_unreleased_version()

logger.info(f'Version is: {version}')
logger.info(f'Git checkout: {is_git}')
logger.info(f'Is unreleased: {is_unreleased}')
```
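
The version scheme described above can be reproduced in a few lines. This is a rough sketch rather than the library's algorithm: `unreleased_version` is a hypothetical helper, and it assumes the usual `git describe --tags` output shape of `<tag>-<commits>-g<hash>`, with the patch number bumped by one as in the `0.0.2+120240105144650` example.

```python
import re
from datetime import datetime

# `v0.0.1` on a tagged commit, `v0.0.1-1-gabcdef0` one commit after the tag
DESCRIBE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(\d+)-g[0-9a-f]+)?$")


def unreleased_version(describe_output: str, now: datetime) -> str:
    """Hypothetical helper: derive a local version from `git describe --tags`."""
    match = DESCRIBE.match(describe_output.strip())
    if not match:
        raise ValueError(f"unexpected git describe output: {describe_output!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    commits = match.group(4)
    if commits is None:
        # Exactly on a tag: nothing unreleased to describe
        return f"{major}.{minor}.{patch}"
    timestamp = now.strftime("%Y%m%d%H%M%S")
    return f"{major}.{minor}.{patch + 1}+{commits}{timestamp}"


built_at = datetime(2024, 1, 5, 14, 46, 50)
version = unreleased_version("v0.0.1-1-gabcdef0", built_at)
```

The part after `+` is a PEP 440 local version label, so it changes with every build while the public version stays ahead of the last release.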

[[back to top](#databricks-labs-blueprint)]

### Application Name Detection

The library can infer the name of the application from the name of the directory in which the `__about__.py` file is located within the current project. See [released version detection](#released-version-detection) for more details.
[`ProductInfo.for_testing(klass)`](#using-productinfo-with-integration-tests) creates a new `ProductInfo` object with a random `product_name`.

```python
import logging
from databricks.labs.blueprint.wheels import ProductInfo

logger = logging.getLogger(__name__)

product_info = ProductInfo(__file__)
logger.info(f'Product name is: {product_info.product_name()}')
```

[[back to top](#databricks-labs-blueprint)]

### Using `ProductInfo` with integration tests

When you're integration testing your [installations](#installation), you may want to have different [installation folders](#install-folder) for each test execution. `ProductInfo.for_testing(klass)` helps you with this:

```python
from ... import ConfigurationClass
from databricks.labs.blueprint.wheels import ProductInfo

first = ProductInfo.for_testing(ConfigurationClass)
second = ProductInfo.for_testing(ConfigurationClass)
assert first.product_name() != second.product_name()
```

[[back to top](#databricks-labs-blueprint)]

### Publishing Wheels to Databricks Workspace

Before you execute a wheel on Databricks, you have to build it and upload it. This library detects the [released](#released-version-detection) or [unreleased](#unreleased-version-detection) version of the wheel, copies it over to a temporary folder, updates the `__about__.py` file with the right version, and builds the wheel in the temporary location, so that the project folder is not polluted with build artifacts. `Wheels` is a context manager, so it removes all temporary files and folders after the `with` block finishes. This library is successfully used to concurrently test wheels on Shared Databricks Clusters through notebook-scoped libraries. Before you deploy the new version of the wheel, it is highly advised that you perform [application state upgrades](#application-state-migrations).

Every call to `wheels.upload_to_wsfs()` updates the `version.json` file in the [install folder](#install-folder), which holds a `version` field with the current wheel version. There's also a `wheel` field that contains the path to the current wheel file on WorkspaceFS.

```python
import logging
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.wheels import ProductInfo

logger = logging.getLogger(__name__)

w = WorkspaceClient()
product_info = ProductInfo(__file__)
installation = product_info.current_installation(w)

with product_info.wheels(w) as wheels:
    remote_wheel = wheels.upload_to_wsfs()
    logger.info(f'Uploaded to {remote_wheel}')
```

This will print something like:

```
15:08:44  INFO [dist.logger] Uploaded to /Users/serge.smertin@databricks.com/.blueprint/wheels/databricks_labs_blueprint-0.0.2+120240105150840-py3-none-any.whl
```

You can also do `wheels.upload_to_dbfs()`, though you're not able to set any access control over it.

### Publishing upstream dependencies to workspaces without Public Internet access

A Python wheel may have dependencies that are not included in the wheel itself. These dependencies are usually other Python packages that your wheel relies on. During installation on regular Databricks Workspaces, these dependencies are fetched automatically from the [Python Package Index](https://pypi.org/).

Some Databricks Workspaces are configured with extra layers of network security that block all access to the Public Internet, including the [Python Package Index](https://pypi.org/). To make installations work on such workspaces, developers need to explicitly upload all upstream dependencies that their applications require.

The `upload_wheel_dependencies(prefixes)` method can be used to upload these dependencies to Databricks Workspace. This method takes a list of prefixes as an argument. It will upload all the dependencies of the wheel that have names starting with any of the provided prefixes.

Here is an example of how you can use this method:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.wheels import ProductInfo

ws = WorkspaceClient()
product_info = ProductInfo(__file__)
installation = product_info.current_installation(ws)

with product_info.wheels(ws) as wheels:
    wheel_paths = wheels.upload_wheel_dependencies(['databricks_sdk', 'pandas'])
    for path in wheel_paths:
        print(f'Uploaded dependency to {path}')
```

In this example, the `upload_wheel_dependencies(['databricks_sdk', 'pandas'])` call uploads all the dependencies of the wheel that have names starting with `databricks_sdk` or `pandas`. The method skips platform-specific dependencies, that is, wheels whose file names do not end with `-none-any.whl`. The main wheel file is not uploaded either. The method returns a list of paths to the uploaded dependencies on WorkspaceFS.
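
The filtering rule can be expressed as a small pure function. This is a sketch of the rule as described here, not the library's code: `select_dependencies` and the file names are hypothetical, and the sketch assumes that only platform-independent wheels (names ending with `-none-any.whl`) qualify.

```python
def select_dependencies(wheel_names, prefixes, main_wheel):
    """Hypothetical helper: pick the wheel files that would be uploaded."""
    selected = []
    for name in wheel_names:
        if name == main_wheel:
            continue  # the main wheel is uploaded separately
        if not name.endswith("-none-any.whl"):
            continue  # platform-specific build, skipped
        if not name.startswith(tuple(prefixes)):
            continue  # not one of the requested dependencies
        selected.append(name)
    return selected


built = [
    "my_app-0.1.0-py3-none-any.whl",
    "databricks_sdk-0.30.0-py3-none-any.whl",
    "pandas-2.2.0-cp310-cp310-manylinux_2_17_x86_64.whl",
    "requests-2.31.0-py3-none-any.whl",
]
picked = select_dependencies(
    built,
    prefixes=["databricks_sdk", "pandas"],
    main_wheel="my_app-0.1.0-py3-none-any.whl",
)
```

In this made-up listing the `pandas` wheel matches a prefix but is skipped because it is built for a specific platform.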


[[back to top](#databricks-labs-blueprint)]

## Databricks CLI's `databricks labs ...` Router

This library contains common utilities for Databricks CLI entrypoints defined in the [`labs.yml`](labs.yml) file. Here's example metadata for a tool named `blueprint` with a single `me` command and a flag named `--greeting` that has `Hello` as its default value:

```yaml
---
name: blueprint
description: Common libraries for Databricks Labs
install:
  script: src/databricks/labs/blueprint/__init__.py
entrypoint: src/databricks/labs/blueprint/__main__.py
min_python: 3.10
commands:
  - name: me
    description: shows current username
    flags:
     - name: greeting
       default: Hello
       description: Greeting prefix
```

And here's the content for [`src/databricks/labs/blueprint/__main__.py`](src/databricks/labs/blueprint/__main__.py) file, that executes `databricks labs blueprint me` command with `databricks.sdk.WorkspaceClient` automatically injected into an argument with magical name `w`:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.entrypoint import get_logger
from databricks.labs.blueprint.cli import App

app = App(__file__)
logger = get_logger(__file__)


@app.command
def me(w: WorkspaceClient, greeting: str):
    """Shows current username"""
    logger.info(f"{greeting}, {w.current_user.me().user_name}!")


if "__main__" == __name__:
    app()
```
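
Injection by argument name can be pictured with `inspect.signature`. The sketch below is a toy model and not how `App` is implemented: `MiniApp`, `FakeClient`, and the `route` method are hypothetical names, and the only behavior taken from this README is that an argument named `w` receives a client while other arguments receive flag values.

```python
import inspect


class MiniApp:
    """Hypothetical router; a toy model, not the library's `App` class."""

    def __init__(self, client_factory):
        self._client_factory = client_factory
        self._commands = {}

    def command(self, fn):
        # Register the function under its own name and return it unchanged
        self._commands[fn.__name__] = fn
        return fn

    def route(self, name: str, flags: dict):
        fn = self._commands[name]
        kwargs = {}
        for param in inspect.signature(fn).parameters:
            if param == "w":
                kwargs["w"] = self._client_factory()  # inject the client
            elif param in flags:
                kwargs[param] = flags[param]  # pass the flag through
        return fn(**kwargs)


class FakeClient:
    user_name = "someone@example.com"


app = MiniApp(FakeClient)


@app.command
def me(w: FakeClient, greeting: str):
    return f"{greeting}, {w.user_name}!"


message = app.route("me", {"greeting": "Hello"})
```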

[[back to top](#databricks-labs-blueprint)]

### Account-level Commands

So far, the examples showed only workspace-level commands, but you can also have native account-level command support. You need to specify the `is_account` property when declaring the command in the `labs.yml` file:

```yaml
commands:
  # ...
  - name: workspaces
    is_account: true
    description: shows current workspaces
```

and `@app.command(is_account=True)` will get you `databricks.sdk.AccountClient` injected into `a` argument:

```python
from databricks.sdk import AccountClient

@app.command(is_account=True)
def workspaces(a: AccountClient):
    """Shows workspaces"""
    for ws in a.workspaces.list():
        logger.info(f"Workspace: {ws.workspace_name} ({ws.workspace_id})")
```

[[back to top](#databricks-labs-blueprint)]

### Commands with interactive prompts

If your command needs some terminal interactivity, simply add [`prompts: Prompts` argument](#basic-terminal-user-interface-tui-primitives) to your command:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.entrypoint import get_logger
from databricks.labs.blueprint.cli import App
from databricks.labs.blueprint.tui import Prompts

app = App(__file__)
logger = get_logger(__file__)


@app.command
def me(w: WorkspaceClient, prompts: Prompts):
    """Shows current username"""
    if prompts.confirm("Are you sure?"):
        logger.info(f"Hello, {w.current_user.me().user_name}!")

if "__main__" == __name__:
    app()
```

[[back to top](#databricks-labs-blueprint)]

### Integration with Databricks Connect

You can create a Spark session through Databricks Connect inside a command:

```python
from databricks.sdk import WorkspaceClient
from databricks.connect import DatabricksSession

@app.command
def example(w: WorkspaceClient):
    """Building Spark Session using Databricks Connect"""
    spark = DatabricksSession.builder.sdkConfig(w.config).getOrCreate()
    spark.sql("SHOW TABLES")
```

[[back to top](#databricks-labs-blueprint)]

### Starting New Projects

This tooling makes it easier to start new projects. First, install the CLI:

```
databricks labs install blueprint
```

Then create a new project in a designated directory:

```
databricks labs blueprint init-project --target /path/to/folder
```

[[back to top](#databricks-labs-blueprint)]

# Notable Downstream Projects

This library is used in the following projects:

- [UCX - Automated upgrade to Unity Catalog](https://github.com/databrickslabs/ucx)

[[back to top](#databricks-labs-blueprint)]

# Project Support

Please note that this project is provided for your exploration only and is not 
formally supported by Databricks with Service Level Agreements (SLAs). It is 
provided AS-IS, and we do not make any guarantees of any kind. Please do not 
submit a support ticket relating to any issues arising from the use of this project.

Any issues discovered through the use of this project should be filed as GitHub 
[Issues on this repository](https://github.com/databrickslabs/blueprint/issues). 
They will be reviewed as time permits, but no formal SLAs for support exist.

            

Specifically, it checks that a.txt has a \n.txt suffix and b has a .py suffix, with the notebook containing the expected code.\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.paths import WorkspacePath\n\nws = WorkspaceClient()\n\nfolder = WorkspacePath(ws, \"~/some-folder\")\n\ntxt_file = folder / \"a.txt\"\npy_notebook = folder / \"b\"  # notebooks have no file extension\n\nmake_notebook(path=py_notebook, content=\"display(spark.range(10))\")\ntxt_file.write_text(\"Hello, World!\")\n\nfiles = {_.name: _ for _ in folder.glob(\"**/*\")}\nassert len(files) == 2\n\nassert files[\"a.txt\"].suffix == \".txt\"\nassert files[\"b\"].suffix == \".py\"  # suffix is determined from ObjectInfo\nassert files[\"b\"].read_text() == \"# Databricks notebook source\\ndisplay(spark.range(10))\"\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n## Basic Terminal User Interface (TUI) Primitives\n\nYour command-line apps do need testable interactivity, which is provided by `from databricks.labs.blueprint.tui import Prompts`. Here are some examples of it:\n\n![ucx install](docs/ucx-install.gif)\n\nIt is also integrated with our [command router](#commands-with-interactive-prompts). 
\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Simple Text Questions\n\nUse `prompts.question()` as a more capable alternative to the `input()` builtin:\n\n```python\nfrom databricks.labs.blueprint.tui import Prompts\n\nprompts = Prompts()\nanswer = prompts.question('Enter a year', default='2024', valid_number=True)\nprint(answer)\n```\n\n![question](docs/prompts-question.gif)\n\nOptional arguments are:\n\n* `default` (str) - use given value if user didn't input anything\n* `max_attempts` (int, default 10) - number of attempts allowed before an exception is raised for invalid or empty input\n* `valid_number` (bool) - input has to be a valid number\n* `valid_regex` (bool) - input has to be a valid regular expression\n* `validate` - function that takes a string and returns boolean, like `lambda x: 'awesome' in x`, that could be used to further validate input.\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Confirming Actions\n\nUse `prompts.confirm()` to guard any optional or destructive actions of your app:\n\n```python\nif prompts.confirm('Destroy database?'):\n    print('DESTROYING DATABASE')\n```\n\n![confirm](docs/prompts-confirm.gif)\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Single Choice from List\n\nUse `prompts.choice()` to select a value from a list:\n\n```python\nanswer = prompts.choice('Select a language', ['Python', 'Rust', 'Go', 'Java'])\nprint(answer)\n```\n\n![choice](docs/prompts-choice.gif)\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Single Choice from Dictionary\n\nUse `prompts.choice_from_dict()` to select a value from a dictionary; users are shown the sorted dictionary keys:\n\n```python\nanswer = prompts.choice_from_dict('Select a locale', {\n    '\u0423\u043a\u0440\u0430\u0457\u043d\u0441\u044c\u043a\u0430': 'ua',\n    'English': 'en'\n})\nprint(f'Locale is: {answer}')\n```\n\n![choice from dict](docs/prompts-choice-from-dict.gif)\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Multiple Choices from Dictionary\n\nUse `prompts.multiple_choice_from_dict()` to select multiple items from a dictionary:\n\n```python\nanswer 
= prompts.multiple_choice_from_dict(\n    'What projects are written in Python? Select [DONE] when ready.', {\n    'Databricks Labs UCX': 'ucx',\n    'Databricks SDK for Python': 'sdk-py',\n    'Databricks SDK for Go': 'sdk-go',\n    'Databricks CLI': 'cli',\n})\nprint(f'Answer is: {answer}')\n```\n\n![multiple choice](docs/prompts-choice-from-dict.gif)\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Unit Testing Prompts\n\nUse `MockPrompts` with regular expressions as keys and values as answers. The longest key takes precedence.\n\n```python\nfrom databricks.labs.blueprint.tui import MockPrompts\n\ndef test_ask_for_int():\n    prompts = MockPrompts({r\".*\": \"\"})\n    res = prompts.question(\"Number of threads\", default=\"8\", valid_number=True)\n    assert \"8\" == res\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n## Nicer Logging Formatter\n\nThere's a basic logging configuration available for [Python SDK](https://github.com/databricks/databricks-sdk-py?tab=readme-ov-file#logging), but the default output is not pretty and is relatively inconvenient to read. 
Here's how to make output from Python's standard logging facility more enjoyable to read:\n\n```python\nfrom databricks.labs.blueprint.logger import install_logger\n\ninstall_logger()\n\nimport logging\nlogging.root.setLevel(\"DEBUG\") # use only for development or demo purposes\n\nlogger = logging.getLogger(\"name.of.your.module\")\nlogger.debug(\"This is a debug message\")\nlogger.info(\"This is an info message\")\nlogger.warning(\"This is a warning message\")\nlogger.error(\"This is an error message\", exc_info=KeyError(123))\nlogger.critical(\"This is a critical message\")\n```\n\nHere are the assumptions made by this formatter:\n\n * Most likely you're forwarding your logs to a file already, so this log formatter is mainly for visual consumption.\n * The average app or Databricks Job most likely finishes running within a day or two, so we display only hours, minutes, and seconds from the timestamp.\n * We gray out debug messages, and highlight all other messages. Errors and fatal messages are additionally painted red.\n * We shorten the name of the logger to a readable chunk only, so as not to clutter the space. Real-world apps have deeply nested folder structures and filenames like `src/databricks/labs/ucx/migration/something.py`, which translate into `databricks.labs.ucx.migration.something` fully-qualified Python module names, that get reflected into the `__name__` [top-level code environment](https://docs.python.org/3/library/__main__.html#what-is-the-top-level-code-environment) special variable, that you idiomatically use with logging as `logging.getLogger(__name__)`. This log formatter shortens the full module path to a more readable `d.l.u.migration.something`, which is easier to consume from a terminal screen or a notebook. 
\n * We only show the name of the thread if it's other than `MainThread`, because the overwhelming majority of Python applications are single-threaded.\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Rendering on Dark Background\n\nHere's what the output looks like on dark terminal backgrounds, including those from GitHub Actions:\n\n![logger dark](docs/logger-dark.png)\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Rendering in Databricks Notebooks\n\nAnd here's how things will appear when executed from Databricks Runtime as part of a notebook or a workflow:\n\n![logger white](docs/notebook-logger.png)\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Integration With Your App\n\nJust place the following code in your wheel's top-most `__init__.py` file:\n\n```python\nfrom databricks.labs.blueprint.logger import install_logger\n\ninstall_logger(level=\"INFO\")\n```\n\nAnd place this idiomatic snippet at the top of every module:\n\n```python\n# ... insert this into the top of your file\nfrom databricks.labs.blueprint.entrypoint import get_logger\n\nlogger = get_logger(__file__)\n# ... top of the file insert end\n```\n\n... and you'll be able to benefit from the readable console stderr formatting everywhere.\n\nWhenever you need to turn on debug logging, just invoke `logging.root.setLevel(\"DEBUG\")` (even in a notebook).\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Integration with `console_script` Entrypoints\n\nWhen you invoke Python as an entry point to your wheel (also known as `console_scripts`), [`__name__` top-level code environment](https://docs.python.org/3/library/__main__.html#what-is-the-top-level-code-environment) would always be equal to `__main__`. 
But you really want to get the logger to be named after your Python module and not just `__main__` (see [rendering in Databricks notebooks](#rendering-in-databricks-notebooks)).\n\nIf you create a `dist/logger.py` file with the following contents:\n\n```python\nfrom databricks.labs.blueprint.entrypoint import get_logger, run_main\n\nlogger = get_logger(__file__)\n\ndef main(first_arg, second_arg, *other):\n    logger.info(f'First arg is: {first_arg}')\n    logger.info(f'Second arg is: {second_arg}')\n    logger.info(f'Everything else is: {other}')\n    logger.debug('... and this message is only shown when you are debugging from PyCharm IDE')\n\nif __name__ == '__main__':\n    run_main(main)\n```\n\n... and invoke it with `python dist/logger.py Hello world, my name is Serge`, you should get back the following output:\n\n```\n13:46:42  INFO [dist.logger] First arg is: Hello\n13:46:42  INFO [dist.logger] Second arg is: world,\n13:46:42  INFO [dist.logger] Everything else is: ('my', 'name', 'is', 'Serge')\n```\n\nEverything is made easy thanks to the `run_main(fn)` helper.\n\n[[back to top](#databricks-labs-blueprint)]\n\n## Parallel Task Execution\n\nPython's global interpreter lock (GIL) serializes compute-intensive tasks, but IO-intensive tasks, like calling Databricks APIs through the Databricks SDK for Python, are not limited by the GIL. It's quite a common task to perform multiple different API calls in parallel, though it is overwhelmingly difficult to do multi-threading right. `ThreadPoolExecutor` from `concurrent.futures` is great, but sometimes we want something even more high-level. This library helps you navigate the most common road bumps.\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Collecting Results\n\nThis library helps you filter out empty results from background tasks, so that the downstream code is generally simpler. We also handle the thread pool naming, so that the name of the list of tasks properly gets into log messages. 
After all background tasks have completed, we log something like `Finished 'task group name' tasks: 50% results available (2/4). Took 0:00:00.000604`.\n\n```python\nfrom databricks.labs.blueprint.entrypoint import get_logger\nfrom databricks.labs.blueprint.parallel import Threads\n\nlogger = get_logger(__file__)\n\ndef not_really_but_fine():\n    logger.info(\"did something, but returned None\")\n\ndef doing_something():\n    logger.info(\"doing something important\")\n    return f'result from {doing_something.__name__}'\n\nlogger.root.setLevel('DEBUG')\ntasks = [not_really_but_fine, not_really_but_fine, doing_something, doing_something]\nresults, errors = Threads.gather(\"task group name\", tasks)\n\nassert ['result from doing_something', 'result from doing_something'] == results\nassert [] == errors\n```\n\nThis will log the following messages:\n\n```\n14:20:15 DEBUG [d.l.blueprint.parallel] Starting 4 tasks in 20 threads\n14:20:15  INFO [dist.logger][task_group_name_0] did something, but returned None\n14:20:15  INFO [dist.logger][task_group_name_1] did something, but returned None\n14:20:15  INFO [dist.logger][task_group_name_1] doing something important\n14:20:15  INFO [dist.logger][task_group_name_1] doing something important\n14:20:15  INFO [d.l.blueprint.parallel][task_group_name_1] task group name 4/4, rps: 7905.138/sec\n14:20:15  INFO [d.l.blueprint.parallel] Finished 'task group name' tasks: 50% results available (2/4). Took 0:00:00.000604\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Collecting Errors from Background Tasks\n\nInspired by the Go language's idiomatic error handling approach, this library allows collecting errors from all of the background tasks and handling them separately. 
For all other cases, we recommend using [strict failures](#strict-failures-from-background-tasks)\n\n```python\nfrom databricks.sdk.errors import NotFound\nfrom databricks.labs.blueprint.parallel import Threads\n\ndef works():\n    return True\n\ndef fails():\n    raise NotFound(\"something is not right\")\n\ntasks = [works, fails, works, fails, works, fails, works, fails]\nresults, errors = Threads.gather(\"doing some work\", tasks)\n\nassert [True, True, True, True] == results\nassert 4 == len(errors)\n```\n\nThis will log the following messages:\n\n```\n14:08:31 ERROR [d.l.blueprint.parallel][doing_some_work_0] doing some work task failed: something is not right: ...\n...\n14:08:31 ERROR [d.l.blueprint.parallel][doing_some_work_3] doing some work task failed: something is not right: ...\n14:08:31 ERROR [d.l.blueprint.parallel] More than half 'doing some work' tasks failed: 50% results available (4/8). Took 0:00:00.001011\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Strict Failures from Background Tasks\n\nUse `Threads.strict(...)` to raise `ManyError` with the summary of all failed tasks:\n\n```python\nfrom databricks.sdk.errors import NotFound\nfrom databricks.labs.blueprint.parallel import Threads\n\ndef works():\n    return True\n\ndef fails():\n    raise NotFound(\"something is not right\")\n\ntasks = [works, fails, works, fails, works, fails, works, fails]\nresults = Threads.strict(\"doing some work\", tasks)\n\n# this line won't get executed\nassert [True, True, True, True] == results\n```\n\nThis will log the following messages:\n\n```\n...\n14:11:46 ERROR [d.l.blueprint.parallel] More than half 'doing some work' tasks failed: 50% results available (4/8). 
Took 0:00:00.001098\n...\ndatabricks.labs.blueprint.parallel.ManyError: Detected 4 failures: NotFound: something is not right\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n## Application and Installation State\n\nThere always needs to be a location where you put application code, artifacts, and configuration. \nThe `Installation` class is used to manage the `~/.{product}` folder on WorkspaceFS to track [typed files](#saving-dataclass-configuration).\nIt provides methods for serializing and deserializing objects of a specific type, as well as managing the [storage location](#install-folder) \nfor those objects. The class includes methods for loading and saving objects, uploading and downloading\nfiles, and managing the installation folder.\n\nThe `Installation` class can be helpful for unit testing by allowing you to mock the file system and control\nthe behavior of the [`load`](#loading-dataclass-configuration) and [`save`](#saving-dataclass-configuration) methods. \nSee [unit testing](#unit-testing-installation-state) for more details.\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Install Folder\n\nThe `install_folder` method returns the path to the installation folder on WorkspaceFS. The installation folder \nis used to store typed files that are managed by the `Installation` class. [Publishing wheels](#publishing-wheels-to-databricks-workspace) \nupdates the `version.json` file in the install folder.\n\nWhen integration testing, you may want to have a [random installation folder](#using-productinfo-with-integration-tests) for each test execution.\n\nIf an `install_folder` argument is provided to the constructor of the `Installation` class, it will be used\nas the installation folder. Otherwise, the installation folder will be determined based on the current user's\nusername. 
Specifically, the installation folder will be `/Users/{user_name}/.{product}`, where `{user_name}`\nis the username of the current user and `{product}` is the [name of the product](#application-name-detection)\n associated with the installation. Here is an example of how you can use the `install_folder` method:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\n# Create an Installation object for the \"blueprint\" product\ninstall = Installation(WorkspaceClient(), \"blueprint\")\n\n# Print the path to the installation folder\nprint(install.install_folder())\n# Output: /Users/{user_name}/.blueprint\n```\n\nIn this example, the `Installation` object is created for the \"blueprint\" product. The `install_folder` method\nis then called to print the path to the installation folder. The output will be `/Users/{user_name}/.blueprint`,\nwhere `{user_name}` is the username of the current user.\n\nYou can also provide an `install_folder` argument to the constructor to specify a custom installation folder.\nHere is an example of how you can do this:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\n# Create an Installation object for the \"blueprint\" product with a custom installation folder\ninstall = Installation(WorkspaceClient(), \"blueprint\", install_folder=\"/my/custom/folder\")\n\n# Print the path to the installation folder\nprint(install.install_folder())\n# Output: /my/custom/folder\n```\n\nIn this example, the `Installation` object is created for the \"blueprint\" product with a custom installation\nfolder of `/my/custom/folder`. The `install_folder` method is then called to print the path to the installation\nfolder. 
The output will be `/my/custom/folder`.\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Detecting Current Installation\n\n`Installation.current(ws, product)` returns the `Installation` object for the given product in the current workspace.\n\nIf the installation is not found, a `NotFound` error is raised. If the `assume_user` argument is `True`, the method\nwill assume that the installation is in the user's home directory and return it if found. If `False`, the method\nwill only return an installation that is in the `/Applications` directory.\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\nws = WorkspaceClient()\n\n# current user installation\ninstallation = Installation.assume_user_home(ws, \"blueprint\")\nassert \"/Users/foo/.blueprint\" == installation.install_folder()\nassert not installation.is_global()\n\n# workspace global installation\ninstallation = Installation.current(ws, \"blueprint\")\nassert \"/Applications/blueprint\" == installation.install_folder()\nassert installation.is_global()\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Detecting Installations From All Users\n\n`Installation.existing(ws, product)` returns a collection of all existing installations for the given product in the current workspace.\n\nThis method searches for installations in the root `/Applications` directory and the home directories of all users in the workspace. \nLet's say users `foo@example.com` and `bar@example.com` installed the `blueprint` product in their home folders. 
The following\ncode will print `/Workspace/bar@example.com/.blueprint` and `/Workspace/foo@example.com/.blueprint`:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\nws = WorkspaceClient()\n\nglobal_install = Installation.assume_global(ws, 'blueprint')\nglobal_install.upload(\"some.bin\", b\"...\")\n\nuser_install = Installation.assume_user_home(ws, 'blueprint')\nuser_install.upload(\"some.bin\", b\"...\")\n\nfor blueprint in Installation.existing(ws, \"blueprint\"):\n  print(blueprint.install_folder())\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Saving `@dataclass` configuration\n\nThe `save(obj)` method saves a dataclass instance of type `T` to a file on WorkspaceFS. If no `filename` is provided, \nthe name of the `type_ref` class will be used as the filename. Any missing parent directories are created automatically.\nIf the object has a `__version__` attribute, the method will add a `version` field to the serialized object\nwith the value of the `__version__` attribute. See [configuration format evolution](#configuration-format-evolution) \nfor more details. `save(obj)` works with JSON and YAML configurations without the need to supply `filename` keyword \nattribute. When you need to save [CSV files](#saving-csv-files), the `filename` attribute is required. 
If you need to \nupload arbitrary and untyped files, use the [`upload()` method](#uploading-untyped-files).\n\nHere is an example of how you can use the `save` method:\n\n```python\nfrom dataclasses import dataclass\n\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\ninstall = Installation(WorkspaceClient(), \"blueprint\")\n\n@dataclass\nclass MyClass:\n    field1: str\n    field2: str\n\nobj = MyClass('value1', 'value2')\ninstall.save(obj)\n\n# Verify that the object was saved correctly\nloaded_obj = install.load(MyClass)\nassert loaded_obj == obj\n```\n\nIn this example, the `Installation` object is created for the \"blueprint\" product. A dataclass object of type\n`MyClass` is then created and saved to a file using the `save` method. The object is then loaded from the file\nusing the [`load` method](#loading-dataclass-configuration) and compared to the original object to verify that \nit was saved correctly.\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Saving CSV files\n\nYou may need to upload a CSV file to Databricks Workspace, so that it's easier to edit from the Databricks Workspace UI \nor tools like Google Sheets or Microsoft Excel. If non-technical humans don't need to edit application state,\nuse [dataclasses](#saving-dataclass-configuration) for configuration. 
CSV files currently don't support \n[format evolution](#configuration-format-evolution).\n\nThe following example will save a `workspaces.csv` file with two records and a header:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.sdk.service.provisioning import Workspace\nfrom databricks.labs.blueprint.installation import Installation\n\ninstallation = Installation(WorkspaceClient(), \"blueprint\")\n\ninstallation.save([\n  Workspace(workspace_id=1234, workspace_name=\"first\"),\n  Workspace(workspace_id=1235, workspace_name=\"second\"),\n], filename=\"workspaces.csv\")\n\n# ~ $ databricks workspace export /Users/foo@example.com/.blueprint/workspaces.csv\n# ... workspace_id,workspace_name\n# ... 1234,first\n# ... 1235,second\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Loading `@dataclass` configuration\n\nThe `load(type_ref[, filename])` method loads an object of type `type_ref` from a file on WorkspaceFS. If no `filename` is\nprovided, the `__file__` attribute of `type_ref` is used as the filename when it is defined; otherwise, the library derives the filename\nfrom the class name.\n\n```python\nfrom dataclasses import dataclass\n\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\n@dataclass\nclass SomeConfig:  # <-- auto-detected filename is `some-config.json`\n    version: str\n\nws = WorkspaceClient()\ninstallation = Installation.current(ws, \"blueprint\")\ncfg = installation.load(SomeConfig)\n\ninstallation.save(SomeConfig(\"0.1.2\"))\ninstallation.assert_file_written(\"some-config.json\", {\"version\": \"0.1.2\"})\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Brute-forcing `SerdeError` with `as_dict()` and `from_dict()`\n\nIn the rare circumstances when you cannot use [@dataclass](#loading-dataclass-configuration) or you get a `SerdeError` that you cannot explain, you can implement `from_dict(cls, raw: dict) -> 'T'` and `as_dict(self) -> dict` methods on the class:\n\n```python\nfrom databricks.sdk 
import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\nclass SomePolicy:\n    def __init__(self, a, b):\n        self._a = a\n        self._b = b\n\n    def as_dict(self) -> dict:\n        return {\"a\": self._a, \"b\": self._b}\n\n    @classmethod\n    def from_dict(cls, raw: dict):\n        return cls(raw.get(\"a\"), raw.get(\"b\"))\n\n    def __eq__(self, o):\n        assert isinstance(o, SomePolicy)\n        return self._a == o._a and self._b == o._b\n\npolicy = SomePolicy(1, 2)\ninstallation = Installation.current(WorkspaceClient(), \"blueprint\")\ninstallation.save(policy, filename=\"backups/policy-123.json\")\nload = installation.load(SomePolicy, filename=\"backups/policy-123.json\")\n\nassert load == policy\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Configuration Format Evolution\n\nAs time progresses, your application evolves. So does the configuration file format with it. This library provides\na common utility to seamlessly evolve configuration file format across versions, providing callbacks to convert\nfrom older versions to newer. If you need to migrate configuration or database state of the entire application, \nuse the [application state migrations](#application-state-migrations).\n\nIf the type has a `__version__` attribute, the method will check that the version of the object in the file\nmatches the expected version. If the versions do not match, the method will attempt to migrate the object to\nthe expected version using a method named `v{actual_version}_migrate` on the `type_ref` class. If the migration\nis successful, the method will return the migrated object. If the migration is not successful, the method will\nraise an `IllegalState` exception. 
Let's say we have a `/Users/foo@example.com/.blueprint/config.yml` file with\nonly `initial: 999` as content, left over from an older installation of the `blueprint` product:\n\n```python\nfrom dataclasses import dataclass\n\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\n@dataclass\nclass EvolvedConfig:\n    __file__ = \"config.yml\"\n    __version__ = 3\n\n    initial: int\n    added_in_v1: int\n    added_in_v2: int\n\n    @staticmethod\n    def v1_migrate(raw: dict) -> dict:\n        raw[\"added_in_v1\"] = 111\n        raw[\"version\"] = 2\n        return raw\n\n    @staticmethod\n    def v2_migrate(raw: dict) -> dict:\n        raw[\"added_in_v2\"] = 222\n        raw[\"version\"] = 3\n        return raw\n\ninstallation = Installation.current(WorkspaceClient(), \"blueprint\")\ncfg = installation.load(EvolvedConfig)\n\nassert 999 == cfg.initial\nassert 111 == cfg.added_in_v1  # <-- added by v1_migrate()\nassert 222 == cfg.added_in_v2  # <-- added by v2_migrate()\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Uploading Untyped Files\n\nThe `upload(filename, raw_bytes)` and `upload_dbfs(filename, raw_bytes)` methods upload raw bytes to a file on \nWorkspaceFS (or DBFS) with the given `filename`, creating any missing directories where required. 
These methods \nare used to upload files that are not typed, i.e., they do not use the [`@dataclass` decorator](#saving-dataclass-configuration).\n\n```python\ninstallation = Installation(ws, \"blueprint\")\n\ntarget = installation.upload(\"wheels/foo.whl\", b\"abc\")\nassert \"/Users/foo/.blueprint/wheels/foo.whl\" == target\n```\n\nThe most common example is a [wheel](#building-wheels), which we already integrate with the `Installation` framework.\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Listing All Files in the Install Folder\n\nYou can use the `files()` method to recursively list all files in the [install folder](#install-folder).\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Unit Testing Installation State\n\nYou can create a `MockInstallation` object and use it to override the default installation folder and the contents \nof the files in that folder. This allows you to test the behavior of your code in different scenarios, such as when a file \nis not found or when the contents of a file do not match the expected format. 
\n\nFor example, suppose you have the following `WorkspaceConfig` class that is serialized into `config.yml` on your workspace:\n\n```python\n@dataclass\nclass WorkspaceConfig:\n  __file__ = \"config.yml\"\n  __version__ = 2\n\n  inventory_database: str\n  connect: Config | None = None\n  workspace_group_regex: str | None = None\n  include_group_names: list[str] | None = None\n  num_threads: int | None = 10\n  database_to_catalog_mapping: dict[str, str] | None = None\n  log_level: str | None = \"INFO\"\n  workspace_start_path: str = \"/\"\n```\n\nHere's the only code necessary to verify that specific content got written:\n\n```python\nfrom databricks.labs.blueprint.installation import MockInstallation\n\ninstallation = MockInstallation()\n\ninstallation.save(WorkspaceConfig(inventory_database=\"some_blueprint\"))\n\ninstallation.assert_file_written(\"config.yml\", {\n  \"version\": 2,\n  \"inventory_database\": \"some_blueprint\",\n  \"log_level\": \"INFO\",\n  \"num_threads\": 10,\n  \"workspace_start_path\": \"/\",\n})\n```\n\nThis approach is far superior to directly comparing raw bytes content via a mock:\n\n```python\nws.workspace.upload.assert_called_with(\n  \"/Users/foo/.blueprint/config.yml\",\n  yaml.dump(\n    {\n      \"version\": 2,\n      \"num_threads\": 10,\n      \"inventory_database\": \"some_blueprint\",\n      \"include_group_names\": [\"foo\", \"bar\"],\n      \"workspace_start_path\": \"/\",\n      \"log_level\": \"INFO\",\n    }\n  ).encode(\"utf8\"),\n  format=ImportFormat.AUTO,\n  overwrite=True,\n)\n```\n\nAnd it's even better if you use PyTest, where we have an even [deeper integration](#assert-rewriting-with-pytest).\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Assert Rewriting with PyTest\n\nIf you are using [PyTest](https://docs.pytest.org/), then add this to your `conftest.py`, so that\nthe assertions are more readable:\n\n```python\nimport 
pytest\n\npytest.register_assert_rewrite('databricks.labs.blueprint.installation')\n```\n\n![pytest asserts](docs/pytest-installation-asserts.png)\n\n[[back to top](#databricks-labs-blueprint)]\n\n## Application State Migrations\n\nAs time goes by, your applications evolve as well, requiring the addition of new columns to database schemas, \nchanges to the database state, or migrations of configured workflows. This utility allows you to do seamless \nupgrades from version X to version Z through version Y. Idiomatic usage in your deployment automation is as follows:\n\n```python\nfrom ... import Config\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.upgrades import Upgrades\nfrom databricks.labs.blueprint.wheels import ProductInfo\n\nproduct_info = ProductInfo.from_class(Config)\nws = WorkspaceClient(product=product_info.product_name(), product_version=product_info.version())\ninstallation = product_info.current_installation(ws)\nconfig = installation.load(Config)\nupgrades = Upgrades(product_info, installation)\nupgrades.apply(ws)\n```\n\nThe upgrade process loads the version of [the product](#application-name-detection) that is about to be installed from the `__about__.py` file that\ndeclares the [`__version__` variable](#released-version-detection). This version is compared with the version currently installed on\nthe Databricks Workspace by loading it from the `version.json` file in the [installation folder](#install-folder). This file is kept\nup-to-date automatically if you use the [databricks.labs.blueprint.wheels.WheelsV2](#publishing-wheels-to-databricks-workspace).\n\nIf those versions are different, the process looks for the `upgrades` folder next to the `__about__.py` file and\ncomputes which upgrades need to be rolled out. Every upgrade script in that directory has to\nstart with a valid SemVer identifier, followed by the alphanumeric description of the change,\nlike `v0.0.1_add_service.py`. 
Each script has to expose a function that takes [`Installation`](#installation) and\n`WorkspaceClient` arguments to perform the relevant upgrades. Here's an example:\n\n```python\nfrom ... import Config\n\nimport logging, dataclasses\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.installation import Installation\n\nupgrade_logger = logging.getLogger(__name__)\n\ndef upgrade(installation: Installation, ws: WorkspaceClient):\n    upgrade_logger.info(\"creating new automated service user for the installation\")\n    config = installation.load(Config)\n    service_principal = ws.service_principals.create(display_name='blueprint-service')\n    new_config = dataclasses.replace(config, application_id=service_principal.application_id)\n    installation.save(new_config)\n```\n\nTo prevent the same upgrade script from being applied twice, we use the `applied-upgrades.json` file in\nthe installation directory. At the moment, there's no `downgrade(installation, ws)`, but it can easily be added in \nfuture versions of this library.\n\n[[back to top](#databricks-labs-blueprint)]\n\n## Building Wheels\n\nWe recommend deploying applications as wheels, which are part of the [application installation](#application-and-installation-state). But versioning, testing, and deploying those is often a tedious process.\n\n### Released Version Detection\n\nWhen you deploy your Python app as a wheel, it has to have a different version every time. This library automatically detects the `__about__.py` file anywhere in the project root and reads the `__version__` variable from it. We support the [SemVer](https://semver.org/) versioning scheme. 
[Publishing wheels](#publishing-wheels-to-databricks-workspace) updates the `version.json` file in the [install folder](#install-folder).\n\n```python\nfrom databricks.labs.blueprint.wheels import ProductInfo\n\nproduct_info = ProductInfo(__file__)\nversion = product_info.released_version()\nlogger.info(f'Version is: {version}')\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Unreleased Version Detection\n\nWhen you develop your wheel and iterate on testing it, it's often required to upload a file with a different name each time you build it. We use the `git describe --tags` command to fetch the latest SemVer-compatible tag (e.g. `v0.0.2`) and append the number of commits with a timestamp to it. For example, if the released version is `v0.0.1`, then the unreleased version would be something like `0.0.2+120240105144650`. We verify that this version is compatible with both SemVer and [PEP 440](https://peps.python.org/pep-0440/). [Publishing wheels](#publishing-wheels-to-databricks-workspace) updates the `version.json` file in the [install folder](#install-folder).\n\n```python\nproduct_info = ProductInfo(__file__)\n\nversion = product_info.unreleased_version()\nis_git = product_info.is_git_checkout()\nis_unreleased = product_info.is_unreleased_version()\n\nlogger.info(f'Version is: {version}')\nlogger.info(f'Git checkout: {is_git}')\nlogger.info(f'Is unreleased: {is_unreleased}')\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Application Name Detection\n\nThe library can infer the name of the application by taking the name of the directory where the `__about__.py` file is located within the current project. 
See [released version detection](#released-version-detection) for more details.\n[`ProductInfo.for_testing(klass)`](#using-productinfo-with-integration-tests) creates a new `ProductInfo` object with a random `product_name`.\n\n```python\nfrom databricks.labs.blueprint.wheels import ProductInfo\n\nproduct_info = ProductInfo(__file__)\nlogger.info(f'Product name is: {product_info.product_name()}')\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Using `ProductInfo` with integration tests\n\nWhen you're integration testing your [installations](#installation), you may want to have different [installation folders](#install-folder) for each test execution. `ProductInfo.for_testing(klass)` helps you with this:\n\n```python\nfrom ... import ConfigurationClass\nfrom databricks.labs.blueprint.wheels import ProductInfo\n\nfirst = ProductInfo.for_testing(ConfigurationClass)\nsecond = ProductInfo.for_testing(ConfigurationClass)\nassert first.product_name() != second.product_name()\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Publishing Wheels to Databricks Workspace\n\nBefore you execute a wheel on Databricks, you have to build it and upload it. This library detects the [released](#released-version-detection) or [unreleased](#unreleased-version-detection) version of the wheel, copies it over to a temporary folder, updates the `__about__.py` file with the right version, and builds the wheel in the temporary location, so that it's not polluted with build artifacts. `Wheels` is a context manager, so it removes all temporary files and folders after the `with` block finishes. This library is successfully used to concurrently test wheels on Shared Databricks Clusters through notebook-scoped libraries. 
Before you deploy the new version of the wheel, it is highly advised that you perform [application state upgrades](#application-state-migrations).\n\nEvery call to `wheels.upload_to_wsfs()` updates the `version.json` file in the [install folder](#install-folder), which holds a `version` field with the current wheel version. There's also a `wheel` field that contains the path to the current wheel file on WorkspaceFS.\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.wheels import ProductInfo\n\nw = WorkspaceClient()\nproduct_info = ProductInfo(__file__)\ninstallation = product_info.current_installation(w)\n\nwith product_info.wheels(w) as wheels:\n    remote_wheel = wheels.upload_to_wsfs()\n    logger.info(f'Uploaded to {remote_wheel}')\n```\n\nThis will print something like:\n\n```\n15:08:44  INFO [dist.logger] Uploaded to /Users/serge.smertin@databricks.com/.blueprint/wheels/databricks_labs_blueprint-0.0.2+120240105150840-py3-none-any.whl\n```\n\nYou can also do `wheels.upload_to_dbfs()`, though you're not able to set any access control over it.\n\n### Publishing upstream dependencies to workspaces without Public Internet access\n\nA Python wheel may have dependencies that are not included in the wheel itself. These dependencies are usually other Python packages that your wheel relies on. During installation on regular Databricks Workspaces, these dependencies get automatically fetched from the [Python Package Index](https://pypi.org/). \n\nSome Databricks Workspaces are configured with extra layers of network security that block all access to the Public Internet, including the [Python Package Index](https://pypi.org/). To ensure installations work on these kinds of workspaces, developers need to explicitly upload all upstream dependencies for their applications to work correctly.\n\nThe `upload_wheel_dependencies(prefixes)` method can be used to upload these dependencies to Databricks Workspace. 
This method takes a list of prefixes as an argument. It will upload all the dependencies of the wheel that have names starting with any of the provided prefixes.\n\nHere is an example of how you can use this method:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.wheels import ProductInfo\n\nws = WorkspaceClient()\nproduct_info = ProductInfo(__file__)\ninstallation = product_info.current_installation(ws)\n\nwith product_info.wheels(ws) as wheels:\n    wheel_paths = wheels.upload_wheel_dependencies(['databricks_sdk', 'pandas'])\n    for path in wheel_paths:\n        print(f'Uploaded dependency to {path}')\n```\n\nIn this example, the `upload_wheel_dependencies(['databricks_sdk', 'pandas'])` call will upload all the dependencies of the wheel that have names starting with 'databricks_sdk' or 'pandas'. This method excludes any platform-specific dependencies (i.e., those whose file names do not end with `-none-any.whl`). Also, the main wheel file is not uploaded. The method returns a list of paths to the uploaded dependencies on WorkspaceFS.\n\n\n[[back to top](#databricks-labs-blueprint)]\n\n## Databricks CLI's `databricks labs ...` Router\n\nThis library contains common utilities for Databricks CLI entrypoints defined in the [`labs.yml`](labs.yml) file. 
Here's the example metadata for a tool named `blueprint` with a single `me` command and a flag named `--greeting` that has `Hello` as its default value:\n\n```yaml\n---\nname: blueprint\ndescription: Common libraries for Databricks Labs\ninstall:\n  script: src/databricks/labs/blueprint/__init__.py\nentrypoint: src/databricks/labs/blueprint/__main__.py\nmin_python: 3.10\ncommands:\n  - name: me\n    description: shows current username\n    flags:\n     - name: greeting\n       default: Hello\n       description: Greeting prefix\n```\n\nAnd here's the content of the [`src/databricks/labs/blueprint/__main__.py`](src/databricks/labs/blueprint/__main__.py) file, which executes the `databricks labs blueprint me` command with `databricks.sdk.WorkspaceClient` automatically injected into an argument with the magical name `w`:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.entrypoint import get_logger\nfrom databricks.labs.blueprint.cli import App\n\napp = App(__file__)\nlogger = get_logger(__file__)\n\n\n@app.command\ndef me(w: WorkspaceClient, greeting: str):\n    \"\"\"Shows current username\"\"\"\n    logger.info(f\"{greeting}, {w.current_user.me().user_name}!\")\n\n\nif \"__main__\" == __name__:\n    app()\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Account-level Commands\n\nAs you may have noticed, there were only workspace-level commands, but you can also have native account-level command support. 
You need to specify the `is_account` property when declaring it in the `labs.yml` file:\n\n```yaml\ncommands:\n  # ...\n  - name: workspaces\n    is_account: true\n    description: shows current workspaces\n```\n\nand `@app.command(is_account=True)` will get you `databricks.sdk.AccountClient` injected into the `a` argument:\n\n```python\nfrom databricks.sdk import AccountClient\n\n@app.command(is_account=True)\ndef workspaces(a: AccountClient):\n    \"\"\"Shows workspaces\"\"\"\n    for ws in a.workspaces.list():\n        logger.info(f\"Workspace: {ws.workspace_name} ({ws.workspace_id})\")\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Commands with interactive prompts\n\nIf your command needs some terminal interactivity, simply add a [`prompts: Prompts` argument](#basic-terminal-user-interface-tui-primitives) to your command:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.labs.blueprint.entrypoint import get_logger\nfrom databricks.labs.blueprint.cli import App\nfrom databricks.labs.blueprint.tui import Prompts\n\napp = App(__file__)\nlogger = get_logger(__file__)\n\n\n@app.command\ndef me(w: WorkspaceClient, prompts: Prompts):\n    \"\"\"Shows current username\"\"\"\n    if prompts.confirm(\"Are you sure?\"):\n        logger.info(f\"Hello, {w.current_user.me().user_name}!\")\n\nif \"__main__\" == __name__:\n    app()\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Integration with Databricks Connect\n\nYou can create a Spark session using Databricks Connect from within a command:\n\n```python\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.connect import DatabricksSession\n\n@app.command\ndef example(w: WorkspaceClient):\n    \"\"\"Building Spark Session using Databricks Connect\"\"\"\n    spark = DatabricksSession.builder.sdkConfig(w.config).getOrCreate()\n    spark.sql(\"SHOW TABLES\")\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n### Starting New Projects\n\nThis tooling makes it easier to start new projects. 
First, install the CLI:\n\n```\ndatabricks labs install blueprint\n```\n\nThen, create a new project in a designated directory:\n\n```\ndatabricks labs blueprint init-project --target /path/to/folder\n```\n\n[[back to top](#databricks-labs-blueprint)]\n\n# Notable Downstream Projects\n\nThis library is used in the following projects:\n\n- [UCX - Automated upgrade to Unity Catalog](https://github.com/databrickslabs/ucx)\n\n[[back to top](#databricks-labs-blueprint)]\n\n# Project Support\n\nPlease note that this project is provided for your exploration only and is not \nformally supported by Databricks with Service Level Agreements (SLAs). It is \nprovided AS-IS, and we do not make any guarantees of any kind. Please do not \nsubmit a support ticket relating to any issues arising from the use of this project.\n\nAny issues discovered through the use of this project should be filed as GitHub \n[Issues on this repository](https://github.com/databrickslabs/blueprint/issues). \nThey will be reviewed as time permits, but no formal SLAs for support exist.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Common libraries for Databricks Labs",
    "version": "0.9.3",
    "project_urls": {
        "Issues": "https://github.com/databrickslabs/blueprint/issues",
        "Source": "https://github.com/databrickslabs/blueprint"
    },
    "split_keywords": [
        "databricks"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "73f74e77bdcd83fb5e53d79526f4532dd05af53e5dcbb2c2854ae536baecf133",
                "md5": "2fc8e1ab36f14b6e7e4e89ddd2fe44c0",
                "sha256": "0e640953deef5e41bc324d1035ce8c5d549023178ce50708700ab34c438451f3"
            },
            "downloads": -1,
            "filename": "databricks_labs_blueprint-0.9.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2fc8e1ab36f14b6e7e4e89ddd2fe44c0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 61549,
            "upload_time": "2024-11-14T13:33:21",
            "upload_time_iso_8601": "2024-11-14T13:33:21.818469Z",
            "url": "https://files.pythonhosted.org/packages/73/f7/4e77bdcd83fb5e53d79526f4532dd05af53e5dcbb2c2854ae536baecf133/databricks_labs_blueprint-0.9.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c9d09d818b50dc4fa86a9b5fdc0d74b96eacbd06410d7bd10b9e5f75dc416e35",
                "md5": "56a2a1249078d5df714aafe6adba7a24",
                "sha256": "a40628c0d58b6a9c8cf776b3ffa31237a8eec2f4d7a21142464cd2c285a2cd61"
            },
            "downloads": -1,
            "filename": "databricks_labs_blueprint-0.9.3.tar.gz",
            "has_sig": false,
            "md5_digest": "56a2a1249078d5df714aafe6adba7a24",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 76667,
            "upload_time": "2024-11-14T13:33:23",
            "upload_time_iso_8601": "2024-11-14T13:33:23.730507Z",
            "url": "https://files.pythonhosted.org/packages/c9/d0/9d818b50dc4fa86a9b5fdc0d74b96eacbd06410d7bd10b9e5f75dc416e35/databricks_labs_blueprint-0.9.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-14 13:33:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "databrickslabs",
    "github_project": "blueprint",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "databricks-labs-blueprint"
}
        