# tango-mlflow
[![Actions Status](https://github.com/altescy/tango-mlflow/workflows/CI/badge.svg)](https://github.com/altescy/tango-mlflow/actions/workflows/ci.yml)
[![Python version](https://img.shields.io/pypi/pyversions/tango-mlflow)](https://github.com/altescy/tango-mlflow)
[![pypi version](https://img.shields.io/pypi/v/tango-mlflow)](https://pypi.org/project/tango-mlflow/)
[![License](https://img.shields.io/github/license/altescy/tango-mlflow)](https://github.com/altescy/tango-mlflow/blob/main/LICENSE)
MLflow integration for [ai2-tango](https://github.com/allenai/tango)
## Introduction
`tango-mlflow` is a Python library that connects [Tango](https://github.com/allenai/tango) to [MLflow](https://mlflow.org/).
Tango, developed by the Allen Institute for AI, is a flexible pipeline library for managing experiments and caching the outputs of each step.
MLflow, on the other hand, is a platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.
This integration lets you store and manage the complete settings and artifacts of your Tango runs in MLflow, enhancing your experiment tracking capabilities.
Here's a screenshot of the MLflow interface when executing with `tango-mlflow`:
![239279085-79ca2c6b-f9a1-49aa-a301-bb502a2340d2](https://github.com/altescy/tango-mlflow/assets/16734471/d41c6d08-faec-4ea0-8b23-fbc8c6147626)
- **Tango run**: The top-level run in MLflow corresponds to a single execution in Tango, and its name matches the execution name in Tango.
- **Tango steps**: Results of each step in Tango are recorded as nested runs in MLflow. The names of these nested runs correspond to the names of the steps in Tango.
- **Parameters and Metrics**: The entire settings for the execution, as well as the parameters for each step, are automatically logged in MLflow.
- **Artifacts and Caching**: The cached outputs of each step's execution are saved as artifacts under the corresponding MLflow run. These can be reused when executing Tango, enhancing efficiency and reproducibility.
## Installation
Install `tango-mlflow` by running the following command in your terminal:
```shell
pip install tango-mlflow[all]
```
## Usage
You can use the `MLFlowWorkspace` with command line arguments as follows:
```shell
tango run --workspace mlflow://your_experiment_name --include-package tango_mlflow
```
Alternatively, you can define your configuration in a `tango.yml` file:
```yaml
workspace:
  type: mlflow
  experiment_name: your_experiment_name

include_package:
  - tango_mlflow
```
In the above `tango.yml` configuration, `type` specifies the workspace type as MLflow, and `experiment_name` sets the name of your experiment in MLflow.
The `include_package` field needs to include the `tango_mlflow` package to use `tango-mlflow` functionality.
Remember to replace `your_experiment_name` with the name of your specific experiment.
To log runs remotely, set the `MLFLOW_TRACKING_URI` environment variable to your tracking server's URI:
```shell
export MLFLOW_TRACKING_URI=https://mlflow.example.com
```
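If you prefer to configure this from Python (for example, in a launcher script), the same effect can be achieved by setting the environment variable before Tango starts. This is a minimal sketch; `https://mlflow.example.com` is a placeholder for your own server:

```python
import os

# Equivalent to the `export` above; must run before Tango/MLflow is initialized.
# The URI is a placeholder for your tracking server.
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.example.com"
print(os.environ["MLFLOW_TRACKING_URI"])
```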
## Functionalities
### Logging metrics into MLflow
The `tango-mlflow` package provides the `MlflowStep` class, which allows you to easily log the results of each step execution to MLflow.
```python
from tango_mlflow.step import MlflowStep

class TrainModel(MlflowStep):
    def run(self, **params):
        # pre-process...
        for epoch in range(max_epochs):
            loss = train_epoch(...)
            metrics = evaluate_model(...)
            # log metrics with mlflow_logger
            for name, value in metrics.items():
                self.mlflow_logger.log_metric(name, value, step=epoch)
        # post-process...
```
In the example above, the `TrainModel` step inherits from `MlflowStep`.
Inside the step, you can directly record metrics to the corresponding MLflow run by invoking `self.mlflow_logger.log_metric(...)`.
Please note, this functionality must be used in conjunction with `MlflowWorkspace`.
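To make the call pattern concrete, here is a hypothetical stand-in for the logger, for illustration only; the real `self.mlflow_logger` records each metric to the step's MLflow run rather than to an in-memory dict:

```python
from collections import defaultdict

# Hypothetical stand-in mimicking the log_metric(name, value, step=...) call
# pattern; not part of tango-mlflow.
class FakeMetricLogger:
    def __init__(self):
        self.history = defaultdict(list)

    def log_metric(self, name, value, step=None):
        # Record (step, value) pairs per metric name.
        self.history[name].append((step, value))

logger = FakeMetricLogger()
for epoch in range(3):
    logger.log_metric("loss", round(1.0 / (epoch + 1), 2), step=epoch)

print(logger.history["loss"])  # → [(0, 1.0), (1, 0.5), (2, 0.33)]
```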
### Summarizing Tango run metrics
You can specify a step to record its returned metrics as representative values of the Tango run by setting the class variable `MLFLOW_SUMMARY = True`.
This feature enables you to conveniently view metrics for each Tango run directly in the MLflow interface.
```python
class EvaluateModel(Step):
    MLFLOW_SUMMARY = True  # Enables MLflow summary!

    def run(self, ...) -> dict[str, float]:
        # compute metrics ...
        return metrics
```
In the example above, the `EvaluateModel` step returns metrics that are logged as the representative values for that Tango run. These metrics are then recorded in the corresponding (top-level) MLflow run.
Please note the following requirements:
- The return value of a step where `MLFLOW_SUMMARY = True` is set must always be `dict[str, float]`.
- You don't necessarily need to inherit from `MlflowStep` to use `MLFLOW_SUMMARY`.
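For illustration, a hypothetical `compute_metrics` helper (not part of tango-mlflow) whose return value satisfies the `dict[str, float]` contract might look like this:

```python
def compute_metrics(predictions, labels):
    """Return a flat dict of float metrics, as MLFLOW_SUMMARY requires."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

metrics = compute_metrics([1, 0, 1], [1, 1, 1])
assert all(isinstance(v, float) for v in metrics.values())
```

A step with `MLFLOW_SUMMARY = True` would simply return such a dict from its `run` method.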
### Tuning hyperparameters with Optuna
`tango-mlflow` also provides the `tango-mlflow tune` command for tuning hyperparameters with [Optuna](https://optuna.org/).
For more details, please refer to the [examples/breast_cancer](https://github.com/altescy/tango-mlflow/tree/main/examples/breast_cancer) directory.
## Examples
- Basic example: [examples/euler](https://github.com/altescy/tango-mlflow/tree/main/examples/euler)
- Hyperparameter tuning with [Optuna](https://optuna.org/): [examples/breast_cancer](https://github.com/altescy/tango-mlflow/tree/main/examples/breast_cancer)