MLflavors
=========
The MLflavors package adds MLflow support for some popular machine learning frameworks currently
not considered for inclusion as MLflow built-in flavors. Just like built-in flavors, you can use
this package to save your model as an MLflow artifact, load your model from MLflow for batch
inference, and deploy your model to a serving endpoint using MLflow deployment tools.
The following open-source libraries are currently supported:

.. list-table::
    :widths: 15 10 15
    :header-rows: 1

    * - Framework
      - Tutorials
      - Category
    * - `Orbit <https://github.com/uber/orbit>`_
      - `MLflow-Orbit <https://mlflavors.readthedocs.io/en/latest/examples.html#orbit>`_
      - Time Series Forecasting
    * - `Sktime <https://github.com/sktime/sktime>`_
      - `MLflow-Sktime <https://mlflavors.readthedocs.io/en/latest/examples.html#sktime>`_
      - Time Series Forecasting
    * - `StatsForecast <https://github.com/Nixtla/statsforecast>`_
      - `MLflow-StatsForecast <https://mlflavors.readthedocs.io/en/latest/examples.html#statsforecast>`_
      - Time Series Forecasting
    * - `PyOD <https://github.com/yzhao062/pyod>`_
      - `MLflow-PyOD <https://mlflavors.readthedocs.io/en/latest/examples.html#pyod>`_
      - Anomaly Detection
    * - `SDV <https://github.com/sdv-dev/SDV>`_
      - `MLflow-SDV <https://mlflavors.readthedocs.io/en/latest/examples.html#sdv>`_
      - Synthetic Data Generation

The MLflow interface for the supported frameworks closely follows the design of built-in flavors.
In particular, a custom model loaded as a ``pyfunc`` flavor generates predictions from a
single-row pandas DataFrame configuration argument, which exposes the parameters of the
flavor's inference API.

|tests| |coverage| |docs| |pypi| |license|

.. |tests| image:: https://img.shields.io/github/actions/workflow/status/ml-toolkits/mlflavors/ci.yml?style=for-the-badge&logo=github
    :target: https://github.com/ml-toolkits/mlflavors/actions/workflows/ci.yml/

.. |coverage| image:: https://img.shields.io/codecov/c/github/ml-toolkits/mlflavors?style=for-the-badge&label=codecov&logo=codecov
    :target: https://codecov.io/gh/ml-toolkits/mlflavors

.. |docs| image:: https://img.shields.io/readthedocs/mlflavors/latest.svg?style=for-the-badge&logoColor=white
    :target: https://mlflavors.readthedocs.io/en/latest/index.html
    :alt: Latest Docs

.. |pypi| image:: https://img.shields.io/pypi/v/mlflavors.svg?style=for-the-badge&logo=pypi&logoColor=white
    :target: https://pypi.org/project/mlflavors/
    :alt: Latest Python Release

.. |license| image:: https://img.shields.io/badge/License-BSD--3--Clause-blue?style=for-the-badge
    :target: https://opensource.org/license/bsd-3-clause/
    :alt: BSD-3-Clause License

Documentation
-------------
Usage examples for all flavors and the API reference can be found in the package
`documentation <https://mlflavors.readthedocs.io/en/latest/index.html>`_.
Installation
------------
Installing from PyPI:

.. code-block:: bash

    $ pip install mlflavors

Quickstart
----------
This example trains a `PyOD <https://github.com/yzhao062/pyod>`_ KNN model
on a synthetic dataset. Normal data is generated by a multivariate
Gaussian distribution and outliers are generated by a uniform distribution.
An MLflow run is started to log the evaluation metrics and the trained
model as an artifact. Anomaly scores are then computed by loading the
trained model in its native flavor and in the ``pyfunc`` flavor:

.. code-block:: python

    import json

    import mlflow
    import pandas as pd
    from pyod.models.knn import KNN
    from pyod.utils.data import generate_data
    from sklearn.metrics import roc_auc_score

    import mlflavors

    ARTIFACT_PATH = "model"

    with mlflow.start_run() as run:
        contamination = 0.1  # proportion of outliers
        n_train = 200  # number of training points
        n_test = 100  # number of testing points

        X_train, X_test, _, y_test = generate_data(
            n_train=n_train, n_test=n_test, contamination=contamination
        )

        # Train kNN detector
        clf = KNN()
        clf.fit(X_train)

        # Evaluate model
        y_test_scores = clf.decision_function(X_test)

        metrics = {
            "roc": roc_auc_score(y_test, y_test_scores),
        }

        print(f"Metrics: \n{json.dumps(metrics, indent=2)}")

        # Log metrics
        mlflow.log_metrics(metrics)

        # Log model using pickle serialization (default).
        mlflavors.pyod.log_model(
            pyod_model=clf,
            artifact_path=ARTIFACT_PATH,
            serialization_format="pickle",
        )
        model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

    # Print the run id, which is used below for serving the model to a local REST API endpoint
    print(f"\nMLflow run id:\n{run.info.run_id}")

Make a prediction loading the model from MLflow in native format:

.. code-block:: python

    loaded_model = mlflavors.pyod.load_model(model_uri=model_uri)
    print(loaded_model.decision_function(X_test))

Make a prediction loading the model from MLflow in ``pyfunc`` format:

.. code-block:: python

    loaded_pyfunc = mlflavors.pyod.pyfunc.load_model(model_uri=model_uri)

    # Create configuration DataFrame
    predict_conf = pd.DataFrame(
        [
            {
                "X": X_test,
                "predict_method": "decision_function",
            }
        ]
    )

    print(loaded_pyfunc.predict(predict_conf)[0])

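
To make the role of the configuration DataFrame more concrete, the sketch below shows
one way a wrapper could unpack a single-row DataFrame and dispatch to the requested
method. It is an illustration under stated assumptions, not the mlflavors implementation:
``DummyDetector`` and ``dispatch`` are hypothetical names introduced only for this example.

.. code-block:: python

    import pandas as pd


    class DummyDetector:
        """Hypothetical stand-in for a fitted detector (not a PyOD class)."""

        def decision_function(self, X):
            # Toy score: sum of absolute feature values per sample
            return [sum(abs(v) for v in row) for row in X]


    def dispatch(model, conf: pd.DataFrame):
        """Unpack a single-row configuration DataFrame and call the requested method."""
        if len(conf) != 1:
            raise ValueError("Configuration DataFrame must contain exactly one row")
        row = conf.iloc[0].to_dict()
        method_name = row.pop("predict_method")
        # Remaining entries are forwarded as keyword arguments
        return getattr(model, method_name)(**row)


    conf = pd.DataFrame(
        [{"X": [[1.0, -2.0], [0.5, 0.5]], "predict_method": "decision_function"}]
    )
    scores = dispatch(DummyDetector(), conf)
    print(scores)

Keeping every argument in one row means the same DataFrame can be passed to a local
``predict`` call or serialized for a REST request without changing its shape.
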
To serve the model to a local REST API endpoint, run the command below, substituting
the run id printed above for ``<run_id>``:

.. code-block:: bash

    mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

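
The server can take a moment to start. As an optional step, the sketch below polls the
scoring server's ``/ping`` health endpoint until it answers. The helper name
``wait_until_ready`` and its timeout values are assumptions made for this example, and
the port ``5000`` is the default used elsewhere in this quickstart.

.. code-block:: python

    import time
    import urllib.error
    import urllib.request


    def wait_until_ready(ping_url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
        """Poll ping_url until it returns HTTP 200 or the timeout (in seconds) expires."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(ping_url, timeout=2.0) as resp:
                    if resp.status == 200:
                        return True
            except (urllib.error.URLError, OSError):
                # Server not reachable yet; retry after a short pause
                pass
            time.sleep(interval)
        return False


    # Example call once the serve command is running:
    # wait_until_ready("http://127.0.0.1:5000/ping")
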
Open a new terminal and run the model scoring script below to request a prediction from
the served model:

.. code-block:: python

    import pandas as pd
    import requests
    from pyod.utils.data import generate_data

    contamination = 0.1  # proportion of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points

    _, X_test, _, _ = generate_data(
        n_train=n_train, n_test=n_test, contamination=contamination
    )

    # Define local host and endpoint url
    host = "127.0.0.1"
    url = f"http://{host}:5000/invocations"

    # Convert to list for JSON serialization
    X_test_list = X_test.tolist()

    # Create configuration DataFrame
    predict_conf = pd.DataFrame(
        [
            {
                "X": X_test_list,
                "predict_method": "decision_function",
            }
        ]
    )

    # Create dictionary with pandas DataFrame in the split orientation
    json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

    # Score model
    response = requests.post(url, json=json_data)
    print(response.json())

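
For reference, the request body built by the script has a simple structure: the split
orientation stores the row index, the column names, and the row data separately. The
sketch below writes that structure by hand with only the standard library. The two
sample points in ``X_test_list`` are made-up values for illustration, not output of
``generate_data``.

.. code-block:: python

    import json

    # Illustrative values only; the script above uses the generated test set
    X_test_list = [[6.4, 5.6], [3.1, 4.2]]

    payload = {
        "dataframe_split": {
            "index": [0],  # a single configuration row
            "columns": ["X", "predict_method"],
            "data": [[X_test_list, "decision_function"]],
        }
    }

    body = json.dumps(payload)
    print(body)

Note that the whole test set sits in one cell of the single data row, which is why the
array has to be converted to nested lists before serialization.
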
Contributing
------------
Contributions from the community are welcome. I will be happy to support the inclusion
and development of new features and flavors. To report a bug or request a new feature, please
open a GitHub issue.
Versioning
----------
Versions and changes are documented in the
`changelog <https://github.com/ml-toolkits/mlflavors/tree/main/CHANGELOG.rst>`_.
Development
-----------
To set up your local development environment, create a virtual environment, for example with conda:

.. code-block:: bash

    $ conda create -n mlflavors-dev python=3.9
    $ conda activate mlflavors-dev

Install the project locally:

.. code-block:: bash

    $ python -m pip install --upgrade pip
    $ pip install -e ".[dev]"

Install pre-commit hooks:

.. code-block:: bash

    $ pre-commit install

Run tests:

.. code-block:: bash

    $ pytest

Build Sphinx docs:

.. code-block:: bash

    $ cd docs
    $ make html