H2O-3 MLFlow Flavor
===================

:Name: h2o-mlflow-flavor
:Version: 0.1.0
:Home page: https://github.com/h2oai/h2o-3.git
:Summary: A mlflow flavor for working with H2O-3 MOJO and POJO models
:Upload time: 2023-11-13 18:13:52
:Author: H2O.ai
:License: Apache v2
:Keywords: ml flow, h2o-3

A tiny library containing an `MLFlow <https://mlflow.org/>`_ flavor for working with H2O-3 MOJO and POJO models.

Logging Models to MLFlow Registry
---------------------------------

A model trained with the H2O-3 runtime can be exported to the MLFlow registry with the ``log_model`` function:

.. code-block:: Python

    import mlflow
    import h2o_mlflow_flavor

    mlflow.set_tracking_uri("http://127.0.0.1:8080")
    
    h2o_model = ...  # placeholder: a model trained with H2O-3 (training phase elided)

    with mlflow.start_run(run_name="myrun") as run:
        h2o_mlflow_flavor.log_model(h2o_model=h2o_model,
                                    artifact_path="folder",
                                    model_type="MOJO",
                                    extra_prediction_args=["--predictCalibrated"])


Compared to the ``log_model`` functions of the flavors that ship with MLFlow, this function takes two extra arguments:
	
* ``model_type`` - Indicates whether the model should be exported as `MOJO <https://docs.h2o.ai/h2o/latest-stable/h2o-docs/mojo-quickstart.html#what-is-a-mojo>`_ or `POJO <https://docs.h2o.ai/h2o/latest-stable/h2o-docs/pojo-quickstart.html#what-is-a-pojo>`_. The default value is ``MOJO``.

* ``extra_prediction_args`` - A list of extra arguments for the Java scoring process. Possible values:

  * ``--setConvertInvalidNum`` - The scoring process will convert invalid numbers to NA.

  * ``--predictContributions`` - The scoring process will also return Shapley values along with the predictions. The model must support Shapley values, otherwise the scoring process will throw an error.

  * ``--predictCalibrated`` - The scoring process will also return calibrated prediction values.
   
The ``save_model`` function, which persists an H2O binary model as a MOJO or POJO, has the same signature as the ``log_model`` function.
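
Because ``extra_prediction_args`` is a plain list of strings, a misspelled flag is easy to miss until scoring time. The following is a minimal sketch of a check against the three flags documented above; ``DOCUMENTED_FLAGS`` and ``check_prediction_args`` are hypothetical names and are not part of ``h2o_mlflow_flavor``, and the flavor itself may accept flags beyond those listed here.

.. code-block:: python

    # The three flags documented above for ``extra_prediction_args``.
    DOCUMENTED_FLAGS = frozenset({
        "--setConvertInvalidNum",
        "--predictContributions",
        "--predictCalibrated",
    })


    def check_prediction_args(args):
        """Return ``args`` as a list; raise ValueError on undocumented flags.

        Hypothetical helper, not part of h2o_mlflow_flavor.
        """
        args = list(args)
        unknown = [arg for arg in args if arg not in DOCUMENTED_FLAGS]
        if unknown:
            raise ValueError(f"Undocumented prediction args: {unknown}")
        return args


    extra_args = check_prediction_args(["--predictCalibrated"])

The validated list can then be passed as ``extra_prediction_args=extra_args`` to ``log_model`` or ``save_model``.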

Extracting Information about Model
----------------------------------

The flavor offers several functions to extract information about the model.

* ``get_metrics(h2o_model, metric_type=None)`` - Extracts metrics from the trained H2O binary model. It returns a dictionary and takes the following parameters:

  * ``h2o_model`` - An H2O binary model.

  * ``metric_type`` - The type of metrics. Possible values are ``"training"``, ``"validation"``, and ``"cross_validation"``. If the parameter is not specified, metrics for all types are returned.

* ``get_params(h2o_model)`` - Extracts the training parameters of the H2O binary model. It returns a dictionary and expects one parameter:

  * ``h2o_model`` - An H2O binary model.

* ``get_input_example(h2o_model, number_of_records=5, relevant_columns_only=True)`` - Creates an example Pandas dataset from the training dataset of the H2O binary model. It takes the following parameters:

  * ``h2o_model`` - An H2O binary model.

  * ``number_of_records`` - The number of records to extract from the training dataset. Defaults to ``5``.

  * ``relevant_columns_only`` - A flag indicating whether the output dataset should contain only columns required by the model. Defaults to ``True``.
  
The functions can be utilized as follows:

.. code-block:: Python

    import mlflow
    import h2o_mlflow_flavor
    
    mlflow.set_tracking_uri("http://127.0.0.1:8080")

    h2o_model = ...  # placeholder: a model trained with H2O-3 (training phase elided)

    with mlflow.start_run(run_name="myrun") as run:
        # Record the training parameters and metrics alongside the model.
        mlflow.log_params(h2o_mlflow_flavor.get_params(h2o_model))
        mlflow.log_metrics(h2o_mlflow_flavor.get_metrics(h2o_model))
        input_example = h2o_mlflow_flavor.get_input_example(h2o_model)
        h2o_mlflow_flavor.log_model(h2o_model=h2o_model,
                                    input_example=input_example,
                                    artifact_path="folder",
                                    model_type="MOJO",
                                    extra_prediction_args=["--predictCalibrated"])
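
``mlflow.log_metrics`` expects a dictionary that maps names to numeric values. The source does not state whether ``get_metrics`` can return non-numeric entries, so the sketch below is only a precaution: ``loggable_metrics`` is a hypothetical helper (not part of ``h2o_mlflow_flavor``) that keeps real-valued entries and drops everything else.

.. code-block:: python

    from numbers import Real


    def loggable_metrics(metrics):
        """Keep only real-valued entries, converted to float.

        Hypothetical helper, not part of h2o_mlflow_flavor. Booleans are
        dropped even though Python treats them as integers.
        """
        kept = {}
        for name, value in metrics.items():
            if isinstance(value, bool) or not isinstance(value, Real):
                continue
            kept[name] = float(value)
        return kept


    # Plain-dictionary stand-in for the result of get_metrics(h2o_model).
    example = {"auc": 0.9, "nobs": 3, "model_name": "gbm", "converged": True}
    filtered = loggable_metrics(example)

Inside a run, ``mlflow.log_metrics(loggable_metrics(h2o_mlflow_flavor.get_metrics(h2o_model)))`` would then log only the numeric values.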


Model Scoring
-------------

Once a model has been obtained from the model registry, it does not require the H2O runtime for scoring. The only thing
the model requires is ``h2o-genmodel.jar``, which is persisted with the model when it is saved.
The model can be loaded with the function ``load_model(model_uri, dst_path=None)``. It returns an object that makes
predictions on a Pandas dataframe and takes the following parameters:

* ``model_uri`` - A unique identifier of the model within the MLFlow registry.

* ``dst_path`` - (Optional) A local filesystem path for downloading the persisted form of the model. 
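
For a model logged inside a run, MLFlow's ``runs:/<run_id>/<artifact_path>`` URI scheme is one way to form ``model_uri``. The following is a minimal sketch; ``runs_uri`` is a hypothetical helper, and the run ID and artifact path are taken from the scoring example in this document.

.. code-block:: python

    def runs_uri(run_id, artifact_path):
        """Build an MLflow ``runs:/<run_id>/<artifact_path>`` model URI.

        Hypothetical helper, not part of h2o_mlflow_flavor or mlflow.
        """
        if not run_id:
            raise ValueError("run_id must be non-empty")
        return f"runs:/{run_id}/{artifact_path.strip('/')}"


    model_uri = runs_uri("9a42265cf0ef484c905b02afb8fe6246", "iris")
    # model_uri == "runs:/9a42265cf0ef484c905b02afb8fe6246/iris"

With a reachable tracking server, the result could be passed as ``h2o_mlflow_flavor.load_model(model_uri)``. Within the ``with mlflow.start_run(...) as run`` block, the run ID is available as ``run.info.run_id``.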

The scoring object can also be obtained via the ``pyfunc`` flavor as follows:

.. code-block:: Python

    import mlflow
    import pandas as pd

    mlflow.set_tracking_uri("http://127.0.0.1:8080")

    # Load the logged model through the generic pyfunc flavor.
    logged_model = 'runs:/9a42265cf0ef484c905b02afb8fe6246/iris'
    loaded_model = mlflow.pyfunc.load_model(logged_model)

    # Score a Pandas dataframe.
    data = pd.read_csv("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
    loaded_model.predict(data)

            
