:Name: sklearn-instrumentation
:Version: 0.7.0
:Home page: https://github.com/crflynn/sklearn-instrumentation
:Summary: scikit-learn instrumentation tooling
:Upload time: 2021-05-11 01:09:18
:Author: flynn
:Requires Python: >=3.6.1,<4.0.0
:License: MIT
:Keywords: scikit-learn, instrumentation, machine learning

sklearn-instrumentation
=======================

|actions| |rtd| |pypi| |pyversions|

.. |actions| image:: https://github.com/crflynn/sklearn-instrumentation/workflows/build/badge.svg
    :target: https://github.com/crflynn/sklearn-instrumentation/actions

.. |rtd| image:: https://img.shields.io/readthedocs/sklearn-instrumentation.svg
    :target: http://sklearn-instrumentation.readthedocs.io/en/latest/

.. |pypi| image:: https://img.shields.io/pypi/v/sklearn-instrumentation.svg
    :target: https://pypi.python.org/pypi/sklearn-instrumentation

.. |pyversions| image:: https://img.shields.io/pypi/pyversions/sklearn-instrumentation.svg
    :target: https://pypi.python.org/pypi/sklearn-instrumentation


Generalized instrumentation tooling for scikit-learn models. ``sklearn_instrumentation`` allows instrumenting the ``sklearn`` package and any scikit-learn-compatible package whose estimators and transformers inherit from ``sklearn.base.BaseEstimator``.

Instrumentation applies decorators to methods of ``BaseEstimator``-derived classes or instances. By default the instrumentor applies instrumentation to the following methods (except when they are properties of instances):

* fit
* predict
* predict_log_proba
* predict_proba
* transform
* _fit
* _predict
* _predict_log_proba
* _predict_proba
* _transform

**sklearn-instrumentation** supports instrumentation of entire sklearn-compatible packages, as well as recursive instrumentation of models (metaestimators like ``Pipeline``, or even single estimators like ``RandomForestClassifier``).
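
The idea of decorating a fixed set of methods on a class, and undoing it later, can be illustrated with a toy sketch. This is not the library's implementation: ``instrument_class``, ``uninstrument_class``, ``record_calls`` and ``ToyScaler`` are hypothetical names invented for the illustration, and only the method names come from the list above.

.. code-block:: python

    from functools import wraps

    # Method names taken from the default list above.
    METHODS = (
        "fit", "predict", "predict_log_proba", "predict_proba", "transform",
        "_fit", "_predict", "_predict_log_proba", "_predict_proba", "_transform",
    )

    calls = []


    def record_calls(func):
        """Toy decorator: remember the name of every wrapped call."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(func.__name__)
            return func(*args, **kwargs)

        return wrapper


    def instrument_class(cls, decorator, methods=METHODS):
        """Wrap the listed methods defined on ``cls`` and return the originals."""
        originals = {}
        for name in methods:
            attr = vars(cls).get(name)
            # Skip names that are absent, and attributes such as properties.
            if attr is None or isinstance(attr, property) or not callable(attr):
                continue
            originals[name] = attr
            setattr(cls, name, decorator(attr))
        return originals


    def uninstrument_class(cls, originals):
        """Restore the methods saved by ``instrument_class``."""
        for name, func in originals.items():
            setattr(cls, name, func)


    class ToyScaler:
        """Stand-in for an estimator; not a real scikit-learn class."""

        def fit(self, X):
            self.max_ = max(X)
            return self

        def transform(self, X):
            return [x / self.max_ for x in X]

        @property
        def predict(self):  # a property, so it is left untouched
            return None


    originals = instrument_class(ToyScaler, record_calls)
    scaled = ToyScaler().fit([1, 2, 4]).transform([1, 2, 4])
    uninstrument_class(ToyScaler, originals)
    ToyScaler().fit([1, 2, 4])  # no longer recorded

Keeping the original functions around is what makes the removal step a simple ``setattr`` back to the saved attribute.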

Installation
------------

The sklearn-instrumentation package is available on PyPI and can be installed using pip:

.. code-block:: bash

    pip install sklearn-instrumentation


Package instrumentation
-----------------------

Instrument any sklearn-compatible package that has ``BaseEstimator``-derived classes.

.. code-block:: python

    from sklearn_instrumentation import SklearnInstrumentor

    instrumentor = SklearnInstrumentor(instrument=my_instrument)
    instrumentor.instrument_packages(["sklearn", "xgboost", "lightgbm"])


Full example:

.. code-block:: python

    import logging

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import FeatureUnion
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    from sklearn_instrumentation import SklearnInstrumentor
    from sklearn_instrumentation.instruments.logging import TimeElapsedLogger

    logging.basicConfig(level=logging.INFO)

    # Create an instrumentor and instrument sklearn
    instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())
    instrumentor.instrument_packages(["sklearn"])

    # Create a toy model for classification
    ss = StandardScaler()
    pca = PCA(n_components=3)
    rf = RandomForestClassifier()
    classification_model = Pipeline(
        steps=[
            (
                "fu",
                FeatureUnion(
                    transformer_list=[
                        ("ss", ss),
                        ("pca", pca),
                    ]
                ),
            ),
            ("rf", rf),
        ]
    )
    X, y = load_iris(return_X_y=True)

    # Observe logging
    classification_model.fit(X, y)
    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.
    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.
    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.
    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0006406307220458984 seconds
    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0001430511474609375 seconds
    # INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.
    # INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0006711483001708984 seconds
    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.0026731491088867188 seconds
    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.
    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.1768970489501953 seconds
    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.17983102798461914 seconds

    # Observe logging
    classification_model.predict(X)
    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.
    # INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.
    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00024509429931640625 seconds
    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.
    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.0002181529998779297 seconds
    # INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0012080669403076172 seconds
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.013531208038330078 seconds
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.013692140579223633 seconds
    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.015219926834106445 seconds

    # Remove instrumentation
    instrumentor.uninstrument_packages(["sklearn"])

    # Observe no logging
    classification_model.predict(X)


Machine learning model instrumentation
--------------------------------------

Instrument any sklearn-compatible trained estimator or metaestimator.

.. code-block:: python

    from sklearn_instrumentation import SklearnInstrumentor

    instrumentor = SklearnInstrumentor(instrument=my_instrument)
    instrumentor.instrument_estimator(estimator=my_ml_pipeline)


Example:

.. code-block:: python

    import logging

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    from sklearn_instrumentation import SklearnInstrumentor
    from sklearn_instrumentation.instruments.logging import TimeElapsedLogger

    logging.basicConfig(level=logging.INFO)

    # Train a classifier
    X, y = load_iris(return_X_y=True)
    rf = RandomForestClassifier()

    rf.fit(X, y)

    # Create an instrumentor which decorates BaseEstimator methods with
    # logging output when entering and exiting methods, with time elapsed logged
    # on exit.
    instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())

    # Apply the decorator to this trained estimator instance
    instrumentor.instrument_estimator(rf)

    # Observe the logging output
    rf.predict(X)
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.014165163040161133 seconds
    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.014327764511108398 seconds

    # Remove the decorator from the estimator instance
    instrumentor.uninstrument_estimator(rf)

    # No more logging
    rf.predict(X)


Instrumentation
---------------

The package comes with a handful of instruments that log information about ``X`` or the timing of execution. You can create your own instrument by writing a decorator that follows this pattern:

.. code-block:: python

    from functools import wraps


    def my_instrumentation(func, **dkwargs):
        """Wrap an estimator method with instrumentation.

        :param func: The method to be instrumented.
        :param dkwargs: Decorator kwargs, which can be passed to the
            decorator at decoration time. For estimator instrumentation
            this allows different parametrizations for each ml model.
        """
        @wraps(func)
        def wrapper(*args, **kwargs):
            """Wrapping function.

            :param args: The args passed to methods, typically
                just ``X`` and/or ``y``
            :param kwargs: The kwargs passed to methods, usually
                weights or other params
            """
            # Code goes here before execution of the estimator method
            retval = func(*args, **kwargs)
            # Code goes here after execution of the estimator method
            return retval

        return wrapper
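
As a concrete instance of this pattern, the sketch below logs the elapsed time of each call. It is an illustration only and not the library's ``TimeElapsedLogger``; ``elapsed_time_logger`` and ``double`` are hypothetical names, and the log message format is an assumption.

.. code-block:: python

    import logging
    import time
    from functools import wraps

    logger = logging.getLogger(__name__)


    def elapsed_time_logger(func, **dkwargs):
        """Log how long ``func`` takes; ``dkwargs`` is accepted but unused."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the wrapped call returns or raises.
                elapsed = time.perf_counter() - start
                logger.info("%s elapsed time: %.6f seconds", func.__name__, elapsed)

        return wrapper


    # The decorator can be exercised on any callable, not only estimator methods.
    def double(values):
        return [2 * v for v in values]


    timed_double = elapsed_time_logger(double)
    result = timed_double([1, 2, 3])

Because it has the same ``(func, **dkwargs)`` signature as the pattern, a function like this could presumably be passed as ``SklearnInstrumentor(instrument=elapsed_time_logger)``.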


To create a stateful instrument, use a class with the ``__call__`` method for implementing the decorator:

.. code-block:: python

    from functools import wraps

    from sklearn_instrumentation.instruments.base import BaseInstrument


    class MyInstrument(BaseInstrument):

        def __init__(self, *args, **kwargs):
            # handle any statefulness here
            pass

        def __call__(self, func, **dkwargs):
            """Wrap an estimator method with instrumentation.

            :param func: The method to be instrumented.
            :param dkwargs: Decorator kwargs, which can be passed to the
                decorator at decoration time. For estimator instrumentation
                this allows different parametrizations for each ml model.
            """
            @wraps(func)
            def wrapper(*args, **kwargs):
                """Wrapping function.

                :param args: The args passed to methods, typically
                    just ``X`` and/or ``y``
                :param kwargs: The kwargs passed to methods, usually
                    weights or other params
                """
                # Code goes here before execution of the estimator method
                retval = func(*args, **kwargs)
                # Code goes here after execution of the estimator method
                return retval

            return wrapper
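
To show what the state can be used for, here is a small sketch of an instrument that counts calls per method name. ``CallCounter`` and the toy ``predict`` function are hypothetical; in real use the class would subclass ``BaseInstrument`` as shown above, and a plain class is used here only so that the snippet runs without the package installed.

.. code-block:: python

    from collections import Counter
    from functools import wraps


    class CallCounter:
        """Count how often each wrapped callable is invoked."""

        def __init__(self):
            # State shared by every wrapper this instance creates.
            self.counts = Counter()

        def __call__(self, func, **dkwargs):
            @wraps(func)
            def wrapper(*args, **kwargs):
                self.counts[func.__name__] += 1
                return func(*args, **kwargs)

            return wrapper


    def predict(X):
        """Toy stand-in for an estimator method."""
        return [x > 0 for x in X]


    counter = CallCounter()
    counted_predict = counter(predict)
    counted_predict([1, -1])
    counted_predict([2])
    # counter.counts now holds one entry, keyed by "predict"

The wrapper closes over ``self``, so all decorated methods report into the same ``counts`` object.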


To pass different decorator kwargs for each ML model, use ``instrument_kwargs``:

.. code-block:: python

    instrumentor = SklearnInstrumentor(instrument=my_instrument)

    instrumentor.instrument_estimator(estimator=ml_model_1, instrument_kwargs={"name": "awesome_model"})
    instrumentor.instrument_estimator(estimator=ml_model_2, instrument_kwargs={"name": "better_model"})
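
On the instrument side, these values would arrive as ``dkwargs``, assuming the instrumentor forwards ``instrument_kwargs`` to the decorator at decoration time as the docstrings above describe. The sketch below shows a decorator reading the ``name`` key; ``named_recorder``, the ``messages`` list and the default value are hypothetical.

.. code-block:: python

    from functools import wraps

    messages = []


    def named_recorder(func, **dkwargs):
        """Tag every call with the model name given at decoration time."""
        # "name" mirrors the instrument_kwargs key used above;
        # the fallback value is an assumption for this sketch.
        name = dkwargs.get("name", "unnamed_model")

        @wraps(func)
        def wrapper(*args, **kwargs):
            messages.append(f"{name}:{func.__name__}")
            return func(*args, **kwargs)

        return wrapper


    def predict(X):
        """Toy stand-in for an estimator method."""
        return list(X)


    # Decorating the same callable twice with different kwargs
    awesome_predict = named_recorder(predict, name="awesome_model")
    better_predict = named_recorder(predict, name="better_model")
    awesome_predict([1])
    better_predict([2])

Reading the kwargs once, outside ``wrapper``, fixes the name at decoration time rather than on every call.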


            

        