xi-method


- Name: xi-method
- Version: 0.1.7
- Home page: https://github.com/mfumagalli68/xi-method
- Summary: Post hoc explanations for ML models through measures of statistical dependence
- Upload time: 2023-09-22 15:34:19
- Maintainer: marco.fumagalli
- Author: marco.fumagalli
- Requires Python: >=3.8
- License: MIT
- Keywords: xai, explainable, ai

![build](https://github.com/mfumagalli68/xi/actions/workflows/build.yml/badge.svg)
![coverage](https://codecov.io/gh/mfumagalli68/xi/branch/main/graph/badge.svg)
[![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/)
[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-310/)
[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-311/)


<p align="center">
    <em>Xi - Post hoc explanations</em>
</p>

<p align="center">
    <img src="logo.PNG">
</p>


**Xi** is a Python package that implements the papers
"Explaining classifiers with measures of statistical association"[[1]](#1)
and "The Xi method: unlocking the mysteries of regression with Statistics"[[2]](#2).<br>

The growing size and complexity of data, as well as the need for accurate predictions, force analysts to use black-box
models. While the success of those models extends statistical application, it also increases the need for
interpretability and, when possible, explainability.

The paper proposes an approach to the problem based on measures of statistical association.<br>
Measures of statistical association deliver information regarding the strength of the statistical dependence between the
target and the feature(s) of interest, inferring this insight from the data in a **model-agnostic** fashion.<br>
In this respect, we note that probabilistic sensitivity measures form an important class of measures of statistical
association.<br>

We use these probabilistic sensitivity measures as part of the broad discourse of interpretability in statistical
machine learning. For brevity, we call this part the Xi-method.<br>
Briefly, the method evaluates ML model predictions by comparing the values of probabilistic sensitivity
measures obtained in a model-agnostic fashion, i.e., directly from the data, with the same indices computed after
replacing the true targets with the ML model forecasts.<br>
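To make the comparison concrete, here is a minimal pure-Python sketch of the idea, not the package's implementation. The function names `class_distribution` and `l1_sensitivity` are hypothetical, and the sensitivity measure used (a partition-weighted L1 distance between conditional and overall class distributions) is an illustrative assumption rather than the exact estimator from the papers.

```python
from collections import Counter


def class_distribution(labels, classes):
    """Empirical probability of each class in `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[c] / n for c in classes]


def l1_sensitivity(x, y, m):
    """Illustrative sensitivity of the labels `y` to the feature `x`.

    The observations are sorted by `x` and split into `m` partitions of
    nearly equal size. The result is the average, weighted by partition
    size, of the L1 distance between the class distribution inside each
    partition and the overall class distribution.
    """
    classes = sorted(set(y))
    overall = class_distribution(y, classes)
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(x)
    total = 0.0
    for k in range(m):
        idx = order[k * n // m:(k + 1) * n // m]
        if not idx:
            continue  # skip empty partitions when m > n
        conditional = class_distribution([y[i] for i in idx], classes)
        distance = sum(abs(p - q) for p, q in zip(conditional, overall))
        total += len(idx) / n * distance
    return total


# Toy data: one feature, the true targets, and hypothetical model forecasts
feature = [0, 1, 2, 3, 4, 5, 6, 7]
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]

from_data = l1_sensitivity(feature, y_true, m=2)   # computed from the data
from_model = l1_sensitivity(feature, y_pred, m=2)  # targets replaced by forecasts
print(from_data, from_model)
```

A gap between the two values for a feature suggests that the model relies on that feature differently from what the data alone indicate.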

To sum up, **Xi** has three main advantages:

- Model agnostic: as long as your model outputs predictions, you can use **Xi** with any model
- Data agnostic: **Xi** works with structured (tabular) and unstructured (text, image) data.
- Computationally reasonable

# Installation

Install from PyPI:

```[python]
pip install xi-method
```

# Usage

The package is simple and designed to give you post hoc explanations for your dataset and machine learning
model with minimal effort.<br>
Import your data as a pandas DataFrame, separating the covariates from the dependent (target)
variable.<br>

```[python]
from xi_method.utils import load_dataset
from xi_method.ximp import *

# Load the red wine quality dataset as a pandas DataFrame
df = load_wine_quality_red_dataset()

# Separate the target (quality) from the covariates
Y = df.quality
df.drop(columns='quality', inplace=True)
```

Create an instance of `XIClassifier` or `XIRegressor` depending on the type of problem you are working with:<br>

```[python]
xi = XIClassifier(m=20)
```

For classification tasks, you can specify the number of partitions in three different ways:

- `m`: the number of partitions, given as an integer or a dictionary. A dictionary maps each covariate name to its
  desired number of partitions; an integer applies the same number of partitions to all
  covariates.
- `discrete`: a list of the covariate names you want to treat as categorical.
- `obs`: a dictionary mapping each covariate name to the desired number of observations in each partition.

For regression tasks, you can only specify `m` as an integer.<br>

If you provide none of these, a default `m` value is computed as indicated in the paper.<br>
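The following sketch illustrates what the two forms of `m` mean in practice. It is not the package's code: the helpers `partitions_per_covariate` and `equal_frequency_partition` are hypothetical names, and both the equal-frequency splitting and the error raised for a covariate missing from the dictionary are assumptions made for illustration.

```python
def partitions_per_covariate(names, m):
    """Expand `m` into a {covariate name: number of partitions} mapping.

    An integer applies to every covariate; a dictionary must name each one
    (raising on a missing name is a choice of this sketch).
    """
    if isinstance(m, int):
        return {name: m for name in names}
    missing = [name for name in names if name not in m]
    if missing:
        raise KeyError(f"no partition count given for: {missing}")
    return {name: m[name] for name in names}


def equal_frequency_partition(values, m):
    """Split observation indices into `m` groups of nearly equal size,
    ordered by the covariate value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    return [order[k * n // m:(k + 1) * n // m] for k in range(m)]


print(partitions_per_covariate(["alcohol", "pH"], 20))
print(partitions_per_covariate(["alcohol", "pH"], {"alcohol": 10, "pH": 5}))
print(equal_frequency_partition([5, 1, 4, 2, 3, 0], 3))
```

Each inner list holds the row indices that fall into one partition of the covariate.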

To obtain post hoc explanations, run your favorite ML model, save the predictions as a NumPy array,
and pass the covariates (test set) and the predictions to the `explain` method:

```[python]
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hold out 30% of the data as a test set
x_train, x_test, y_train, y_test = train_test_split(df.values, Y, test_size=0.3, random_state=42)

# Fit the model and predict on the test set
lr = LogisticRegression(multi_class='multinomial', max_iter=100)
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)

# Explain the predictions using the L1 separation measurement
xi = XIClassifier(m=20)
p = xi.explain(X=x_test, y=y_pred, replicates=10, separation_measurement='L1')
```

The object `p` is a Python dictionary that maps each separation measurement to its explanation.<br>
You can access an explanation as follows:

```[python]
p.get('L1').explanation
```

You can choose among different separation measurements, as specified in the paper. You can specify a single separation
measurement or pass several as a list.

```[python]
p = xi.explain(X=x_test, y=y_pred, separation_measurement=['L1','Kuiper'])
```

Implemented separation measurements:

- Kullback-Leibler
- Kuiper
- L1
- L2
- Hellinger
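For orientation, the sketch below gives textbook definitions of these five measurements for two discrete distributions `p` and `q` defined over the same ordered categories. These are standard formulas written as hypothetical helper functions; the package may use different estimators, scaling, or normalization, so treat this as an illustration only.

```python
import math


def l1(p, q):
    """Sum of absolute differences between the probabilities."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2(p, q):
    """Euclidean distance between the probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kullback_leibler(p, q):
    """KL divergence of q from p; infinite if q is 0 where p is positive."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total


def hellinger(p, q):
    """Hellinger distance, bounded between 0 and 1."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def kuiper(p, q):
    """Kuiper distance: largest positive plus largest negative gap
    between the two cumulative distribution functions."""
    gaps = []
    running = 0.0
    for a, b in zip(p, q):
        running += a - b
        gaps.append(running)
    return max(max(gaps), 0.0) - min(min(gaps), 0.0)


p_dist = [0.5, 0.5]
q_dist = [0.25, 0.75]
for measure in (l1, l2, kullback_leibler, hellinger, kuiper):
    print(measure.__name__, measure(p_dist, q_dist))
```

All five return zero when the two distributions coincide and grow as they separate, which is what makes them usable as separation measurements.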

You can get the list of implemented separation measurements by running:

```[python]
from xi_method.utils import *
get_separation_measurement()
```

Plot your result:

```[python]
plot(separation_measurement='L1', type='tabular', explain=p, k=10)
```


## References
<a id="1">[1]</a> 
E. Borgonovo, V. Ghidini, R. Hahn, E. Plischke (2023).
Explaining classifiers with measures of statistical association.
Computational Statistics and Data Analysis, Volume 182, June 2023, 107701.

<a id="2">[2]</a> 
V. Ghidini (2023). 
The Xi method: unlocking the mysteries of regression with Statistics


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mfumagalli68/xi-method",
    "name": "xi-method",
    "maintainer": "marco.fumagalli",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "m.fumagalli68@gmail.com",
    "keywords": "XaI,explainable,AI",
    "author": "marco.fumagalli",
    "author_email": "m.fumagalli68@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/3b/37/60c898d5cf12383cd3a1db9c34dbe4679a05935c0be1d1c5c43738c6e765/xi_method-0.1.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Post hoc explanations for ML models through measures of statistical dependence",
    "version": "0.1.7",
    "project_urls": {
        "Homepage": "https://github.com/mfumagalli68/xi-method",
        "Repository": "https://github.com/mfumagalli68/xi-method"
    },
    "split_keywords": [
        "xai",
        "explainable",
        "ai"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "20801797116ee52865f08dcb0effc5b9906afe09f673dbcee87f5aebfb4bbca7",
                "md5": "24be913b508f757f9b1ace11ec01f688",
                "sha256": "87f5b87fc48559777d77de942f0d1aebcd98bd16b8b9f8b894991ea7ec78f379"
            },
            "downloads": -1,
            "filename": "xi_method-0.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "24be913b508f757f9b1ace11ec01f688",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 2876600,
            "upload_time": "2023-09-22T15:34:17",
            "upload_time_iso_8601": "2023-09-22T15:34:17.194520Z",
            "url": "https://files.pythonhosted.org/packages/20/80/1797116ee52865f08dcb0effc5b9906afe09f673dbcee87f5aebfb4bbca7/xi_method-0.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3b3760c898d5cf12383cd3a1db9c34dbe4679a05935c0be1d1c5c43738c6e765",
                "md5": "3be1bb2b52d3a9ad4eb1e958b97a8683",
                "sha256": "8b69120c29afdb2d710925a90a45af6b5ba0aa95518f09c946f2800136a3d2e7"
            },
            "downloads": -1,
            "filename": "xi_method-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "3be1bb2b52d3a9ad4eb1e958b97a8683",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 2821132,
            "upload_time": "2023-09-22T15:34:19",
            "upload_time_iso_8601": "2023-09-22T15:34:19.122205Z",
            "url": "https://files.pythonhosted.org/packages/3b/37/60c898d5cf12383cd3a1db9c34dbe4679a05935c0be1d1c5c43738c6e765/xi_method-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-22 15:34:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mfumagalli68",
    "github_project": "xi-method",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "xi-method"
}
        