merlinxai


- Name: merlinxai
- Version: 0.1.2
- Home page: https://github.com/pypa/sampleproject
- Summary: MERLIN
- Upload time: 2023-09-15 08:08:26
- Author: Andrea Seveso
- Requires Python: >=3.6
- Requirements: none recorded

[![Pypi Downloads](https://img.shields.io/pypi/dm/MERLINXAI.svg?label=Pypi%20downloads)](https://pypi.org/project/MERLINXAI/)

<!-- [![DOI:10.1016/j.inffus.2021.11.016](http://img.shields.io/badge/DOI-10.1016/j.inffus.2021.11.016-blue.svg)](https://doi.org/10.1016/j.inffus.2021.11.016) -->
<!-- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hb4KN0SYxdj9SaExqqFGmAXAIyUnBVnA?usp=sharing) -->

[![Stars](https://img.shields.io/github/stars/Crisp-Unimib/MERLIN?style=social)](https://github.com/Crisp-Unimib/MERLIN)
[![Watchers](https://img.shields.io/github/watchers/Crisp-Unimib/MERLIN?style=social)](https://github.com/Crisp-Unimib/MERLIN)

# MERLIN

![](/img/MERLIN.jpg)

**_MERLIN is a global, model-agnostic, contrastive explainer for any tabular or text classifier_**. It provides contrastive explanations of how the behaviour of two machine learning models differs.

Imagine we have a machine learning classifier, say M1, and wish to understand how, and to what extent, it differs from a second model M2.
MERLIN aims to answer the following questions:

1. Can we estimate to what extent M2 classifies data consistently with the predictions made by M1?
2. Why do the criteria used by M1 result in class _c_, while M2 does not use the same criteria to classify as _c_?
3. Can we use natural language to explain the differences between the models, making them more comprehensible to end users?
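
As a rough illustration of the first question, the share of instances on which two models predict the same label gives a first-pass measure of agreement. This is only a sketch and not MERLIN's own procedure, which works on surrogate models; the function name `agreement_rate` and the toy labels are assumptions made for this example.

```python
from typing import Hashable, Sequence


def agreement_rate(labels_m1: Sequence[Hashable], labels_m2: Sequence[Hashable]) -> float:
    """Return the fraction of instances on which M1 and M2 predict the same label."""
    if len(labels_m1) != len(labels_m2):
        raise ValueError("both models must be evaluated on the same instances")
    if not labels_m1:
        raise ValueError("at least one instance is required")
    matches = sum(a == b for a, b in zip(labels_m1, labels_m2))
    return matches / len(labels_m1)


# Toy predictions of two hypothetical models on the same five instances.
m1 = ["occupied", "empty", "empty", "occupied", "empty"]
m2 = ["occupied", "empty", "occupied", "occupied", "empty"]
print(agreement_rate(m1, m2))  # 4 of the 5 labels match
```

A single number like this says how much the models differ but not why, which is what the second and third questions address.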

For details and citations, see the [References section](#references).

## Install

MERLIN is available on [PyPI](https://pypi.org/project/MERLINXAI/). Simply run:

```
pip install merlinxai
```

Or clone the repository and run:

```
pip install .
```

The PyEDA package is required but is not declared as a dependency,
because installing it can fail on Windows. If you are on Linux or macOS, you
should be able to install it by running:

```
pip3 install pyeda
```

However, if you are on Windows, we found that the best way to install it is through
Christophe Gohlke's [pythonlibs page](https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyeda).
For further information, please consult the official PyEDA
[installation documentation](https://pyeda.readthedocs.io/en/latest/install.html).

To produce the PDF files, a Graphviz installation is also required.
Full documentation on how to install Graphviz on any platform is available
[here](https://graphviz.org/download/).
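
Before running MERLIN, it can help to confirm that both optional pieces are in place. The sketch below uses only the Python standard library; the helper name `check_optional_tools` is an assumption for this example, and the check relies on Graphviz exposing its `dot` executable on the `PATH`.

```python
import importlib.util
import shutil
from typing import Dict


def check_optional_tools() -> Dict[str, bool]:
    """Report whether PyEDA is importable and the Graphviz `dot` binary is on PATH."""
    return {
        "pyeda": importlib.util.find_spec("pyeda") is not None,
        "graphviz_dot": shutil.which("dot") is not None,
    }


for name, available in check_optional_tools().items():
    print(f"{name}: {'found' if available else 'missing'}")
```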

## Input

MERLIN takes as input the _"feature data"_ (can be training or test, tabular or free text) and the corresponding _"labels"_ predicted by the classifier. This means you don't need to wrap MERLIN within your code at all!
As optional parameters, the user can specify:

- the coverage of the dataset to be used (default is 100%); otherwise, a sampling procedure is used;
- the surrogate type to be used (decision tree or rulefit);
- a set of hyperparameters to be used for creating the most accurate surrogate models;
- the size of the test set to measure the fidelity of the surrogates.
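
To make the coverage and test-size options concrete, the sketch below samples a fraction of the rows while keeping features and labels aligned, then scores a stand-in surrogate on a held-out part. It is not MERLIN's implementation: the helper names `sample_coverage` and `fidelity`, the toy sensor readings, and the threshold rules are all assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def sample_coverage(features: Sequence, labels: Sequence,
                    coverage: float, seed: int = 0) -> Tuple[List, List]:
    """Keep a random `coverage` fraction of the rows, with features and labels aligned."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n_keep = max(1, round(len(features) * coverage))
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(features)), n_keep))
    return [features[i] for i in kept], [labels[i] for i in kept]


def fidelity(surrogate_labels: Sequence, classifier_labels: Sequence) -> float:
    """Share of rows where the surrogate reproduces the classifier's label."""
    pairs = list(zip(surrogate_labels, classifier_labels))
    return sum(s == c for s, c in pairs) / len(pairs)


# Toy data: one light reading per row and the label a hypothetical classifier predicted.
features = [[light] for light in range(0, 1000, 50)]  # 20 rows
predicted = [int(row[0] >= 400) for row in features]

X_half, y_half = sample_coverage(features, predicted, coverage=0.5)
print(len(X_half))  # half of the 20 rows are kept

# Hold out 20% of the sampled rows to score a stand-in surrogate rule.
n_test = max(1, int(len(X_half) * 0.2))
X_test, y_test = X_half[-n_test:], y_half[-n_test:]
surrogate_predictions = [int(row[0] >= 450) for row in X_test]
print(fidelity(surrogate_predictions, y_test))
```

Fidelity here measures how well the surrogate imitates the classifier's predictions, not how accurate either one is against ground truth.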

## MERLIN on tabular data

In this example, we apply MERLIN to a tabular dataset named _Occupancy_, whose task is to predict whether an office room is occupied based on sensor measurements of light, temperature, humidity, and CO2 levels.
In this case, M1 classifies instances recorded during the daytime, while M2 handles instances recorded during the nighttime.

```
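from merlin import MERLIN

# X_left / predicted_labels_left: feature data and the labels predicted by M1.
# X_right / predicted_labels_right: feature data and the labels predicted by M2.
exp = MERLIN(X_left, predicted_labels_left,
             X_right, predicted_labels_right,
             data_type='tabular', surrogate_type='sklearn',
             save_path='results/')

# Run the trace step (run_explain() is called afterwards).
exp.run_trace()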
```

### BDD2Text

The BDD2Text for _Occupancy_ reveals that one path has not changed between M1 and M2: a high level of light, in the 4th quartile, means that the room is well-lit and is the best indicator of whether it is occupied.

There is also one added path in M2: at nighttime, having the light variable in the 3rd quartile now leads to a positive classification, which was not true in M1. During the daytime, light in the 3rd quartile would not have been sufficient to classify a data instance positively, but it is sufficient at nighttime.

```
exp.run_explain()
exp.explain.BDD2Text()
```
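
The explanation above refers to quartiles of the light variable. The sketch below shows one way a numeric reading could be mapped to a quartile, using `statistics.quantiles` from the standard library (Python 3.8+). The readings are invented and the helper names are assumptions; MERLIN's own discretisation may differ.

```python
import bisect
import statistics
from typing import List, Sequence


def quartile_cut_points(values: Sequence[float]) -> List[float]:
    """Return the three cut points that split `values` into four quartiles."""
    return statistics.quantiles(values, n=4, method="inclusive")


def quartile_of(value: float, cut_points: Sequence[float]) -> int:
    """Map a value to its quartile, numbered 1 (lowest) to 4 (highest)."""
    return bisect.bisect_left(cut_points, value) + 1


# Invented light readings, for illustration only.
light = [0, 0, 5, 20, 120, 400, 430, 450, 480, 520, 700, 900]
cuts = quartile_cut_points(light)
print(cuts)
print(quartile_of(900, cuts))  # the brightest reading falls in quartile 4
print(quartile_of(0, cuts))    # a dark reading falls in quartile 1
```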

&nbsp;
&nbsp;

<p align="center">
<img src="/img/bdd2text.png" width="700" >
</p>
&nbsp;
&nbsp;

### Get Rules

The natural language explanation (NLE) shows the differences between the two models. However, a user might also wish to see example instances in the datasets to which these rules apply.

To do so, MERLIN provides the _get_rule_examples_ function, which requires the user to specify the rule to apply and the number of examples to show.

```
exp.data_manager['left'].get_rule_examples(rule, n_examples=5)
```
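
Conceptually, retrieving rule examples means filtering the data for rows that satisfy every condition of a rule. The sketch below illustrates that idea only: `find_rule_examples` is a hypothetical helper, not MERLIN's `get_rule_examples`, and the rule format (a feature name mapped to a required quartile) is an assumption.

```python
from itertools import islice
from typing import Dict, List


def find_rule_examples(rows: List[Dict[str, int]], rule: Dict[str, int],
                       n_examples: int = 5) -> List[Dict[str, int]]:
    """Return up to `n_examples` rows that satisfy every condition in `rule`."""
    matching = (
        row for row in rows
        if all(row.get(feature) == wanted for feature, wanted in rule.items())
    )
    return list(islice(matching, n_examples))


# Invented rows: each feature has already been discretised into quartiles 1-4.
rows = [
    {"light": 4, "co2": 3},
    {"light": 1, "co2": 1},
    {"light": 4, "co2": 4},
    {"light": 3, "co2": 2},
    {"light": 4, "co2": 1},
]
rule = {"light": 4}  # "light is in the 4th quartile"
for example in find_rule_examples(rows, rule, n_examples=2):
    print(example)
```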

&nbsp;
&nbsp;

<p align="center">
<img src="/img/get_examples.PNG" width="700" >
</p>
&nbsp;
&nbsp;

## MERLIN on text data

The same process can also be applied to text classifiers. For example, in the _20newsgroups_ dataset, one might look closely at the class _atheism_ because, for this class, more paths are deleted than added.

### BDD2Text

The NLE for _atheism_ shows that the presence of the word _bill_ leads the retrained classifier M2 to assign the label _atheism_ to a record, whilst this feature was not a criterion for the previous classifier M1.
Conversely, the explanation shows that M1 used the feature _keith_ to assign the label, whilst M2 discarded this rule.

Both terms are names of post authors: _Bill_'s posts appear only in the dataset used for retraining, whilst _Keith_'s posts are more frequent in the initial dataset than in the second one (dataset taken from _Jin, P., Zhang, Y., Chen, X., & Xia, Y. Bag-of-embeddings for text classification. In IJCAI-2016_).

Finally, M2 discarded the rule _having political atheist_, which was sufficient for M1 to classify an instance as _atheism_.
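
The added, deleted, and unchanged paths described above can be pictured as set operations on the rules of the two models. The sketch below is a simplification for illustration and not how MERLIN derives its explanations: the helper `compare_paths` is hypothetical, each path is reduced to a set of words, and the _religion_ path is invented so that the unchanged case has an example.

```python
from typing import Dict, FrozenSet, Set

Path = FrozenSet[str]


def compare_paths(paths_m1: Set[Path], paths_m2: Set[Path]) -> Dict[str, Set[Path]]:
    """Split the paths of two models into unchanged, added, and deleted ones."""
    return {
        "unchanged": paths_m1 & paths_m2,
        "added": paths_m2 - paths_m1,
        "deleted": paths_m1 - paths_m2,
    }


# Toy paths for the class "atheism"; "religion" is invented for illustration.
m1_paths = {frozenset({"keith"}), frozenset({"political", "atheist"}), frozenset({"religion"})}
m2_paths = {frozenset({"bill"}), frozenset({"religion"})}

diff = compare_paths(m1_paths, m2_paths)
for kind, paths in diff.items():
    print(kind, sorted(sorted(path) for path in paths))
```

With these toy sets, two paths are deleted and one is added, which mirrors the observation that deleted paths outnumber added ones for this class.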

&nbsp;
&nbsp;

<p align="center">
<img src="/img/bdd2text_atheism.PNG" width="700" >
</p>
&nbsp;
&nbsp;

## Tutorials and Usage

A complete example of MERLIN usage is provided in the notebook ["MERLIN Demo"](/MERLIN%20Demo.ipynb) in the main repository folder. A notebook example that includes ML model training is also available in this repository and as a [Google Colab notebook](https://colab.research.google.com/drive/1hb4KN0SYxdj9SaExqqFGmAXAIyUnBVnA?usp=sharing).

## References

To cite MERLIN, please refer to [the following paper](https://www.sciencedirect.com/science/article/pii/S016792362300115X):

```
@article{malandri2023model,
  title={Model-contrastive explanations through symbolic reasoning},
  author={Malandri, Lorenzo and Mercorio, Fabio and Mezzanzanica, Mario and Seveso, Andrea},
  journal={Decision Support Systems},
  pages={114040},
  year={2023},
  publisher={Elsevier}
}
```

MERLIN generalizes the approach proposed in _Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N., & Seveso, A. (2022). ContrXT: Generating contrastive explanations from any text classifier. Information Fusion, 81, 103-115._ [(bibtex)](https://scholar.googleusercontent.com/scholar.bib?q=info:0m4K2oHziA8J:scholar.google.com/&output=citation&scisdr=Cm3RQ6UsEMDigjKD5sU:AGlGAw8AAAAAZJGF_sX6i_Yv-u1e4Uchy_LnXps&scisig=AGlGAw8AAAAAZJGF_olHzQufUAHR9c2EorlOe2s&scisf=4&ct=citation&cd=-1&hl=en)

            
