CFNOW


| Field | Value |
| --- | --- |
| Name | CFNOW |
| Version | 0.0.1 |
| Summary | Generate counterfactuals with ease. This package takes a model and point (with a certain class) and minimally changes it to flip the classification result. |
| Upload time | 2023-09-01 11:32:20 |
| Author | Raphael Mazzine Barbosa de Oliveira |
| License | MIT |
| Keywords | counterfactuals, counterfactual explanations, flipping original class, explainable artificial intelligence |
| Requirements | No requirements were recorded. |

<div style="text-align:center">
    <img src="/imgs/cfnow_hq_logo.gif" alt="image" width="100%"/>
</div>


## CFNOW - CounterFactual Nearest Optimal Wololo
![unittests](https://github.com/rmazzine/CFNOW/actions/workflows/unittests.yaml/badge.svg)
[![codecov](https://codecov.io/gh/rmazzine/CFNOW/graph/badge.svg?token=4NHY0V9CN9)](https://codecov.io/gh/rmazzine/CFNOW)


### Description

> TL;DR: You just need a `dataset point` and a `model prediction function`. CFNOW will find the closest point with a different class.

The simplest way to generate counterfactuals for any tabular dataset and model.

This package finds an optimal point (the one closest to the input dataset point) whose classification differs from the original classification. In other words, it "flips" the classification of the original input by minimally changing it.

## Table of Contents

- [Minimal example](#minimal-example)
- [Counterfactual Charts](#showing-the-counterfactuals-graphically)
- [I have: binary categorical features!](#i-have-binary-categorical-features)
- [I have: one-hot encoded features!](#i-have-one-hot-encoded-features)
- [I have: binary and OHE features!](#i-have-one-hot-and-binary-categorical-features)
- [How to cite](#how-to-cite)

## Requirements

- Python >= 3.8

### Minimal example:
```python
from cfnow import find_tabular
import sklearn.datasets
import sklearn.ensemble
import pandas as pd

# Generating a sample model
X, y = sklearn.datasets.load_iris(return_X_y=True)
model = sklearn.ensemble.RandomForestClassifier()
model.fit(X, y)

# Selecting the first point of the dataset as the factual
x = X[0]

# Here we can see the original class
print(f"Factual: {x}\nFactual class: {model.predict([x])}")

# Then, we use CFNOW to generate the minimum modification to change the classification
cf_obj = find_tabular(
    factual=pd.Series(x),
    model_predict_proba=model.predict_proba,
    limit_seconds=10)

# Here we can see the new class
print(f"CF: {cf_obj.cfs[0]}\nCF class: {model.predict([cf_obj.cfs[0]])}")
```
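
To see which features the counterfactual actually changed, the factual and the counterfactual can be compared element by element. The helper below is a minimal sketch and not part of CFNOW: `changed_features` is a hypothetical name, and the sample values are made up for illustration rather than produced by the library.

```python
def changed_features(factual, counterfactual, tol=1e-9):
    """Return {index: (old, new)} for every feature that differs by more than tol."""
    if len(factual) != len(counterfactual):
        raise ValueError("factual and counterfactual must have the same length")
    return {
        i: (old, new)
        for i, (old, new) in enumerate(zip(factual, counterfactual))
        if abs(old - new) > tol
    }


# Illustrative values only (not real CFNOW output)
factual_example = [5.1, 3.5, 1.4, 0.2]
cf_example = [5.1, 3.5, 2.6, 0.2]
print(changed_features(factual_example, cf_example))
```

With the minimal example above, the same helper should accept `x` and `cf_obj.cfs[0]` directly, assuming both are one-dimensional numeric sequences of equal length.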

### Showing the Counterfactuals graphically
This package is integrated with [CounterPlots](https://github.com/ADMAntwerp/CounterPlots), which allows you to graphically represent your counterfactual explanations!

You can simply generate Greedy, CounterShapley, and Constellation charts for a given CF with:
#### Greedy
```python
# Get the counterplots for the first CF and save as greedy.png
cf_obj.generate_counterplots(0).greedy('greedy.png')
```
#### Output Example
![image](/imgs/greedy_ex.png)
#### CounterShapley
```python
# Get the counterplots for the first CF and save as countershapley.png
cf_obj.generate_counterplots(0).countershapley('countershapley.png')
```
![image](/imgs/countershapley_ex.png)
#### Constellation
```python
# Get the counterplots for the first CF and save as constellation.png
cf_obj.generate_counterplots(0).constellation('constellation.png')
```
![image](/imgs/const_ex.png)

### Improving your results
The minimal example above treats all features as continuous numerical values. However, some datasets have categorical (binary or one-hot encoded) features. CFNOW can handle these data types in a simple way, as demonstrated below:

### I have binary categorical features!
#### 1 - Prepare the dataset
```python
import pandas as pd
import numpy as np
import sklearn.ensemble

# Generate data with 5 binary categorical features and 3 continuous numerical features
X = np.hstack((
    np.random.randint(0, 2, size=(1000, 5)),
    np.random.rand(1000, 3) * 100
))

# Random binary target variable
y = np.random.randint(0, 2, 1000)

# Train RandomForestClassifier
model = sklearn.ensemble.RandomForestClassifier().fit(X, y)

# Display the original class for the first sample
x = X[0]
print(f"Factual: {x}\nFactual class: {model.predict([x])}")
```

#### 2 - Find the CF
```python
from cfnow import find_tabular
# Then, we use CFNOW to generate the minimum modification to change the classification
cf_obj = find_tabular(
    factual=pd.Series(x),
    feat_types={i: 'cat' if i < 5 else 'cont' for i in range(8)},
    model_predict_proba=model.predict_proba,
    limit_seconds=10)

# Here we can see the new class
print(f"CF: {cf_obj.cfs[0]}\nCF class: {model.predict([cf_obj.cfs[0]])}")
```
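
When the column positions are not known in advance, the `feat_types` dictionary can be built from the data instead of being written by hand. The sketch below is a heuristic and not a CFNOW feature: `infer_feat_types` is a hypothetical helper that assumes a column is binary categorical when it contains only 0 and 1. The `'cat'` and `'cont'` labels are the ones used in the example above.

```python
def infer_feat_types(columns):
    """Map each column key to 'cat' if it only holds 0/1 values, else 'cont'.

    `columns` is a dict of {column_key: iterable of values}.
    """
    feat_types = {}
    for key, values in columns.items():
        is_binary = all(v in (0, 1) for v in values)
        feat_types[key] = "cat" if is_binary else "cont"
    return feat_types


# Illustrative data: two binary columns and one continuous column
sample_columns = {
    0: [0, 1, 1, 0],
    1: [1, 1, 0, 0],
    2: [12.5, 80.1, 3.3, 47.0],
}
print(infer_feat_types(sample_columns))
```

A continuous column that happens to contain only 0 and 1 would be mislabeled by this heuristic, so the result is worth reviewing before passing it to `find_tabular`.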

### I have one-hot encoded features!
#### 1 - Prepare the dataset
```python
import warnings
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import sklearn.ensemble
warnings.filterwarnings("ignore", message="X does not have valid feature names, but RandomForestClassifier was fitted with feature names")


# Generate data
X = np.hstack((np.random.randint(0, 10, size=(1000, 5)), np.random.rand(1000, 3) * 100))

# One-hot encode the first 5 categorical columns

# !!!IMPORTANT!!! The names of the one-hot encoded (OHE) columns MUST follow the feature_value format.
# For example, a feature called color with the values red and blue must be encoded as columns
# named color_red and color_blue. Otherwise, CFNOW will not be able to find the correct
# columns to modify.
# OneHotEncoder's dense-output keyword differs across scikit-learn versions (sparse vs. sparse_output),
# so keep the default sparse output and convert it to a dense array.
encoder = OneHotEncoder()
X_cat_encoded = encoder.fit_transform(X[:, :5]).toarray()
names = encoder.get_feature_names_out(['cat1', 'cat2', 'cat3', 'cat4', 'cat5'])

# Combine and convert to DataFrame
df = pd.DataFrame(np.hstack((X_cat_encoded, X[:, 5:])), columns=list(names) + ['num1', 'num2', 'num3'])

# Random binary target variable
y = np.random.randint(0, 2, 1000)

# Train RandomForestClassifier
model = sklearn.ensemble.RandomForestClassifier().fit(df, y)

# Display the original class for the first sample
x = df.iloc[0]
print(f"Factual: {x.tolist()}\nFactual class: {model.predict([x])}")
```
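
The `feature_value` naming rule described in the comments above means that columns sharing the same prefix belong to one encoded feature. The following sketch only illustrates that convention; it is not CFNOW's internal code. `group_ohe_columns` is a hypothetical helper, and splitting at the first underscore is an assumption, since the source does not say how feature names that themselves contain underscores are handled.

```python
from collections import defaultdict


def group_ohe_columns(column_names):
    """Group column names by the feature part of a feature_value name."""
    groups = defaultdict(list)
    for name in column_names:
        if "_" not in name:
            continue  # no feature_value pattern, so not treated as one-hot here
        feature, _value = name.split("_", 1)
        groups[feature].append(name)
    # Keep only prefixes shared by at least two columns
    return {feature: cols for feature, cols in groups.items() if len(cols) > 1}


columns = ["color_red", "color_blue", "size_small", "size_large", "num1"]
print(group_ohe_columns(columns))
```

Running a check like this on `df.columns` before calling `find_tabular` can help spot columns whose names do not follow the expected pattern.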

#### 2 - Find the CF
```python
from cfnow import find_tabular
# Then, we use CFNOW to generate the minimum modification to change the classification
cf_obj = find_tabular(
    factual=x,
    feat_types={c: 'cat' if 'cat' in c else 'cont' for c in df.columns},
    has_ohe=True,
    model_predict_proba=model.predict_proba,
    limit_seconds=10)

# Here we can see the new class
print(f"CF: {cf_obj.cfs[0]}\nCF class: {model.predict([cf_obj.cfs[0]])}")
```
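
A quick sanity check on a one-hot encoded counterfactual is to confirm that every encoded feature still has exactly one active column. The sketch below assumes that this is the expected property of a valid row. `ohe_groups_valid` is a hypothetical helper that is not part of CFNOW, and the rows are made-up values.

```python
def ohe_groups_valid(row, groups):
    """Return True if every one-hot group has exactly one column equal to 1.

    `row` maps column name -> value; `groups` maps feature -> list of column names.
    """
    for cols in groups.values():
        values = [row[col] for col in cols]
        if any(v not in (0, 1) for v in values):
            return False  # a one-hot column must be 0 or 1
        if sum(values) != 1:
            return False  # zero or several active columns in the group
    return True


groups = {"color": ["color_red", "color_blue"]}
valid_row = {"color_red": 0, "color_blue": 1, "num1": 42.0}
invalid_row = {"color_red": 1, "color_blue": 1, "num1": 42.0}
print(ohe_groups_valid(valid_row, groups))
print(ohe_groups_valid(invalid_row, groups))
```

For the example above, a row could be built with `dict(zip(df.columns, cf_obj.cfs[0]))`, assuming `cf_obj.cfs[0]` keeps the column order of `df`.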

### I have one-hot and binary categorical features!
#### 1 - Prepare the dataset
```python
import warnings
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import sklearn.ensemble
warnings.filterwarnings("ignore", message="X does not have valid feature names, but RandomForestClassifier was fitted with feature names")


# Generate data
X = np.hstack((np.random.randint(0, 10, size=(1000, 5)), np.random.rand(1000, 3) * 100))

# One-hot encode the first 5 categorical columns

# !!!IMPORTANT!!! The names of the one-hot encoded (OHE) columns MUST follow the feature_value format.
# For example, a feature called color with the values red and blue must be encoded as columns
# named color_red and color_blue. Otherwise, CFNOW will not be able to find the correct
# columns to modify. For binary features, it is sufficient to label the column as cat.
# OneHotEncoder's dense-output keyword differs across scikit-learn versions (sparse vs. sparse_output),
# so keep the default sparse output and convert it to a dense array.
encoder = OneHotEncoder()
X_cat_encoded = encoder.fit_transform(X[:, :5]).toarray()
names = encoder.get_feature_names_out(['cat1', 'cat2', 'cat3', 'cat4', 'cat5'])

# Combine and convert to DataFrame
df = pd.DataFrame(np.hstack((X_cat_encoded, X[:, 5:])), columns=list(names) + ['num1', 'num2', 'num3'])

# Random binary target variable
y = np.random.randint(0, 2, 1000)

# Train RandomForestClassifier
model = sklearn.ensemble.RandomForestClassifier().fit(df, y)

# Display the original class for the first sample
x = df.iloc[0]
print(f"Factual: {x.tolist()}\nFactual class: {model.predict([x])}")
```

#### 2 - Find the CF
```python
from cfnow import find_tabular
# Then, we use CFNOW to generate the minimum modification to change the classification
cf_obj = find_tabular(
    factual=x,
    feat_types={c: 'cat' if 'cat' in c else 'cont' for c in df.columns},
    has_ohe=True,
    model_predict_proba=model.predict_proba,
    limit_seconds=10)

# Here we can see the new class
print(f"CF: {cf_obj.cfs[0]}\nCF class: {model.predict([cf_obj.cfs[0]])}")
```

## How to cite
If you use CFNOW in your research, please cite the following paper:
```
@article{DEOLIVEIRA2023,
title = {A model-agnostic and data-independent tabu search algorithm to generate counterfactuals for tabular, image, and text data},
journal = {European Journal of Operational Research},
year = {2023},
issn = {0377-2217},
doi = {https://doi.org/10.1016/j.ejor.2023.08.031},
url = {https://www.sciencedirect.com/science/article/pii/S0377221723006598},
author = {Raphael Mazzine Barbosa {de Oliveira} and Kenneth Sörensen and David Martens},
}
```
### Version 0.0.0
Initial package version


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "CFNOW",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "counterfactuals,counterfactual explanations,flipping original class,explainable artificial intelligence",
    "author": "Raphael Mazzine Barbosa de Oliveira",
    "author_email": "mazzine.r@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/9e/39/c195d811339dd48d06833a23b6263247f381426d407a246cfafe10f8fdc0/CFNOW-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Generate counterfactuals with ease. This package takes a model and point (with a certain class) and minimally changes it to flip the classification result.",
    "version": "0.0.1",
    "project_urls": null,
    "split_keywords": [
        "counterfactuals",
        "counterfactual explanations",
        "flipping original class",
        "explainable artificial intelligence"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "34768581845846e3d5c192fa8d1e1e67c1233b330bb573d227dd7511a0e36f3a",
                "md5": "2a171a0bf25203910d8b808eb25b8ea8",
                "sha256": "991a1ee7076c28a55f5c485239e67594a94222c0982cf0db49293512ddce2f7f"
            },
            "downloads": -1,
            "filename": "CFNOW-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2a171a0bf25203910d8b808eb25b8ea8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 33781,
            "upload_time": "2023-09-01T11:32:19",
            "upload_time_iso_8601": "2023-09-01T11:32:19.075922Z",
            "url": "https://files.pythonhosted.org/packages/34/76/8581845846e3d5c192fa8d1e1e67c1233b330bb573d227dd7511a0e36f3a/CFNOW-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9e39c195d811339dd48d06833a23b6263247f381426d407a246cfafe10f8fdc0",
                "md5": "675b060ade0e33a36e4eaa517977a809",
                "sha256": "bf9fdcc710a44683b3001cd1e8797af7dff2d3499e1ad2e855e7016693e076ef"
            },
            "downloads": -1,
            "filename": "CFNOW-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "675b060ade0e33a36e4eaa517977a809",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 31162,
            "upload_time": "2023-09-01T11:32:20",
            "upload_time_iso_8601": "2023-09-01T11:32:20.985679Z",
            "url": "https://files.pythonhosted.org/packages/9e/39/c195d811339dd48d06833a23b6263247f381426d407a246cfafe10f8fdc0/CFNOW-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-01 11:32:20",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "cfnow"
}
        