# AEQUITAS Core Library
## Introduction
The AEQUITAS core library is one of the key components of the framework proposed by the AEQUITAS European project. The framework seeks to address and combat the various forms of bias and unfairness in AI systems by providing a controlled experimentation environment for AI developers.
This environment allows testing, assessing, and refining socio-technical systems (STS) to identify and mitigate their potentially biased behaviour. Such an environment lets users conduct experiments to evaluate their AI models' performance and behavior under different conditions. The end goal of such evaluation is to facilitate the creation of fairness-compliant AI systems. In other words, this environment empowers developers to understand the fairness-related limitations of their AI systems and to make informed decisions about how to correct them.
The core library is the component which allows users (specifically developers, among all the possible users of the framework) to do essentially two things:
- detect bias within AI models through dedicated metrics
- mitigate the bias (if it exists) using the provided techniques
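To give a flavour of what detecting bias through metrics means, here is a by-hand computation of one classic group-fairness metric, statistical parity difference. This sketch only illustrates the underlying idea using plain pandas; it is not the library's API, and the data is made up:

```python
import pandas as pd

# Toy predictions: prot_attr 1 marks the privileged group,
# label 1 the favorable outcome.
data = pd.DataFrame({
    "prot_attr": [0, 0, 0, 0, 1, 1, 1, 1],
    "label":     [0, 1, 0, 0, 1, 1, 0, 1],
})

# Statistical parity difference:
# P(label = 1 | unprivileged) - P(label = 1 | privileged).
rate = data.groupby("prot_attr")["label"].mean()
spd = rate[0] - rate[1]  # 0.25 - 0.75 = -0.50 here
```

A value of 0 would mean both groups receive the favorable outcome at the same rate; here the negative value signals that the unprivileged group is disadvantaged.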
The library wraps the functionalities provided by the AIF360 library developed by IBM (<https://aif360.res.ibm.com>) while still giving developers the possibility to add their own bias detection or mitigation techniques. More details on the library's overall structure, along with examples of how its functions can be used in code, are given in the next sections.
Overall, we stress that even though the core library is a critical component of the framework proposed by AEQUITAS, its other intended usage is as a standalone Python library for working on AI fairness. In this document it will be presented without describing how it ties to all the other pieces of the framework. The focus will strictly be on the functionalities it provides as an off-the-shelf library.
## How to use
The first step to use the core library is to install it through pip by running the simple command:
```shell
pip install aequitas-core
```
Once all the required packages have been installed you can start exploiting the library's functionalities inside your own code.
## Examples
### Creating a dataset
```python
ds = create_dataset(
    dataset_type="binary label",
    unprivileged_groups=[{'prot_attr': 0}],
    privileged_groups=[{'prot_attr': 1}],
    imputation_strategy=MeanImputationStrategy(),
    favorable_label=1.,
    unfavorable_label=0.,
    df=df,
    label_names=['label'],
    protected_attribute_names=['prot_attr']
)
```
The function `create_dataset` allows users to instantiate objects of various classes, each representing a specific type of dataset. In this case, the call to `create_dataset` instantiates an object of type `BinaryLabelDataset`. This class represents datasets for classification purposes whose samples are assigned a label from $\{0,1\}$. In the example, the function is called with these parameters:
- `dataset_type`: a string specifying the dataset type. Users do not need to remember the actual type of the returned object, because it is all handled internally by the `create_dataset` function
- `unprivileged_groups` and `privileged_groups`: these two parameters are used to distinguish between individuals belonging to the *unprivileged* and *privileged* groups, depending on the value assigned to the protected attribute `prot_attr`
- `imputation_strategy`: the strategy adopted to impute missing values, specified by the class passed as value for this parameter. In this case, the class `MeanImputationStrategy` indicates that missing values will be imputed using the mean of the existing ones
- `unfavorable_label` and `favorable_label`: a label assigned to an individual can be either *unfavorable* or *favorable*, depending on the outcome of the decision made by the system (*i.e.* an unfavorable label corresponds to a negative outcome)
- `df`: the pandas DataFrame containing the actual data. Its columns must be named according to the values of the other two parameters, `label_names` and `protected_attribute_names`
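As an illustration of the `df` argument and of mean imputation, the following sketch builds a toy DataFrame whose columns match the `label_names` and `protected_attribute_names` values above, and fills its missing value by hand the way a mean-based strategy would. This is not the library's implementation, just the underlying idea; the `feature_1` column is made up:

```python
import numpy as np
import pandas as pd

# Toy data: one protected attribute, one feature with a missing value,
# and one binary label. Column names match the arguments above.
df = pd.DataFrame({
    "prot_attr": [0, 0, 1, 1, 0, 1],
    "feature_1": [0.2, np.nan, 0.8, 0.5, 0.1, 0.9],
    "label":     [0, 0, 1, 1, 0, 1],
})

# Mean imputation by hand: replace each NaN with the mean of the
# observed values in its column ((0.2 + 0.8 + 0.5 + 0.1 + 0.9) / 5 = 0.5).
filled = df.fillna(df.mean(numeric_only=True))
```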
The other supported dataset types are the following:
- `MulticlassLabelDataset`: represents datasets whose samples can be assigned non-binary labels (*e.g.* labels from the set $\{0, 1, 2, 3, 4\}$).
- `RegressionDataset`: represents datasets for regression tasks. In these datasets, samples are not assigned any label; instead, the objective is to predict the value of a given target variable.
To instantiate a `MulticlassLabelDataset`, one would call the `create_dataset` function as follows:
```python
ds = create_dataset(
    "multi class",
    unprivileged_groups=[{'prot_attr': 0}],
    privileged_groups=[{'prot_attr': 1}],
    imputation_strategy=MCMCImputationStrategy(),
    favorable_label=[0., 1., 2.],
    unfavorable_label=[3., 4.],
    df=df,
    label_names=['label'],
    protected_attribute_names=['prot_attr']
)
```
The only thing to note in this case is that the `favorable_label` and `unfavorable_label` parameters are assigned lists rather than single values, as in the previous example.
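For illustration, a toy DataFrame fitting the multiclass call could look as follows; the sanity check at the end simply verifies that the two label lists cover the observed label values without overlapping (data and column contents are hypothetical):

```python
import pandas as pd

# Labels 0-2 are favorable, 3-4 unfavorable, matching the lists
# passed to create_dataset in the multiclass example.
df = pd.DataFrame({
    "prot_attr": [0, 1, 0, 1, 0, 1],
    "label":     [0., 1., 2., 3., 4., 2.],
})

favorable_label = [0., 1., 2.]
unfavorable_label = [3., 4.]

# Sanity check: every observed label falls in exactly one of the lists.
observed = set(df["label"])
assert observed <= set(favorable_label) | set(unfavorable_label)
assert set(favorable_label).isdisjoint(unfavorable_label)
```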
Finally, to create a dataset for regression tasks, the call to `create_dataset` would be:
```python
ds = create_dataset(
    dataset_type="regression",
    unprivileged_groups=[{'color': 'b'}],
    privileged_groups=[{'color': 'r'}],
    imputation_strategy=MeanImputationStrategy(),
    df=df,
    dep_var_name='score',
    protected_attribute_names=['color'],
    privileged_classes=[['r']]
)
```
The parameter `dep_var_name` refers to the *dependent* variable, which, in regression tasks, is the variable whose value has to be predicted.
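For completeness, a toy DataFrame matching the regression call above might look like this; the `color` and `score` columns mirror the parameters used in the example, while `hours` is a made-up feature column:

```python
import pandas as pd

# Toy regression data: 'score' is the dependent variable to predict,
# 'color' the protected attribute ('r' privileged, 'b' unprivileged).
df = pd.DataFrame({
    "color": ["r", "b", "r", "b", "r", "b"],
    "hours": [10, 4, 8, 6, 12, 3],
    "score": [82.5, 61.0, 75.3, 68.4, 90.1, 55.7],
})
```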
**Note**: this README, as well as the library itself, is still a work in progress.