:Name: canonical-sets
:Version: 0.0.3
:Summary: Exposing Algorithmic Bias with Canonical Sets.
:Home page: https://data.research.vub.be/
:Author: Integrated Intelligence Lab
:Maintainer: Andres Algaba
:License: MIT
:Requires Python: >=3.8,<3.11
:Upload time: 2023-02-03 13:20:29
:Keywords: fairness, bias, inverse design, generative models, interpretability, python

.. |nbsp| unicode:: U+00A0 .. NO-BREAK SPACE

.. |pic1| image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue
.. |pic2| image:: https://img.shields.io/github/license/mashape/apistatus.svg
.. |pic3| image:: https://img.shields.io/badge/code%20style-black-000000.svg
.. |pic4| image:: https://img.shields.io/badge/%20type_checker-mypy-%231674b1?style=flat
.. |pic5| image:: https://img.shields.io/badge/platform-windows%20%7C%20linux%20%7C%20macos-lightgrey
.. |pic6| image:: https://github.com/Integrated-Intelligence-Lab/canonical_sets/actions/workflows/testing.yml/badge.svg
.. |pic7| image:: https://img.shields.io/readthedocs/canonical_sets
.. |pic8| image:: https://img.shields.io/pypi/v/canonical_sets

.. _canonical_sets: https://github.com/Integrated-Intelligence-Lab/canonical_sets/tree/main/canonical_sets
.. _examples: https://github.com/Integrated-Intelligence-Lab/canonical_sets/tree/main/examples
.. _contribute: https://github.com/Integrated-Intelligence-Lab/canonical_sets/blob/main/CONTRIBUTING.rst
.. _documentation: https://canonical-sets.readthedocs.io/en/latest/
.. _LUCID: https://arxiv.org/abs/2208.12786
.. _LUCID-GAN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4289597

.. _Twitter: https://twitter.com/DataLabBE
.. _website: https://data.research.vub.be/
.. _papers: https://researchportal.vub.be/en/organisations/data-analytics-laboratory/publications/

.. _ctgan: https://github.com/sdv-dev/CTGAN
.. _PR: https://github.com/sdv-dev/CTGAN/pulls/AndresAlgaba


Canonical sets 
==============

|pic2| |nbsp| |pic5| |nbsp| |pic1| |nbsp| |pic8|

|pic6| |nbsp| |pic7| |nbsp| |pic3| |nbsp| |pic4|

AI systems can create, propagate, support, and automate bias in decision-making processes. To mitigate biased decisions,
we need both to understand the origin of the bias and to define what it means for an algorithm to make fair decisions.
By Locating Unfairness through Canonical Inverse Design (LUCID), we generate a canonical set that shows the desired inputs
for a model given a preferred output. The canonical set reveals the model's internal logic and exposes potentially unethical
biases by repeatedly interrogating the decision-making process.

LUCID-GAN extends LUCID by generating canonical inputs via a conditional generative model instead of
gradient-based inverse design. LUCID-GAN generates canonical inputs conditional on the predictions of the model under
fairness evaluation. LUCID-GAN has several benefits, including that it applies to non-differentiable models, ensures
that a canonical set consists of realistic inputs, and allows us to assess indirect discrimination and explicitly
check for intersectional unfairness.

Read our papers on `LUCID`_ and `LUCID-GAN`_ for more details, or check out the `documentation`_.

We encourage everyone to `contribute`_ to this project by submitting an issue or a pull request!


Installation
------------

Install ``canonical_sets`` from PyPI.

.. code-block:: bash

    pip install canonical_sets

For a development install, see `contribute`_. You can also check the `documentation`_.


Usage
-----


LUCID
~~~~~

``LUCID`` generates canonical sets through gradient-based inverse design and is available for both
``PyTorch`` and ``TensorFlow`` models. It only requires a model, a preferred output, and an example input
(which is often a part of the training data). The results are stored in a ``pd.DataFrame`` that can be accessed
through the ``results`` attribute. It is fully customizable, but can also be used out of the box for a wide range of
applications by using its default settings:

.. code-block:: python

    import pandas as pd

    from canonical_sets.data import Adult
    from canonical_sets.models import ClassifierTF
    from canonical_sets import LUCID

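    # Load the Adult data set with the default settings.
    adult = Adult()

    # A fresh classifier with two outputs; fit it before drawing conclusions.
    model = ClassifierTF(2)
    # Preferred output: all probability mass on the ">50K" class.
    outputs = pd.DataFrame([[0, 1]], columns=["<=50K", ">50K"])
    # Example inputs for the inverse design.
    example_data = adult.train_data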

    lucid = LUCID(model, outputs, example_data)
    lucid.results.head()
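
To make the idea of gradient-based inverse design concrete, here is a minimal, dependency-free sketch. It is not the package's implementation: the logistic model, its ``weights`` and ``bias``, and the helper names ``predict`` and ``inverse_design`` are all assumptions made up for illustration. The model is kept fixed, and gradient descent is run on the *inputs* until the model returns the preferred output.

```python
import math
import random


def predict(weights, bias, x):
    """Probability of the positive class under a fixed logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def inverse_design(weights, bias, target, x0, lr=0.5, steps=200):
    """Gradient descent on the input x; the model parameters never change."""
    x = list(x0)
    for _ in range(steps):
        p = predict(weights, bias, x)
        # Gradient of the binary cross-entropy w.r.t. x_i is (p - target) * w_i.
        x = [xi - lr * (p - target) * w for xi, w in zip(x, weights)]
    return x


random.seed(0)
weights, bias = [2.0, -1.0, 0.0], -0.5  # hypothetical toy model
starts = [[random.uniform(-1, 1) for _ in weights] for _ in range(5)]

# One canonical input per random starting point, all for preferred output 1.
canonical_set = [inverse_design(weights, bias, 1.0, x0) for x0 in starts]

for x0, x in zip(starts, canonical_set):
    shift = [round(xi - x0i, 2) for xi, x0i in zip(x, x0)]
    print("shift per feature:", shift, "prediction:", round(predict(weights, bias, x), 3))
```

In this toy setting, the first feature is pushed up, the second is pushed down, and the third (which has zero weight) stays where it started. Reading off such shifts across many starting points is the kind of information a canonical set is meant to surface about a model's internal logic.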


LUCID-GAN
~~~~~~~~~

``LUCIDGAN`` generates canonical sets by using a conditional generative model (a GAN). This approach has several benefits:
it applies to non-differentiable models, ensures that a canonical set consists of realistic inputs,
and allows us to assess indirect discrimination and explicitly check for intersectional unfairness. LUCID-GAN only
requires the inputs and predictions of a black-box model. It is fully customizable, but can also be used out of the box
for a wide range of applications by using its default settings:

.. code-block:: python

    import pandas as pd

    from canonical_sets.data import Adult
    from canonical_sets.models import ClassifierTF
    from canonical_sets import LUCIDGAN

    model = ClassifierTF(2)
    adult = Adult()

    # LUCID-GAN does its own preprocessing, so pass the data in its original form.
    test_data = adult.inverse_preprocess(adult.test_data)

    # Only the predictions for the positive class are required.
    preds = model.predict(adult.test_data.to_numpy())[:, 1]

    data = pd.concat([test_data, pd.DataFrame(preds, columns=["preds"])], axis=1)

    lucidgan = LUCIDGAN(epochs=5)
    lucidgan.fit(data, conditional=["preds"])

    samples = lucidgan.sample(100, conditional=pd.DataFrame({"preds": [1]}))
    samples.head()
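
Once a canonical set has been sampled, one simple way to look for intersectional patterns is to tabulate the share of each attribute combination in it. The sketch below is only an illustration under stated assumptions: the rows are made up and stand in for real samples, and the column names ``attr_1`` and ``attr_2`` and the helper ``shares`` are hypothetical, not part of the package.

```python
from collections import Counter


def shares(rows, columns):
    """Fraction of rows for each combination of values in the given columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Made-up rows standing in for a canonical set sampled with preds = 1.
canonical = [
    {"attr_1": "x", "attr_2": "p"},
    {"attr_1": "x", "attr_2": "p"},
    {"attr_1": "x", "attr_2": "q"},
    {"attr_1": "y", "attr_2": "p"},
]

single = shares(canonical, ["attr_1"])
joint = shares(canonical, ["attr_1", "attr_2"])

print("shares by attr_1:", single)
print("shares by attr_1 and attr_2:", joint)
```

Comparing such shares with those of a canonical set conditioned on a different prediction, or with the original data, shows which combinations are over- or under-represented.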

For detailed examples, see `examples`_, and for the source code, see `canonical_sets`_. For ``LUCID``, we advise starting with either the
``tensorflow`` or ``pytorch`` example, and then moving on to the advanced example. For ``LUCIDGAN``, you can replicate the experiments from the paper
with the ``GAN_adult`` and ``GAN_compas`` examples. You can also check the `documentation`_ for more details.
If you have any remaining questions, feel free to submit an issue or PR!


Output-based group metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~

Most group fairness notions focus on the equality of outcome by computing statistical parity metrics on a model's output.
The two most prominent examples of these statistical output-based metrics are Demographic Parity (DP) and Equality of Opportunity (EOP).
In DP, we compare the Positivity Rate (PR) of the subpopulations under fairness evaluation, and in EOP, we compare the True Positive Rate (TPR).
The choice between DP and EOP depends on the underlying assumptions and worldview of the evaluator.
The ``Metrics`` class allows you to compute these metrics for binary classification tasks given the predictions and ground truth:

.. code-block:: python

    from canonical_sets.data import Adult
    from canonical_sets.models import ClassifierTF
    from canonical_sets.group import Metrics

    model = ClassifierTF(2)
    adult = Adult()

    # Hard class predictions (0 or 1) and the ground truth for the positive class.
    preds = model.predict(adult.test_data.to_numpy()).argmax(axis=1)
    targets = adult.test_labels[">50K"]

    metrics = Metrics(preds, targets)
    metrics.metrics
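
The definitions of DP and EOP above can be sketched in a few lines of plain Python. This is a standalone illustration and does not reproduce the ``Metrics`` class: the helper names ``positivity_rate``, ``true_positive_rate``, and ``group_rates`` are assumptions, and the predictions, targets, and group labels are toy values.

```python
def positivity_rate(preds):
    """Share of positive predictions (compared across groups for DP)."""
    return sum(preds) / len(preds)


def true_positive_rate(preds, targets):
    """Share of actual positives predicted positive (compared for EOP)."""
    hits = [p for p, t in zip(preds, targets) if t == 1]
    return sum(hits) / len(hits)


def group_rates(preds, targets, groups):
    """PR and TPR for every subpopulation in `groups`."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        g_preds = [preds[i] for i in idx]
        g_targets = [targets[i] for i in idx]
        rates[g] = {
            "PR": positivity_rate(g_preds),
            "TPR": true_positive_rate(g_preds, g_targets),
        }
    return rates


# Toy binary predictions, ground truth, and group membership.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
targets = [1, 0, 1, 0, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(preds, targets, groups)
dp_gap = rates["a"]["PR"] - rates["b"]["PR"]
eop_gap = rates["a"]["TPR"] - rates["b"]["TPR"]
print(rates)
print("DP gap:", dp_gap, "EOP gap:", eop_gap)
```

A gap of zero would mean the two groups are treated equally under the corresponding notion. Here the PR gap is 0.5 and the TPR gap is about 0.67, so the toy predictions violate both.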


Data
----

``canonical_sets`` contains some functionality to easily access commonly used data sets in the fairness literature:

.. code-block:: python

    from canonical_sets.data import Adult, Compas

    adult = Adult()
    adult.train_data.head()

    compas = Compas()
    compas.train_data.head()

The default settings can be customized to change the pre-processing, splitting, etc. See `examples`_ for details.
You can also check the `documentation`_.


Community
---------

If you are interested in cross-disciplinary research related to machine learning, feel free to:

* Follow DataLab on `Twitter`_.
* Check the `website`_.
* Read our `papers`_.


Disclaimer
----------

The package and the code are provided "as-is" and there is NO WARRANTY of any kind.
Use it only if the content and output files make sense to you.


Acknowledgements
----------------

This project benefited from financial support from Innoviris.

``LUCIDGAN`` is based on the ``CTGAN`` class from the `ctgan`_ package. It has been extended to fix
several bugs (see my `PR`_ on the `CTGAN`_ GitHub page) and to allow for the extension of the conditional
vector. Parts of the code and comments are identical to the original ``CTGAN`` class.


Citation
--------

.. code-block:: none

    @inproceedings{mazijn_lucid_2023,
      title={{LUCID: Exposing Algorithmic Bias through Inverse Design}},
      author={Mazijn, Carmen and Prunkl, Carina and Algaba, Andres and Danckaert, Jan and Ginis, Vincent},
      booktitle={Thirty-Seventh AAAI Conference on Artificial Intelligence (accepted)},
      year={2023},
    }

    @article{algaba_lucidgan_2022,
      title={{LUCID-GAN: Conditional Generative Models to Locate Unfairness}},
      author={Algaba, Andres and Mazijn, Carmen and Prunkl, Carina and Danckaert, Jan and Ginis, Vincent},
      year={2022},
      journal={Working paper}
    }


            

        