vgd_counterfactuals

:Name: vgd_counterfactuals
:Version: 0.3.5
:Summary: Counterfactual explanations for GNNs based on the visual graph dataset format
:Upload time: 2023-11-14 14:15:47
:Maintainer: Jonas Teufel
:Author: Jonas Teufel
:Requires Python: >=3.9,<=3.12
:License: MIT
:Keywords: graph neural networks, counterfactuals, explainable ai

|made-with-python| |python-version| |version|

.. |made-with-python| image:: https://img.shields.io/badge/Made%20with-Python-1f425f.svg
   :target: https://www.python.org/
   :alt: made with python

.. |python-version| image:: https://img.shields.io/badge/Python-3.8.0-green.svg
   :target: https://www.python.org/
   :alt: python 3.8

.. |version| image:: https://img.shields.io/badge/version-0.3.5-orange.svg
   :target: https://www.python.org/
   :alt: version

.. image:: banner.png
   :alt: banner image

===================
VGD Counterfactuals
===================

A library for generating and, more importantly, easily visualizing **Counterfactuals** for
**Graph Neural Networks (GNNs)**, based on the
`VisualGraphDatasets <https://github.com/awa59kst120df/visual_graph_datasets>`_
dataset format.

❓ What are Counterfactuals?
============================

Counterfactuals are a method for explaining the predictions of complex machine learning models. For a given
prediction of a model, a counterfactual is an input element that is as similar as possible to the original
input but causes the largest possible deviation from the original model output.
They act as "counterexamples" for the behavior of a model and can help to understand its decision
boundary.

This package deals with graph counterfactuals. They are generated by maximizing a customizable
distance function on the prediction output over all immediate neighbors of the original graph, where
the neighbors are defined by the allowed, domain-specific graph edit operations.
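
The search described above can be sketched in a few lines of plain Python. This is a simplified
illustration under stated assumptions, not the package's implementation: the names
``top_k_counterfactuals``, ``toy_neighbors`` and ``toy_model`` are hypothetical, and short strings stand in
for graphs, with "substitute one character" as the only allowed edit operation.

.. code-block:: python

    def top_k_counterfactuals(original, neighborhood_func, predict_func, distance_func, k=3):
        """Return the k immediate neighbors whose prediction deviates most from the original."""
        original_prediction = predict_func(original)
        scored = []
        for neighbor in neighborhood_func(original):
            distance = distance_func(original_prediction, predict_func(neighbor))
            scored.append((neighbor, distance))
        # Largest distance first: these are the strongest counterfactual candidates.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def toy_neighbors(value, alphabet="CNO"):
        """Hypothetical edit operation: replace exactly one character of the string."""
        neighbors = []
        for index, char in enumerate(value):
            for substitute in alphabet:
                if substitute != char:
                    neighbors.append(value[:index] + substitute + value[index + 1:])
        return neighbors

    def toy_model(value):
        """Hypothetical stand-in for a GNN: a score based on character counts."""
        return 1.0 * value.count("O") + 0.5 * value.count("N")

    results = top_k_counterfactuals(
        original="CCC",
        neighborhood_func=toy_neighbors,
        predict_func=toy_model,
        distance_func=lambda orig, mod: abs(orig - mod),
        k=2,
    )
    for neighbor, distance in results:
        print(f"{neighbor}: distance {distance:.2f}")

Because only immediate neighbors are considered, every candidate differs from the original by a single
edit, which keeps the "as similar as possible" requirement satisfied by construction.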

📦 Installation
===============

.. code-block:: console

    git clone https://github.com/the16thpythonist/vgd_counterfactuals

Then install the package from the repository's main folder with ``pip``:

.. code-block:: console

    cd vgd_counterfactuals
    python3 -m pip install .

Afterwards, you can check the install by invoking the CLI:

.. code-block:: console

    python3 -m vgd_counterfactuals.cli --version
    python3 -m vgd_counterfactuals.cli --help
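
The installation can also be checked from within Python. The following is a minimal sketch that uses only
the standard library's ``importlib.metadata``; the helper name ``installed_version`` is made up for this
illustration, and it assumes that the installed distribution is named ``vgd_counterfactuals``.

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version
    from typing import Optional

    def installed_version(package: str) -> Optional[str]:
        """Return the installed version of a distribution, or None if it is not installed."""
        try:
            return version(package)
        except PackageNotFoundError:
            return None

    # Prints the version string if the package is installed, otherwise "None".
    print(installed_version("vgd_counterfactuals"))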


🚀 Quickstart
=============

The generation of counterfactual graphs is implemented by the ``CounterfactualGenerator`` class.
Instantiating such an object requires the following four main components:

- ``processing``: A ``visual_graph_datasets`` "Processing" object. These objects implement the functionality
  needed to convert a domain-specific graph representation into the full graph structure for the machine
  learning models. They are shipped with each specific visual graph dataset.
- ``model``: The model to be explained. This model has to implement the ``visual_graph_datasets``
  "PredictGraph" interface to ensure that it can be queried directly with the VGD GraphDict representation
  of graph elements.
- ``neighborhood_func``: A function which receives the domain-specific representation of a graph as
  input and returns a list of the domain-specific representations of all
  *immediate neighbors* of that graph. The implementation is highly specific to each application
  domain.
- ``distance_func``: A function which receives two arguments, the prediction of the original element and the
  prediction of a neighbor, and returns a single numeric value for the distance between the two
  predictions. The generator will maximize this distance measure.
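
Since the generator maximizes whatever ``distance_func`` returns, the choice of this function determines
which kind of counterfactual is found. The sketch below shows three possible choices. The function names
are hypothetical, and the shape of the predictions (a single float, or a sequence of class probabilities)
is an assumption that depends on the model being explained.

.. code-block:: python

    from typing import Sequence

    def absolute_distance(original: float, modified: float) -> float:
        """Regression: any large change of the predicted value counts, in either direction."""
        return abs(original - modified)

    def increase_only(original: float, modified: float) -> float:
        """Regression: only neighbors with a higher predicted value score well."""
        return modified - original

    def class_flip_distance(original: Sequence[float], modified: Sequence[float]) -> float:
        """Classification (assumed class probabilities): drop in confidence for the
        class that was predicted for the original element."""
        predicted_class = max(range(len(original)), key=lambda i: original[i])
        return original[predicted_class] - modified[predicted_class]

    print(absolute_distance(1.0, 3.5))
    print(increase_only(1.0, 0.5))
    print(class_flip_distance([0.75, 0.25], [0.25, 0.75]))

A signed distance such as ``increase_only`` is useful when only one direction of change is of interest,
for example when asking which edit would raise a predicted property.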

Once the generator object has been instantiated, it can be used to create counterfactuals for any number of
input elements with the ``generate`` method.

The following mock example shows how these pieces fit together. For more information,
have a look at the example modules provided in the ``examples`` folder of the repository.

.. code-block:: python

    import tempfile

    from visual_graph_datasets.processing.molecules import MoleculeProcessing

    from vgd_counterfactuals.base import CounterfactualGenerator
    from vgd_counterfactuals.testing import MockModel
    from vgd_counterfactuals.generate.molecules import get_neighborhood

    processing = MoleculeProcessing()
    model = MockModel()

    generator = CounterfactualGenerator(
        processing=processing,
        model=model,
        neighborhood_func=get_neighborhood,
        distance_func=lambda orig, mod: abs(orig - mod),
    )

    with tempfile.TemporaryDirectory() as path:
        # The "generate" function will create all the possible neighbors of the
        # given "original" element, then query the model to predict the
        # output for each of them, and sort them by their distance to the original.
        # The top k elements will be turned into a temporary visual graph dataset
        # within the given folder "path". That means in that folder two files will
        # be created per element: A metadata JSON file and a visualization PNG file.
        # Returns the dictionary for the loaded visual graph dataset.
        index_data_map = generator.generate(
            original='CCCCCC',
            # Path to the folder into which to save the vgd element files
            path=path,
            # The number of counterfactuals to be returned.
            # Elements will be sorted by their distance.
            k_results=10,
        )

        # The keys of the resulting dict are the integer indices and the values
        # are dicts themselves which describe the corresponding vgd elements.
        # These dicts contain for example the absolute path to the PNG file,
        # the full graph representation and additional metadata.
        print(f'generated {len(index_data_map)} counterfactuals:')
        for index, data in index_data_map.items():
            print(f' * {data["metadata"]["name"]} '
                  f' - distance: {data["metadata"]["distance"]:.2f}')
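
The two-files-per-element layout mentioned in the comments above can also be inspected without the library.
The following standalone sketch pairs each metadata JSON file with its PNG file. It assumes that both files
of an element share the same file stem, which is an assumption made for this illustration; the function
name ``collect_elements`` and the placeholder files are hypothetical, and the sketch is not a replacement
for the loader that ships with ``visual_graph_datasets``.

.. code-block:: python

    import json
    import tempfile
    from pathlib import Path

    def collect_elements(folder):
        """Map each file stem to its parsed metadata and the path of its visualization."""
        folder = Path(folder)
        elements = {}
        for json_path in sorted(folder.glob("*.json")):
            png_path = json_path.with_suffix(".png")
            if not png_path.exists():
                # Skip metadata files that have no matching visualization.
                continue
            with open(json_path) as file:
                metadata = json.load(file)
            elements[json_path.stem] = {"metadata": metadata, "image_path": str(png_path)}
        return elements

    with tempfile.TemporaryDirectory() as path:
        # Placeholder files that only mimic the layout; real files are written by "generate".
        for stem in ("0", "1"):
            Path(path, f"{stem}.json").write_text(json.dumps({"name": f"element {stem}"}))
            Path(path, f"{stem}.png").write_bytes(b"")
        Path(path, "orphan.json").write_text("{}")

        elements = collect_elements(path)
        for stem, data in elements.items():
            print(stem, data["metadata"]["name"], data["image_path"])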


🤝 Credits
==========

* `PyComex <https://github.com/the16thpythonist/pycomex.git>`_
  is a micro framework which simplifies the setup, processing and management of computational
  experiments. It is also used to auto-generate the command line interface that can be used to interact
  with these experiments.
* `VisualGraphDatasets <https://github.com/awa59kst120df/visual_graph_datasets>`_
  is a library which deals with the VGD dataset format. In this format, graph datasets
  for machine learning are represented by a folder, where each graph is represented by *two* files: A
  metadata JSON file that contains the full graph representation and additional metadata and a PNG
  visualization of the graph. The library aims to provide a framework for explainable graph machine learning
  which is easier to use and produces more reproducible results.
