# The Deterioration-Allocation Index (DAIndex): A framework for health inequality evaluation
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/DAIndex)](https://www.python.org/downloads/)
[![PyPI - Package Status](https://img.shields.io/pypi/status/DAIndex)](https://pypi.org/project/DAIndex/)
[![PyPI - Latest Release](https://img.shields.io/pypi/v/DAIndex)](https://pypi.org/project/DAIndex/)
[![PyPI - Wheel](https://img.shields.io/pypi/wheel/DAIndex)](https://pypi.org/project/DAIndex/)
[![PyPI - License](https://img.shields.io/pypi/l/DAIndex)](https://github.com/nhsengland/DAIndex/blob/main/LICENSE)
[![Snyk Package Health](https://snyk.io/advisor/python/DAIndex/badge.svg)](https://snyk.io/advisor/python/DAIndex)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
This repository implements a **DA-AUC** (deterioration-allocation area under curve) metric for quantifying **inequality** between patient groups (a) embedded in datasets; or (b) induced by statistical / ML / AI models. This is analogous to ROC-AUC for assessing performance of prediction models.
## Methodology
We define and quantify health inequalities in a generic resource allocation scenario using a novel deterioration-allocation framework. The basic idea is to define two indices: a **deterioration** index and an **allocation** index. The allocation index is to be derived from the model of interest.
Conceptually, models used in real-world contexts can be abstracted and thought of as **resource allocators**, predicting, for example, the probability of Intensive Care Unit (ICU) admission. Note that a model does not need to be specifically designed to allocate resources; for example, a risk prediction model for cardiovascular disease (CVD) among people with diabetes also yields a valid index for downstream resource allocation. Essentially, a resource allocator is a computational model that takes patient data as input and outputs a (normalised) score between 0 and 1. We call this score the allocation index.
The deterioration index is a score between 0 and 1 that measures the deterioration status of a patient. It can be derived from an objective measurement of disease prognosis (i.e., *a marker of prognosis* in epidemiological terminology), such as a widely used comorbidity score or a biomarker measurement like those for CVD.
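As a concrete illustration of the two indices (a hypothetical sketch, not part of the `DAindex` API), a model's raw risk scores and a comorbidity count can each be mapped into [0, 1]:

```python
import numpy as np

# Allocation index: a model's raw risk scores, squashed into (0, 1).
raw_scores = np.array([-1.2, 0.3, 2.5])           # e.g. logits from a risk model
allocation_index = 1 / (1 + np.exp(-raw_scores))  # sigmoid -> (0, 1)

# Deterioration index: a comorbidity count, min-max scaled into [0, 1]
# against an assumed maximum of 10 (an arbitrary illustrative choice).
comorbidity_counts = np.array([0, 3, 7])
deterioration_index = np.clip(comorbidity_counts / 10, 0, 1)
```

Any monotonic mapping into [0, 1] works; the framework only requires that both indices are normalised scores.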
![Figure 1](imgs/fig1.png)
Once we have defined the two indices, each patient can then be represented as a point in a two-dimensional space of <*allocation index*, *deterioration index*>. A sample of the group of patients is then translated into a set of points in the space, for which a regression model can be fitted to approximate a curve in the space.
The **area** between the two curves is then the deterioration difference between their corresponding patient groups, quantifying the inequalities induced by the `allocator`, i.e., the model that produces the allocation index. The curve with the larger area under it represents the patient group that would be unfairly treated if the allocation index were used to allocate resources or services: *a patient from this group would be deemed healthier than a patient from another group who is equally ill*.
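The area computation can be sketched with toy curves (the curve shapes and grid below are illustrative assumptions, not the regression fits the package produces):

```python
import numpy as np

def trapezoid_area(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences (trapz vs trapezoid)."""
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

# Hypothetical deterioration-vs-allocation curves for two groups on a
# shared allocation-index grid.
allocation = np.linspace(0, 1, 101)
deterioration_a = allocation ** 0.5  # group A: sicker at every allocation level
deterioration_b = allocation ** 2.0  # group B

auc_a = trapezoid_area(deterioration_a, allocation)  # ~2/3
auc_b = trapezoid_area(deterioration_b, allocation)  # ~1/3
area_between = auc_a - auc_b                         # ~1/3
```

Here group A's curve has the larger area under it, so under the framework group A would be the unfairly treated group if this allocation index were used.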
See the paper for more details: [Quantifying Health Inequalities Induced by Data and AI Models](https://doi.org/10.24963/ijcai.2022/721).
## Installation of the `DAindex` Python package
```bash
pip install DAindex
```
### Advanced install (for developers)
After cloning the repository, you can install the package in a development `venv` using `poetry`:
```bash
poetry install --with dev
pre-commit install
```
## Tutorials
- This tutorial provides a basic use case for the DAindex: [DAindex-tutorial.ipynb](./DAindex-tutorial.ipynb).
- More tutorials will be added, including those for replicating studies on HiRID and MIMIC datasets.
## Usage
0. Create sample data for testing
```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(42)  # seed for reproducibility
n_size = 100

# generate female data: integer multimorbidity counts
female_mm = rng.normal(3.2, 0.5, size=n_size).astype(int)
df_female = pd.DataFrame(dict(mm=female_mm, gender=['f'] * n_size))

# generate male data
male_mm = rng.normal(3, 0.5, size=n_size).astype(int)
df_male = pd.DataFrame(dict(mm=male_mm, gender=['m'] * n_size))

# merge dataframes
df = pd.concat([df_female, df_male], ignore_index=True)
```
1. Import the `compare_two_groups` function:
```python
from DAindex.util import compare_two_groups
```
2. Run inequality analysis between the female and male groups:
```python
compare_two_groups(
df[df.gender=='f'], df[df.gender=='m'], 'mm',
'female', 'male', '#Multimorbidity', 3, is_discrete=True
)
```
You will see output similar to:
```python
({'overall-prob': 0.9999, 'one-step': 0.7199, 'k-step': 0.054609, '|X|': 100},
{'overall-prob': 0.9999, 'one-step': 0.42, 'k-step': 0.03195, '|X|': 100},
0.7092018779342724)
```
The result means that the inequality of the female group versus the male group is `0.709`.
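If you need the value programmatically, the returned triple can be unpacked directly; here the literals from the sample output above stand in for a real call:

```python
# Unpack the tuple returned by compare_two_groups
# (values copied from the sample output above).
female_stats, male_stats, inequality = (
    {'overall-prob': 0.9999, 'one-step': 0.7199, 'k-step': 0.054609, '|X|': 100},
    {'overall-prob': 0.9999, 'one-step': 0.42, 'k-step': 0.03195, '|X|': 100},
    0.7092018779342724,
)
print(round(inequality, 3))  # 0.709
```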
## Contact
[honghan.wu@ucl.ac.uk](mailto:honghan.wu@ucl.ac.uk) or [h.wilde@ucl.ac.uk](mailto:h.wilde@ucl.ac.uk)
## Reference
If using this package in your own work, please cite:
> Honghan Wu, Aneeta Sylolypavan, Minhong Wang, and Sarah Wild. 2022. ‘Quantifying Health Inequalities Induced by Data and AI Models’. In IJCAI-ECAI, 6:5192–98. https://doi.org/10.24963/ijcai.2022/721.
Useful links: [slides](https://www.ucl.ac.uk/research-it-services/sites/research_it_services/files/quantifying_health_inequalities_induced_by_data_and_ai_models_0.pdf), [recording](https://web.microsoftstream.com/video/568b2e88-5c21-466e-9bbf-63274048161d), [arxiv](https://arxiv.org/abs/2205.01066), [proceedings](https://www.ijcai.org/proceedings/2022/721).