# causallib

- **Name**: causallib
- **Version**: 0.9.6
- **Summary**: A Python package for flexible and modular causal inference modeling
- **Home page**: https://github.com/BiomedSciAI/causallib
- **Author**: Causal Machine Learning for Healthcare and Life Sciences, IBM Research Israel
- **License**: Apache License 2.0
- **Keywords**: causal inference, effect estimation, causality
- **Upload time**: 2023-10-25 10:13:59
            [![CI Status](https://github.com/BiomedSciAI/causallib/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/BiomedSciAI/causallib/actions/workflows/build.yml)
[![Code Climate coverage](https://img.shields.io/codeclimate/coverage/BiomedSciAI/causallib?logo=codeclimate)](https://codeclimate.com/github/BiomedSciAI/causallib/test_coverage)
[![PyPI](https://img.shields.io/pypi/v/causallib?color=blue&logo=pypi&logoColor=yellow)](https://badge.fury.io/py/causallib)
[![Documentation Status](https://readthedocs.org/projects/causallib/badge/?version=latest)](https://causallib.readthedocs.io/en/latest/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/BiomedSciAI/causallib/HEAD)
[![Slack channel](https://img.shields.io/badge/join-slack-blue.svg?logo=slack)](https://join.slack.com/t/causallib/shared_invite/zt-mwxnwe1t-htEgAXr3j3T2UeZj61gP6g)
[![Slack channel](https://img.shields.io/badge/support-slack-blue.svg?logo=slack)](https://causallib.slack.com/)
[![Downloads](https://static.pepy.tech/badge/causallib)](https://pepy.tech/project/causallib)
# Causal Inference 360
A Python package for inferring causal effects from observational data.

## Description
Causal inference analysis enables estimating the causal effect of 
an intervention on some outcome from real-world non-experimental observational data.  

This package provides a suite of causal methods under a unified, scikit-learn-inspired API.
It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models,
and this modular approach supports highly flexible causal modelling.
The fit-and-predict-like API makes it possible to train on one set of examples
and estimate an effect on another (out-of-bag) set,
which allows for a more "honest"<sup>1</sup> effect estimation.

The package also includes an evaluation suite.
Since most causal models utilize machine learning models internally,
poorly performing models can be diagnosed by re-interpreting familiar ML evaluations from a causal perspective.
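For instance, the sketch below is a minimal example, assuming the `evaluate` helper in `causallib.evaluation` available in recent versions (the exact plotting entry points may differ between versions); it shows how a fitted propensity model can be diagnosed with familiar ML plots such as ROC curves and covariate balance:
```Python
from sklearn.linear_model import LogisticRegression
from causallib.estimation import IPW
from causallib.datasets import load_nhefs
# Assumed to be available in recent causallib versions.
from causallib.evaluation import evaluate

data = load_nhefs()
ipw = IPW(LogisticRegression(max_iter=1000))
ipw.fit(data.X, data.a)

# Re-interpret standard ML diagnostics (ROC, calibration, covariate balance)
# from a causal perspective for the fitted propensity model.
results = evaluate(ipw, data.X, data.a, data.y)
results.plot_all()  # assumed plotting entry point; may vary by version
```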

If you use the package, please consider citing [Shimoni et al., 2019](https://arxiv.org/abs/1906.00442):
<details>
  <summary>Reference</summary>
  
```bibtex
@article{causalevaluations,
  title={An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference},
  author={Shimoni, Yishai and Karavani, Ehud and Ravid, Sivan and Bak, Peter and Ng, Tan Hung and Alford, Sharon Hensley and Meade, Denise and Goldschmidt, Yaara},
  journal={arXiv preprint arXiv:1906.00442},
  year={2019}
}
```

-------------
</details>

<sup>1</sup> Borrowing [Wager & Athey](https://arxiv.org/abs/1510.04342)'s terminology for avoiding overfitting.


## Installation
```bash
pip install causallib
```

## Usage
The package is imported under the name `causallib`.
Each causal model requires an internal machine-learning model.
`causallib` supports any model that exposes a scikit-learn-like fit-predict API
(note that some models might require a `predict_proba` implementation).
For example:
```Python
from sklearn.linear_model import LogisticRegression
from causallib.estimation import IPW
from causallib.datasets import load_nhefs

# Load the NHEFS smoking-cessation dataset bundled with the package.
data = load_nhefs()
# Inverse probability weighting with a logistic-regression propensity model.
ipw = IPW(LogisticRegression())
ipw.fit(data.X, data.a)
# Estimate the average potential outcome under each treatment value,
# then contrast them to obtain the effect.
potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
```
Comprehensive Jupyter Notebook examples can be found in the [examples directory](examples).

### Community support
We use the Slack workspace at [causallib.slack.com](https://causallib.slack.com/) for informal communication.
We encourage you to ask questions regarding causal-inference modelling or 
usage of causallib that don't necessarily merit opening an issue on GitHub.  

Use this [invite link to join causallib on Slack](https://join.slack.com/t/causallib/shared_invite/zt-mwxnwe1t-htEgAXr3j3T2UeZj61gP6g). 

### Approach to causal inference
Some key points on how we approach causal-inference estimation:

##### 1. Emphasis on potential outcome prediction  
The causal effect may be the quantity of ultimate interest,
but every effect is defined by two potential (counterfactual) outcomes.
We therefore adopt a two-step approach, separating the effect-estimation step
from the potential-outcome-prediction step.
A beneficial consequence of this approach is that it better supports
multi-treatment problems, where a single "effect" is not well-defined.
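A short sketch of this separation, re-using the fitted `ipw` and the `data` bunch from the Usage example above (the `effect_types` keyword is an assumption about the `estimate_effect` signature and may differ between versions):
```Python
# Step 1: one estimated population outcome per treatment value
# (a multi-valued treatment would simply yield more entries here).
potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)

# Step 2: derive the contrast of interest only afterwards.
# `effect_types` is assumed here; directly contrasting the two entries also works.
effects = ipw.estimate_effect(
    potential_outcomes[1], potential_outcomes[0],
    effect_types=["diff", "ratio"],
)
```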

##### 2. Stratified average treatment effect
The causal inference literature devotes special attention to the population
on which the effect is estimated,
for example, the ATE (average treatment effect on the entire sample),
the ATT (average treatment effect on the treated), and so on.
By allowing out-of-bag estimation, we leave this specification to the user.
For example, the ATE is obtained by `model.estimate_population_outcome(X, a)`,
and the ATT by stratifying on the treated: `model.estimate_population_outcome(X.loc[a==1], a.loc[a==1])`.
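As a concrete sketch of this recipe, using the `Standardization` outcome model described in point 3 below (for which counterfactual prediction on any subgroup is straightforward):
```Python
from sklearn.linear_model import LinearRegression
from causallib.estimation import Standardization
from causallib.datasets import load_nhefs

data = load_nhefs()
std = Standardization(LinearRegression())
std.fit(data.X, data.a, data.y)

# ATE: potential outcomes averaged over the entire sample.
ate_outcomes = std.estimate_population_outcome(data.X, data.a)
ate = std.estimate_effect(ate_outcomes[1], ate_outcomes[0])

# ATT: the same call, restricted to the treated subgroup.
treated = data.a == 1
att_outcomes = std.estimate_population_outcome(data.X.loc[treated], data.a.loc[treated])
att = std.estimate_effect(att_outcomes[1], att_outcomes[0])
```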

##### 3. Families of causal inference models
We distinguish between two families of models:
* *Weight models*: weight the data to balance the treatment and control groups,
   and then estimate the potential outcomes as weighted averages of the observed outcomes.
   Inverse Probability of Treatment Weighting (IPW or IPTW) is the best-known example of such a model.
* *Direct outcome models*: use the covariates (features) and treatment assignment to build a
   model that predicts the outcome directly. The model can then be used to predict the outcome
   under any assignment of treatment values, specifically the potential outcomes under assigning
   everyone to control or everyone to treatment.
   These models are usually known as *Standardization* models, and it should be noted that, currently,
   they are the only ones able to generate *individual effect estimates* (also known as CATE); see the sketch below.
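Re-using the `Standardization` model `std` fitted in the sketch under point 2, a minimal illustration of individual-level estimation (assuming the `estimate_individual_outcome` method; check the installed version's API):
```Python
# One predicted (counterfactual) outcome per individual per treatment value.
individual_outcomes = std.estimate_individual_outcome(data.X, data.a)
# Individual (CATE-like) effects are then just the row-wise contrast.
individual_effect = individual_outcomes[1] - individual_outcomes[0]
```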

##### 4. Confounders and DAGs
One of the most important steps in a causal inference analysis is proper selection
along both dimensions of the data, to avoid introducing bias:
* On rows: thoughtfully choosing the right inclusion/exclusion criteria
  for individuals in the data.
* On columns: thoughtfully choosing which covariates (features) act as confounders
  and should be included in the analysis.

This is where domain expertise is required and cannot be fully automated by algorithms.
This package assumes that the data provided to the model already fits these criteria.
However, filtering can also be applied on the fly using a scikit-learn pipeline estimator
that chains preprocessing steps (which can filter rows and select columns) with a causal model at the end.
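As a sketch of one such pattern, the column selection can be built into the scikit-learn pipeline that serves as the causal model's internal learner, so the propensity model only ever uses the chosen confounders (the column names below are hypothetical placeholders for whatever a domain expert would select):
```Python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from causallib.estimation import IPW
from causallib.datasets import load_nhefs

data = load_nhefs()

# Hypothetical confounder subset; in a real analysis these come from
# domain knowledge (e.g., a causal DAG), not from convenience.
confounders = ["age", "sex", "smokeintensity"]

propensity_model = Pipeline([
    ("select", ColumnTransformer([("keep", "passthrough", confounders)])),
    ("logistic", LogisticRegression(max_iter=1000)),
])

ipw = IPW(propensity_model)  # the internal learner only sees the selected columns
ipw.fit(data.X, data.a)
```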




            
