causalnlp


Name: causalnlp
Version: 0.8.0
Home page: https://github.com/amaiya/causalnlp/tree/main/
Summary: CausalNLP: A Practical Toolkit for Causal Inference with Text
Upload time: 2024-06-15 16:45:15
Author: Arun S. Maiya
Requires Python: >=3.8
License: Apache Software License 2.0
Keywords: causality, nlp, causal-inference, natural-language-processing
# Welcome to CausalNLP



## What is CausalNLP?

> CausalNLP is a practical toolkit for causal inference with text as
> treatment, outcome, or “controlled-for” variable.

## Features

- Low-code [causal
  inference](https://amaiya.github.io/causalnlp/examples.html) in as
  little as two commands
- Out-of-the-box support for using [**text** as a “controlled-for”
  variable](https://amaiya.github.io/causalnlp/examples.html#What-is-the-causal-impact-of-a-positive-review-on-product-views?)
  (e.g., confounder)
- Built-in
  [Autocoder](https://amaiya.github.io/causalnlp/autocoder.html) that
  transforms raw text into useful variables for causal analyses (e.g.,
  topics, sentiment, emotion, etc.)
- Sensitivity analysis to [assess robustness of causal
  estimates](https://amaiya.github.io/causalnlp/causalinference.html#CausalInferenceModel.evaluate_robustness)
- Quick and simple [key driver
  analysis](https://amaiya.github.io/causalnlp/key_driver_analysis.html)
  to yield clues on potential drivers of an outcome based on predictive
  power, correlations, etc.
- Can easily be applied to [“traditional” tabular datasets without
  text](https://amaiya.github.io/causalnlp/examples.html#What-is-the-causal-impact-of-having-a-PhD-on-making-over-$50K?)
  (i.e., datasets with only numerical and categorical variables)
- Includes an experimental [PyTorch
  implementation](https://amaiya.github.io/causalnlp/core.causalbert.html)
  of [CausalBert](https://arxiv.org/abs/1905.12741) by Veitch, Sridhar,
  and Blei (based on [reference
  implementation](https://github.com/rpryzant/causal-bert-pytorch) by R.
  Pryzant)

## Install

1.  `pip install -U pip`
2.  `pip install causalnlp`

**NOTE**: This release requires Python 3.8 or later. On older releases
running under Python 3.6.x, if you get a
`RuntimeError: Python version >= 3.7 required`, try ensuring NumPy is
installed **before** CausalNLP (e.g., `pip install numpy==1.18.5`).
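Before running the examples, it can help to confirm that the interpreter and the installed package are what you expect. The following is a minimal sketch using only the standard library. The helper names `meets_minimum_python` and `installed_version` are hypothetical, and the minimum version `(3, 8)` is an assumption taken from the package's `requires_python` metadata.

```python
import sys
from importlib import metadata

# Assumption: minimum version taken from the package's requires_python (>=3.8).
MIN_PYTHON = (3, 8)


def meets_minimum_python(version_info=sys.version_info):
    """Return True if the given interpreter version satisfies MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package="causalnlp"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", meets_minimum_python())
    print("causalnlp version:", installed_version())
```

If `installed_version()` returns `None`, the package is not visible to the interpreter you are running, which usually means `pip` installed it into a different environment.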

## Usage

To try out the
[examples](https://amaiya.github.io/causalnlp/examples.html) yourself:

<a href="https://colab.research.google.com/drive/1hu7j2QCWkVlFsKbuereWWRDOBy1anMbQ?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Example: What is the causal impact of a positive review on a product click?

``` python
import pandas as pd
```

``` python
df = pd.read_csv('sample_data/music_seed50.tsv', sep='\t', on_bad_lines='skip')
```

The file `music_seed50.tsv` is a semi-simulated dataset from
[here](https://github.com/rpryzant/causal-text). Columns of relevance
include:

- `Y_sim`: outcome, where 1 means product was clicked and 0 means not.
- `text`: raw text of review
- `rating`: rating associated with review (1 through 5)
- `T_true`: 0 means rating less than 3, 1 means rating of 5, where
  `T_true` affects the outcome `Y_sim`.
- `T_ac`: an approximation of true review sentiment (`T_true`) created
  with [Autocoder](https://amaiya.github.io/causalnlp/autocoder.html)
  from raw review text
- `C_true`: confounding categorical variable (1=audio CD, 0=other)
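To make the `T_true` encoding concrete, here is a small sketch of the rating-to-treatment mapping described above. The function name `rating_to_treatment` is hypothetical, and returning `None` for 3-star and 4-star ratings is an assumption: the column description does not say how those ratings are handled.

```python
def rating_to_treatment(rating):
    """Map a 1-5 star rating to the binary treatment described for T_true.

    Returns 1 for a 5-star rating and 0 for ratings below 3.
    Assumption: ratings of 3 and 4 are left undefined (None), because the
    column description does not cover them.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating == 5:
        return 1
    if rating < 3:
        return 0
    return None


# Treatment labels for each possible rating, in order 1..5.
labels = [rating_to_treatment(r) for r in (1, 2, 3, 4, 5)]
print(labels)
```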

We’ll pretend the true sentiment (i.e., review rating and `T_true`) is
hidden and only use `T_ac` as the treatment variable.

Using the `text_col` parameter, we include the raw review text as
another “controlled-for” variable.

``` python
from causalnlp import CausalInferenceModel
from lightgbm import LGBMClassifier
```

``` python
cm = CausalInferenceModel(df, 
                         metalearner_type='t-learner', learner=LGBMClassifier(num_leaves=500),
                         treatment_col='T_ac', outcome_col='Y_sim', text_col='text',
                         include_cols=['C_true'])
cm.fit()
```

    outcome column (categorical): Y_sim
    treatment column: T_ac
    numerical/categorical covariates: ['C_true']
    text covariate: text
    preprocess time:  1.1179866790771484  sec
    start fitting causal inference model
    time to fit causal inference model:  10.361494302749634  sec

#### Estimating Treatment Effects

CausalNLP supports estimation of heterogeneous treatment effects (i.e.,
how causal impacts vary across observations, which could be documents,
emails, posts, individuals, or organizations).

We will first calculate the overall average treatment effect (or ATE),
which shows that a positive review increases the probability of a click
by **13 percentage points** in this dataset.

**Average Treatment Effect** (or **ATE**):

``` python
print( cm.estimate_ate() )
```

    {'ate': 0.1309311542209525}
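For intuition about what the `t-learner` setting computes, the sketch below implements the T-learner idea in plain Python: fit one outcome model on treated units and another on control units, take the difference of their predictions for each unit, and average those differences to get the ATE. This is an illustration under stated assumptions, not CausalNLP's implementation. The base learner here is a toy per-stratum mean (standing in for `LGBMClassifier`), and the function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_stratum_means(rows):
    """Toy base learner: mean outcome per covariate value, with a global fallback."""
    by_stratum = defaultdict(list)
    for covariate, outcome in rows:
        by_stratum[covariate].append(outcome)
    overall = mean([outcome for _, outcome in rows])
    table = {c: mean(ys) for c, ys in by_stratum.items()}
    return lambda covariate: table.get(covariate, overall)


def t_learner_effects(data):
    """Estimate per-unit effects from (treatment, covariate, outcome) tuples."""
    treated = [(c, y) for t, c, y in data if t == 1]
    control = [(c, y) for t, c, y in data if t == 0]
    mu1 = fit_stratum_means(treated)  # outcome model fit on treated units only
    mu0 = fit_stratum_means(control)  # outcome model fit on control units only
    # Per-unit estimate: predicted outcome if treated minus predicted if untreated.
    return [mu1(c) - mu0(c) for _, c, _ in data]


# Hypothetical toy data: (T, C, Y)
toy = [
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 0, 0), (0, 0, 0),
]
effects = t_learner_effects(toy)
ate = mean(effects)  # the ATE is the average of the per-unit estimates
print(round(ate, 3))
```

In the toy data the treated mean exceeds the control mean by 0.5 within each stratum of `C`, so every per-unit estimate, and therefore the ATE, is 0.5.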

**Conditional Average Treatment Effect** (or **CATE**) for reviews that
mention the word “toddler”:

``` python
print( cm.estimate_ate(df['text'].str.contains('toddler')) )
```

    {'ate': 0.15559234254638685}
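Conceptually, a conditional estimate like this averages the per-observation effect estimates over the rows selected by the boolean mask. The sketch below shows that idea in plain Python. It is an assumption about the concept rather than a copy of CausalNLP's internals, and the function name `subgroup_effect` and the data are hypothetical.

```python
from statistics import mean


def subgroup_effect(effects, texts, keyword):
    """Average per-unit effect estimates over texts that contain `keyword`.

    The match is case-sensitive, like the default of pandas `str.contains`.
    """
    selected = [e for e, text in zip(effects, texts) if keyword in text]
    if not selected:
        raise ValueError(f"no observations mention {keyword!r}")
    return mean(selected)


# Hypothetical per-review effect estimates and review texts
effects = [0.10, 0.20, 0.05, 0.30]
texts = [
    "Great for my toddler",
    "Nice album",
    "My toddler loves it",
    "Classic jazz",
]
cate = subgroup_effect(effects, texts, "toddler")
print(round(cate, 3))
```

Small subgroups yield noisy averages, so it is worth checking how many rows the mask selects before interpreting the result.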

**Individualized Treatment Effects** (or **ITE**):

``` python
test_df = pd.DataFrame({'T_ac' : [1], 'C_true' : [1], 
                        'text' : ['I never bought this album, but I love his music and will soon!']})
effect = cm.predict(test_df)
print(effect)
```

    [[0.80538201]]

**Model Interpretability**:

``` python
print( cm.interpret(plot=False)[1][:10] )
```

    v_music    0.079042
    v_cd       0.066838
    v_album    0.055168
    v_like     0.040784
    v_love     0.040635
    C_true     0.039949
    v_just     0.035671
    v_song     0.035362
    v_great    0.029918
    v_heard    0.028373
    dtype: float64

Features with the `v_` prefix are word features. `C_true` is the
categorical variable indicating whether or not the product is a CD.
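As a rough illustration of what a word feature is, the sketch below turns raw text into binary word-presence columns named with a `v_` prefix. This is not CausalNLP's actual text preprocessing, whose details are not described here. The function name `word_features`, the tokenization rule, and the frequency-based vocabulary selection are all assumptions made for the example.

```python
import re
from collections import Counter


def word_features(texts, max_features=5, prefix="v_"):
    """Build binary word-presence features for the most frequent tokens."""
    # Assumption: lowercase alphabetic tokens; one set of tokens per document.
    tokenized = [set(re.findall(r"[a-z']+", t.lower())) for t in texts]
    doc_freq = Counter(tok for toks in tokenized for tok in toks)
    # Sort by document frequency (descending), then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:max_features]]
    columns = [prefix + word for word in vocab]
    rows = [[int(word in toks) for word in vocab] for toks in tokenized]
    return columns, rows


columns, rows = word_features(
    ["I love this album", "Love the music", "The album is great"],
    max_features=3,
)
print(columns)
for row in rows:
    print(row)
```

Each row is one review and each column records whether that word appears in it, which is the kind of feature a tabular learner can consume alongside `C_true`.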

### Text is Optional in CausalNLP

Despite the “NLP” in CausalNLP, the library can be used for causal
inference on data **without** text (e.g., only numerical and categorical
variables). See [the
examples](https://amaiya.github.io/causalnlp/examples.html#What-is-the-causal-impact-of-having-a-PhD-on-making-over-$50K?)
for more info.

## Documentation

API documentation and additional usage examples are available at:
https://amaiya.github.io/causalnlp/

## How to Cite

Please cite [the following paper](https://arxiv.org/abs/2106.08043) when
using CausalNLP in your work:

    @article{maiya2021causalnlp,
        title={CausalNLP: A Practical Toolkit for Causal Inference with Text},
        author={Arun S. Maiya},
        year={2021},
        eprint={2106.08043},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        journal={arXiv preprint arXiv:2106.08043},
    }

            

        