iscan-dag


| Field | Value |
|---|---|
| Name | iscan-dag |
| Version | 0.0.4 |
| Summary | Implementation of the iSCAN algorithm for detecting distribution shifts |
| Upload time | 2023-09-24 09:32:00 |
| Requires Python | >=3.6 |
| License | Apache 2.0 |
| Keywords | iscan, distribution shifts, causal mechanisms, bayesian networks, structure learning, difference network |

# ![iSCAN](https://raw.githubusercontent.com/kevinsbello/iscan/master/logo/iscan.png)

<div align=center>
  <a href="https://pypi.org/project/iscan-dag"><img src="https://img.shields.io/pypi/v/iscan-dag"></a>
  <a href="https://pypi.org/project/iscan-dag"><img src="https://img.shields.io/pypi/pyversions/iscan-dag"></a>
  <a href="https://pypi.org/project/iscan-dag"><img src="https://img.shields.io/pypi/wheel/iscan-dag"></a>
  <a href="https://pypistats.org/packages/iscan-dag"><img src="https://img.shields.io/pypi/dm/iscan-dag"></a>
  <a href="https://pypi.org/project/iscan-dag"><img src="https://img.shields.io/pypi/l/iscan-dag"></a>
</div>


The `iscan-dag` library is a Python 3 package designed for the direct detection of shifted nodes and structural shifted edges across multiple DAGs originating from distinct environments.

iSCAN-dag operates through a systematic process:

1. It first estimates the derivatives of the score function in each environment, which is the key step in identifying the nodes that are leaves in all environments.
2. It then estimates the derivative of the score function of the mixture distribution and evaluates the variance of this derivative. If the variance is greater than zero, the leaf node is designated a shifted node.

This process is iteratively applied, eliminating the identified leaf nodes across all environments and repeating the procedure to uncover all shifted nodes.
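
The loop described above can be sketched as follows. This is only an outline of the control flow, not the library's implementation: the function name `find_shifted_nodes`, the pluggable `hess_diag` callable (expected to return per-sample estimates of $\partial s_j / \partial x_j$ for every column it is given), the rule of picking the leaf with the smallest summed per-environment variance, and the fixed `threshold` are all assumptions made for illustration.

```python
import numpy as np

def find_shifted_nodes(envs, hess_diag, threshold=1e-8):
    """Sketch of iterative leaf removal (hypothetical helper, not the iscan-dag API).

    envs      : list of (n_h, d) sample arrays, one per environment
    hess_diag : callable mapping an (n, k) sample array to an (n, k) array of
                estimated score derivatives d s_j / d x_j (assumed interface)
    """
    remaining = list(range(envs[0].shape[1]))
    shifted, removal_order = [], []
    while remaining:
        # Step 1: per-environment variances; the common leaf is assumed to be
        # the node whose summed variance across environments is smallest.
        per_env = np.stack([hess_diag(X[:, remaining]).var(axis=0) for X in envs])
        leaf_pos = int(np.argmin(per_env.sum(axis=0)))
        # Step 2: variance of the same derivative under the pooled (mixture) sample.
        pooled = np.vstack([X[:, remaining] for X in envs])
        mixture_var = hess_diag(pooled).var(axis=0)[leaf_pos]
        leaf = remaining.pop(leaf_pos)  # remove the leaf from every environment
        if mixture_var > threshold:
            shifted.append(leaf)
        removal_order.append(leaf)
    # Leaves are removed first, so the reversed list is a topological order.
    return shifted, removal_order[::-1]
```

With estimated derivatives the variance is never exactly zero, so in practice the threshold (or a normalized version of the statistic) has to be chosen with care; the constant used here is purely illustrative.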

To detect structural shifted edges, the library reuses a by-product of the previous steps, the topological order. It employs [`FOCI`](https://cran.r-project.org/web/packages/FOCI/index.html) to identify discrepancies in parental relationships.


## Citation

This is an implementation of the following paper:

[1] Chen T., Bello K., Aragam B., Ravikumar P. (2023). ["iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models"][iscan]. 

[iscan]: https://arxiv.org/abs/2306.17361

If you find this code useful, please consider citing:

### BibTeX

```bibtex
@article{chen2023iscan,
    author = {Chen, Tianyu and Bello, Kevin and Aragam, Bryon and Ravikumar, Pradeep},
    journal = {ArXiv Preprint 2306.17361},
    title = {{iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models}},
    year = {2023}
}
```

## Features

- Detects shifted nodes without estimating a separate DAG for each environment.
- Accommodates any score estimator that can be plugged into the framework.
- Unlike DCI and UT-IGSP, iSCAN has a time complexity that does not depend on graph density, and it runs faster on larger networks because it avoids non-parametric conditional independence tests.

## Getting Started

### Install the package

We recommend using a virtual environment via `virtualenv` or `conda`, and using `pip` to install the `iscan-dag` package.
```bash
$ pip install -U iscan-dag
```

### Using iSCAN

See an example of how to use iSCAN in this [IPython notebook][example].

[example]: https://github.com/kevinsbello/iscan/blob/master/example/example.ipynb

## An Overview of iSCAN

We propose a new method for identifying changes (shifts) in causal mechanisms between two or more related structural causal models (SCMs) over the same set of variables directly, **without estimating the entire DAG structure of each SCM**. Prior work in this setting assumed linear models with Gaussian noise; in this work we instead assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). We prove a surprising result: the Jacobian of the score function of the **mixture distribution** reveals information about shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences (if any) for the shifted variables. The method is easy to understand and implement, and it leads to significant improvements in identifying shifted nodes (e.g., in F1, recall, and precision).

### Identifying the shifted leaf nodes

Let $h$ index the environments, and let $p^h(x)$ denote the pdf of the $h$-th environment. Let $q(x)$ be the pdf of the mixture distribution of all $H$ environments, so that $q(x) = \sum_h w_h p^h(x)$, where $w_h$ are the mixture weights.
Also, let $s(x) = \nabla \log q(x)$ be the associated score function.
Then, under the nonlinearity and additive noise assumptions, we have:

```math
\begin{array}{cl}
(i) \text{ If } j \text{ is a leaf in all DAGs } G^h, \text{ then } j \text{ is a shifted node if and only if }  \text{Var}_X\left[ \frac{\partial s_j(X)}{\partial x_j} \right] > 0\\
(ii) \text{ If } j \text{ is not a leaf in at least one DAG } G^h, \text{ then } \text{Var}_X\left[ \frac{\partial s_j(X)}{\partial x_j} \right] > 0
\end{array}
```
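
Statement (i) can be checked numerically on a toy model. The sketch below is illustrative only and does not use the library: it assumes two variables with $x_1 \to x_2$, standard Gaussian noise, equal mixture weights, and the made-up mechanisms `np.sin` and `3 * np.cos`. It also evaluates the exact mixture log-density and differentiates it by finite differences, instead of estimating the score from samples.

```python
import numpy as np

def log_normal_pdf(z):
    # Log-density of a standard normal, evaluated elementwise.
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

def log_q(x1, x2, mechanisms, weights):
    # log q(x) = log p(x1) + log sum_h w_h p^h(x2 | x1), with x2 = f_h(x1) + noise.
    comps = np.stack([np.log(w) + log_normal_pdf(x2 - f(x1))
                      for f, w in zip(mechanisms, weights)])
    return log_normal_pdf(x1) + np.logaddexp.reduce(comps, axis=0)

def leaf_score_derivative_variance(mechanisms, weights=(0.5, 0.5),
                                   n=2000, eps=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Sample from the mixture: pick an environment, then generate (x1, x2).
    env = rng.choice(len(mechanisms), size=n, p=weights)
    x1 = rng.normal(size=n)
    means = np.stack([f(x1) for f in mechanisms])[env, np.arange(n)]
    x2 = means + rng.normal(size=n)
    # Central second difference of log q in x2 approximates d s_2 / d x_2.
    d2 = (log_q(x1, x2 + eps, mechanisms, weights)
          - 2.0 * log_q(x1, x2, mechanisms, weights)
          + log_q(x1, x2 - eps, mechanisms, weights)) / eps**2
    return d2.var()

# Leaf x2 has the same mechanism in both environments (not shifted).
var_unshifted = leaf_score_derivative_variance([np.sin, np.sin])
# Leaf x2 has a different mechanism in the second environment (shifted).
var_shifted = leaf_score_derivative_variance([np.sin, lambda x: 3.0 * np.cos(x)])
```

In the unshifted case $\log q$ is quadratic in $x_2$, so the derivative is constant and `var_unshifted` is zero up to round-off; in the shifted case the derivative depends on $x$ and `var_shifted` is clearly positive.
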
### Identifying the shifted edges

By using a common estimated topological order across all environments, users can choose their own definition of functional shifted edges and apply any available parametric or non-parametric statistical technique to identify these edges among the detected shifted nodes. This can speed up the process considerably, particularly when shifted nodes are sparse, because it avoids exhaustive edge comparisons across all nodes and environments.

Regarding structural shifted edges, we have established a connection between FOCI and our method, enabling the detection of such edges with a time complexity of $\mathcal{O}(n\log n)$.
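
Once a parent set has been estimated for each shifted node in each environment (for example with FOCI, restricted to the node's predecessors in the topological order), comparing the parent sets is straightforward. The helper below is a hypothetical sketch, not part of the package: it assumes the parent sets are already available as dictionaries and treats an edge as structurally shifted when it is present in some environments but not in all of them.

```python
def structural_shifted_edges(parents_by_env, shifted_nodes):
    """Return edges (i, j) whose presence differs across environments.

    parents_by_env : list of dicts, one per environment, mapping a node to an
                     iterable of its estimated parents (assumed input format)
    shifted_nodes  : nodes already flagged as shifted
    """
    edges = set()
    for j in shifted_nodes:
        parent_sets = [set(env.get(j, ())) for env in parents_by_env]
        in_all = set.intersection(*parent_sets)   # parents in every environment
        in_any = set.union(*parent_sets)          # parents in at least one
        edges.update((i, j) for i in in_any - in_all)
    return edges

# Node 2 has parents {0, 1} in the first environment and {1} in the second.
example_edges = structural_shifted_edges([{2: [0, 1]}, {2: [1]}], shifted_nodes=[2])
```

Only the shifted nodes are examined, which is what keeps the comparison cheap when shifts are sparse.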

## Requirements

- Python 3.6+
- `numpy`
- `igraph`
- `torch`

## Contents

- `score_estimator.py`: Estimates the diagonal of the Hessian of $\log p(x)$ at the provided sample points.
- `utils.py`
  - `set_seed`: Manually sets the random seed.
  - `node_metrics`, `ddag_metrics`: Metrics for identifying shifted nodes and structural shifted edges.
  - `DataGenerator`: Generates data for testing purposes.
- `shifted_nodes.py`: Implements iSCAN, providing detected shifted nodes and test cases.
- `shifted_edges.py`: Implements the discovery of structural shifted edges using FOCI, along with test cases.
- `my_foci.R`: Implements FOCI for finding parents based on given nodes and topological order.
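
For reference, precision, recall, and F1 for a set of detected shifted nodes can be computed as in the sketch below. The function name and signature are hypothetical and are not claimed to match `node_metrics` in `utils.py`.

```python
def shifted_node_scores(true_shifted, predicted_shifted):
    """Precision, recall and F1 of predicted shifted nodes (illustrative helper)."""
    true_set, pred_set = set(true_shifted), set(predicted_shifted)
    tp = len(true_set & pred_set)  # correctly detected shifted nodes
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = shifted_node_scores(true_shifted=[1, 3], predicted_shifted=[3, 4, 5])
```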

## Acknowledgements

We thank the authors of [SCORE](https://github.com/paulrolland1307/SCORE/tree/main) for making their code available. Part of our code is based on their implementation, especially the `score_estimator.py` file.

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "iscan-dag",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "iscan,distribution shifts,causal mechanisms,bayesian networks,structure learning,difference network",
    "author": "",
    "author_email": "Kevin Bello <kbello@cs.cmu.edu>, Tianyu Chen <tianyuchen@utexas.edu>",
    "download_url": "https://files.pythonhosted.org/packages/18/0b/18a056ac8ef41c9e329fc6d305c2a3461cb42ee452ea83fb6ac5b8eabda6/iscan-dag-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Implementation of the iSCAN algorithm for detecting distribution shifts",
    "version": "0.0.4",
    "project_urls": {
        "Documentation": "https://iscan-dag.readthedocs.io/en/latest/",
        "Issues": "https://github.com/kevinsbello/iscan/issues",
        "Repository": "https://github.com/kevinsbello/iscan"
    },
    "split_keywords": [
        "iscan",
        "distribution shifts",
        "causal mechanisms",
        "bayesian networks",
        "structure learning",
        "difference network"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2f5eb868fe40c670d5358a594c22d95d8310d30721e65098169c056fc11dd7b8",
                "md5": "626b021352bf027b8677cbf398eb3120",
                "sha256": "dd794da999bbbb62761b40bd3682dba8a60cd5d7e90dd5fa56b9167202e3688f"
            },
            "downloads": -1,
            "filename": "iscan_dag-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "626b021352bf027b8677cbf398eb3120",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 17085,
            "upload_time": "2023-09-24T09:31:58",
            "upload_time_iso_8601": "2023-09-24T09:31:58.987844Z",
            "url": "https://files.pythonhosted.org/packages/2f/5e/b868fe40c670d5358a594c22d95d8310d30721e65098169c056fc11dd7b8/iscan_dag-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "180b18a056ac8ef41c9e329fc6d305c2a3461cb42ee452ea83fb6ac5b8eabda6",
                "md5": "46573ac13a2744b2c1a08556056fea2f",
                "sha256": "44e9dbc7d80880af623b878a28de26665ddc87b0140984c7844e13275de08626"
            },
            "downloads": -1,
            "filename": "iscan-dag-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "46573ac13a2744b2c1a08556056fea2f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 18628,
            "upload_time": "2023-09-24T09:32:00",
            "upload_time_iso_8601": "2023-09-24T09:32:00.499270Z",
            "url": "https://files.pythonhosted.org/packages/18/0b/18a056ac8ef41c9e329fc6d305c2a3461cb42ee452ea83fb6ac5b8eabda6/iscan-dag-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-24 09:32:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kevinsbello",
    "github_project": "iscan",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "iscan-dag"
}
        