**dagma 1.1.1** (PyPI package metadata)

- Summary: Implementation of the DAGMA algorithm
- Author: Kevin Bello <kbello@cs.cmu.edu>
- Requires Python: >=3.7
- License: Apache 2.0
- Keywords: dagma, notears, causal discovery, bayesian network, structure learning
- Uploaded: 2023-09-02 01:20:45
# ![DAGMA](https://raw.githubusercontent.com/kevinsbello/dagma/main/logo/dagma.png)

<div align=center>
  <a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/v/dagma"></a>
  <a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/pyversions/dagma"></a>
  <a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/wheel/dagma"></a>
  <a href="https://pypistats.org/packages/dagma"><img src="https://img.shields.io/pypi/dm/dagma"></a>
  <a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/l/dagma"></a>
</div>


The `dagma` library is a Python 3 package for learning DAGs (a.k.a. Bayesian networks) from data.

DAGMA works by optimizing a given **score/loss function**, where the structure relating the variables is constrained to be a directed acyclic graph (DAG). Because the number of DAGs grows super-exponentially with the number of variables, the vanilla formulation is a hard combinatorial optimization problem. DAGMA reformulates this problem by **replacing the combinatorial constraint with a non-convex, differentiable function that exactly characterizes DAGs**, making the optimization amenable to continuous methods such as gradient descent.

## Citation

This is an implementation of the following paper:

[1] Bello K., Aragam B., Ravikumar P. (2022). [DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization][dagma]. [NeurIPS'22](https://nips.cc/Conferences/2022/). 

[notears]: https://arxiv.org/abs/1803.01422
[dagma]: https://arxiv.org/abs/2209.08037

If you find this code useful, please consider citing:

### BibTeX

```bibtex
@inproceedings{bello2022dagma,
    author = {Bello, Kevin and Aragam, Bryon and Ravikumar, Pradeep},
    booktitle = {Advances in Neural Information Processing Systems},
    title = {{DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization}},
    year = {2022}
}
```

## Features

- Supports continuous data for linear (see [dagma.linear][dagma-linear]) and nonlinear models (see [dagma.nonlinear][dagma-nonlinear]).
- Supports binary (0/1) data for generalized linear models via [dagma.linear.DagmaLinear][DagmaLinear] with ``logistic`` as the score.
- Faster than other continuous optimization methods for structure learning, e.g., NOTEARS, GOLEM.

[dagma-linear]: https://dagma.readthedocs.io/en/latest/api/dagma/linear/
[dagma-nonlinear]: https://dagma.readthedocs.io/en/latest/api/dagma/nonlinear/
[DagmaLinear]: https://dagma.readthedocs.io/en/latest/api/dagma/linear/DagmaLinear/

## Getting Started

### Install the package

We recommend installing the `dagma` package with `pip` inside a virtual environment created with `virtualenv` or `conda`:
```bash
$ pip install dagma
```

### Using dagma

See an example of how to use `dagma` in this [IPython notebook][example].
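
As a quick start, here is a minimal sketch of the typical workflow: simulate a linear SEM with `dagma.utils`, fit `DagmaLinear` with the `l2` score, and compare the estimate against the ground truth. The helper names and hyperparameters below follow the notebook's usual setup but may differ slightly across versions, so treat this as a sketch rather than canonical code (for binary data, `loss_type='logistic'` would be used instead).

```python
import numpy as np
from dagma import utils
from dagma.linear import DagmaLinear

utils.set_random_seed(1)

# Simulate an Erdos-Renyi DAG with d nodes and s0 expected edges,
# then draw n samples from a linear-Gaussian SEM (illustrative sizes).
n, d, s0 = 500, 20, 20
B_true = utils.simulate_dag(d, s0, 'ER')           # binary adjacency matrix
W_true = utils.simulate_parameter(B_true)          # weighted adjacency matrix
X = utils.simulate_linear_sem(W_true, n, 'gauss')  # n x d data matrix

# Fit DAGMA with the least-squares (l2) score and l1 regularization.
model = DagmaLinear(loss_type='l2')
W_est = model.fit(X, lambda1=0.02)

# Structural accuracy (SHD, TPR, FDR, ...) against the ground truth.
print(utils.count_accuracy(B_true, W_est != 0))
```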

## An Overview of DAGMA

We propose a new acyclicity characterization of DAGs via a log-det function for learning DAGs from observational data. Like previously proposed acyclicity functions (e.g., [NOTEARS][notears]), our characterization is exact and differentiable. Compared to existing characterizations, however, our log-det function: (1) is better at detecting large cycles; (2) has better-behaved gradients; and (3) runs, in practice, about an order of magnitude faster. These advantages, together with a path-following scheme, lead to significant improvements in structure accuracy (e.g., SHD).

### The log-det acyclicity characterization

Let $W \in \mathbb{R}^{d\times d}$ be the weighted adjacency matrix of a graph on $d$ nodes. The log-det function takes the following form:

$$h^{s}(W) = -\log \det (sI-W\circ W) + d \log s,$$

where $I$ is the identity matrix, $s$ is a given scalar (e.g., 1), and $\circ$ denotes the element-wise Hadamard product. Of particular interest, $h^{s}(W) = 0$ if and only if $W$ represents a DAG, and when the domain of $h^{s}$ is restricted to the set of M-matrices, $h^{s}$ is well-defined and non-negative. For more properties of $h^{s}(W)$ (e.g., being an invex function), $\nabla h^{s}(W)$, and $\nabla^2 h^{s}(W)$, we invite you to look at [[1][dagma]].
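
For intuition, $h^{s}(W)$ is easy to evaluate directly. The following small `numpy` sketch (not the library's internal code) checks that it vanishes on a DAG and is strictly positive on a graph with a cycle:

```python
import numpy as np

def h_logdet(W, s=1.0):
    """Log-det acyclicity value: h^s(W) = -log det(sI - W∘W) + d log s."""
    d = W.shape[0]
    M = s * np.eye(d) - W * W                 # W∘W is the elementwise (Hadamard) square
    sign, logabsdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("sI - W∘W left the M-matrix domain; h is not defined here")
    return -logabsdet + d * np.log(s)

W_dag   = np.array([[0.0, 1.2], [0.0, 0.0]])  # edge 1 -> 2 only: a DAG
W_cycle = np.array([[0.0, 1.2], [0.5, 0.0]])  # edges 1 -> 2 and 2 -> 1: a 2-cycle

print(h_logdet(W_dag))    # ≈ 0.0
print(h_logdet(W_cycle))  # ≈ 0.446 > 0
```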

### A path-following approach

Given the exact differentiable characterization of a DAG, we are interested in solving the following optimization problem:
```math
\begin{array}{cl}
\min _{W \in \mathbb{R}^{d \times d}} & Q(W;\mathbf{X}) \\
\text { subject to } & h^{s}(W) = 0,
\end{array}
```
where $Q$ is a given score function (e.g., the square loss) that depends on $W$ and the dataset $\mathbf{X}$. To solve the constrained problem above, we propose a path-following approach in which we solve a short sequence of unconstrained problems of the form:
```math
\hat{W}^{(t+1)} = \arg\min_{W}\; \mu^{(t)} Q(W;\mathbf{X}) + h(W),
```
where $\mu^{(t)} \to 0$ as $t$ increases. Leveraging the properties of $h$, we show that, in the limit, the solution is a DAG. The trick that makes this work is to **use the previous solution as a starting point when solving the current unconstrained problem**, as is usually done in interior-point algorithms. Finally, we use a simple accelerated gradient descent method to solve each unconstrained problem.
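
To make the scheme concrete, below is a toy `numpy` sketch of this loop. It uses the closed-form gradients $\nabla Q(W;\mathbf{X}) = -\tfrac{1}{n}\mathbf{X}^\top(\mathbf{X}-\mathbf{X}W)$ for the least-squares score and $\nabla h^{s}(W) = 2\,(sI - W\circ W)^{-\top} \circ W$ from [[1][dagma]], with plain (non-accelerated) gradient descent and fixed hyperparameters; the actual `DagmaLinear` optimizer additionally uses acceleration and step-size safeguards that keep $sI - W\circ W$ in the M-matrix domain.

```python
import numpy as np

def grad_score_l2(W, X):
    """Gradient of the least-squares score Q(W;X) = 1/(2n) ||X - XW||_F^2."""
    n = X.shape[0]
    return -(X.T @ (X - X @ W)) / n

def grad_h_logdet(W, s=1.0):
    """Gradient of h^s(W): 2 * (sI - W∘W)^{-T} ∘ W."""
    d = W.shape[0]
    M_inv = np.linalg.inv(s * np.eye(d) - W * W)
    return 2.0 * M_inv.T * W

def dagma_path_following(X, mus=(1.0, 0.1, 0.01, 0.001), s=1.0, lr=3e-3, steps=5000):
    """Toy path-following loop: each subproblem is warm-started at the previous solution."""
    d = X.shape[1]
    W = np.zeros((d, d))                 # start at the empty graph
    for mu in mus:                       # mu -> 0 drives the iterates toward a DAG
        for _ in range(steps):           # plain gradient descent on mu*Q + h
            W -= lr * (mu * grad_score_l2(W, X) + grad_h_logdet(W, s))
    return W
```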

Let us illustrate how DAGMA works on a two-node graph (see Figure 1 in [[1][dagma]] for more details). Here $w_1$ (the x-axis) represents the edge weight from node 1 to node 2, while $w_2$ (the y-axis) represents the edge weight from node 2 to node 1. In this example, the ground-truth DAG corresponds to $w_1 = 1.2$ and $w_2 = 0$.

Below are four plots, each illustrating the solution to an unconstrained problem for a different value of $\mu$. In the top-left plot, $\mu=1$ and we solve the unconstrained problem starting at the empty graph (i.e., $w_1 = w_2 = 0$), which corresponds to the red point; after running gradient descent, we arrive at the cyan point ($w_1 = 1.06, w_2 = 0.24$). For the next unconstrained problem (top-right plot), $\mu = 0.1$ and we initialize gradient descent at the previous solution, $w_1 = 1.06, w_2 = 0.24$, arriving at the cyan point $w_1 = 1.16, w_2 = 0.04$. DAGMA then solves for $\mu=0.01$ and $\mu=0.001$ in the same way, and the solution at the final iteration (bottom-right plot) is close to the ground-truth DAG $w_1 = 1.2, w_2 = 0$.

<img width="1350" alt="dagma_4iters" src="https://user-images.githubusercontent.com/6846921/200969570-8a3434d5-b3b8-4260-966b-6fe1e0303188.png">


## Requirements

- Python 3.7+
- `numpy`
- `scipy`
- `igraph`
- `tqdm`
- `torch`: Only used for nonlinear models.

## Contents 

- `linear.py` - implementation of DAGMA for linear models with l1 regularization (supports L2 and logistic losses).
- `nonlinear.py` - implementation of DAGMA for nonlinear models using an MLP.
- `locally_connected.py` - special layer structure used by the MLP.
- `utils.py` - graph simulation, data simulation, and accuracy evaluation.
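
For orientation, these modules map to imports roughly as follows (a sketch; class names are as of this release and may move, so check the API docs if an import fails):

```python
from dagma import utils                                # graph simulation, data simulation, metrics
from dagma.linear import DagmaLinear                   # linear models (l2 or logistic score)
from dagma.nonlinear import DagmaMLP, DagmaNonlinear   # MLP-based nonlinear models (requires torch)
from dagma.locally_connected import LocallyConnected   # layer used internally by the MLP
```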


## Acknowledgments

We thank the authors of the [NOTEARS repo][notears-repo] for making their code available. Part of our code is based on their implementation, especially the `utils.py` file and some of their code for nonlinear models.

[notears-repo]: https://github.com/xunzheng/notears
[example]: https://github.com/kevinsbello/dagma/blob/main/examples/dagma_test.ipynb

            
