# ![DAGMA](https://raw.githubusercontent.com/kevinsbello/dagma/main/logo/dagma.png)
<div align=center>
<a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/v/dagma"></a>
<a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/pyversions/dagma"></a>
<a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/wheel/dagma"></a>
<a href="https://pypistats.org/packages/dagma"><img src="https://img.shields.io/pypi/dm/dagma"></a>
<a href="https://pypi.org/project/dagma"><img src="https://img.shields.io/pypi/l/dagma"></a>
</div>
The `dagma` library is a Python 3 package for learning DAGs (a.k.a. Bayesian networks) from data.
DAGMA works by optimizing a given **score/loss function**, where the structure that relates the variables is constrained to be a directed acyclic graph (DAG). Due to the super-exponential number of DAGs w.r.t. the number of variables, the vanilla formulation results in a hard combinatorial optimization problem. DAGMA reformulates this optimization problem by **replacing the combinatorial constraint with a non-convex differentiable function that exactly characterizes DAGs**, thus making the optimization amenable to continuous optimization methods such as gradient descent.
## Citation
This is an implementation of the following paper:
[1] Bello K., Aragam B., Ravikumar P. (2022). [DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization][dagma]. [NeurIPS'22](https://nips.cc/Conferences/2022/).
[notears]: https://arxiv.org/abs/1803.01422
[dagma]: https://arxiv.org/abs/2209.08037
If you find this code useful, please consider citing:
### BibTeX
```bibtex
@inproceedings{bello2022dagma,
author = {Bello, Kevin and Aragam, Bryon and Ravikumar, Pradeep},
booktitle = {Advances in Neural Information Processing Systems},
title = {{DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization}},
year = {2022}
}
```
## Features
- Supports continuous data for linear (see [dagma.linear][dagma-linear]) and nonlinear models (see [dagma.nonlinear][dagma-nonlinear]).
- Supports binary (0/1) data for generalized linear models, via [dagma.linear.DagmaLinear][DagmaLinear], using ``logistic`` as the score.
- Faster than other continuous optimization methods for structure learning, e.g., NOTEARS, GOLEM.
[dagma-linear]: https://dagma.readthedocs.io/en/latest/api/dagma/linear/
[dagma-nonlinear]: https://dagma.readthedocs.io/en/latest/api/dagma/nonlinear/
[DagmaLinear]: https://dagma.readthedocs.io/en/latest/api/dagma/linear/DagmaLinear/
## Getting Started
### Install the package
We recommend using a virtual environment via `virtualenv` or `conda`, and using `pip` to install the `dagma` package.
```bash
$ pip install dagma
```
### Using dagma
See an example of how to use dagma in this [IPython notebook][example].
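For a quick impression, here is a minimal sketch of the linear workflow, modeled on the example notebook. The helper names (`utils.simulate_dag`, `utils.simulate_linear_sem`, `utils.count_accuracy`) and parameters (`loss_type`, `lambda1`) are assumptions based on that example and may differ between versions.

```python
import numpy as np
from dagma import utils
from dagma.linear import DagmaLinear

# simulate a random 20-node ER DAG and 500 samples from a linear Gaussian SEM
utils.set_random_seed(1)
B_true = utils.simulate_dag(d=20, s0=20, graph_type='ER')       # binary adjacency matrix
W_true = utils.simulate_parameter(B_true)                       # weighted adjacency matrix
X = utils.simulate_linear_sem(W_true, n=500, sem_type='gauss')  # data matrix (n x d)

# fit a linear model with the least-squares score and l1 regularization
model = DagmaLinear(loss_type='l2')    # use loss_type='logistic' for binary 0/1 data
W_est = model.fit(X, lambda1=0.02)     # estimated weighted adjacency matrix

# compare the estimated support against the ground truth
print(utils.count_accuracy(B_true, W_est != 0))
```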
## An Overview of DAGMA
We propose a new acyclicity characterization of DAGs via a log-det function for learning DAGs from observational data. Similar to previously proposed acyclicity functions (e.g., [NOTEARS][notears]), our characterization is also exact and differentiable. However, when compared to existing characterizations, our log-det function: (1) is better at detecting large cycles; (2) has better-behaved gradients; and (3) runs about an order of magnitude faster in practice. These advantages of our log-det formulation, together with a path-following scheme, lead to significant improvements in structure accuracy (e.g., SHD).
### The log-det acyclicity characterization
Let $W \in \mathbb{R}^{d\times d}$ be a weighted adjacency matrix of a graph on $d$ nodes. The log-det function takes the following form:
$$h^{s}(W) = -\log \det (sI-W\circ W) + d \log s,$$
where $I$ is the identity matrix, $s$ is a given scalar (e.g., 1), and $\circ$ denotes the element-wise Hadamard product. Of particular interest, $h^{s}(W) = 0$ if and only if $W$ represents a DAG, and when the domain of $h^{s}$ is restricted to the set of M-matrices, $h^{s}$ is well-defined and non-negative. For more properties of $h^{s}(W)$ (e.g., being an invex function), $\nabla h^{s}(W)$, and $\nabla^2 h^{s}(W)$, we invite you to look at [[1][dagma]].
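For intuition, here is a minimal NumPy sketch that evaluates the formula above (it is not the library's internal implementation, which also needs the gradient). For any DAG, $W \circ W$ is nilpotent, so $\det(sI - W\circ W) = s^d$ and the value is exactly zero; any cycle makes it strictly positive.

```python
import numpy as np

def h_logdet(W: np.ndarray, s: float = 1.0) -> float:
    """Evaluate h^s(W) = -log det(sI - W∘W) + d log s."""
    d = W.shape[0]
    sign, logabsdet = np.linalg.slogdet(s * np.eye(d) - W * W)  # W * W is the Hadamard product
    assert sign > 0, "W is outside the M-matrix domain of h^s"
    return float(d * np.log(s) - logabsdet)

W_dag   = np.array([[0.0, 1.2], [0.0, 0.0]])   # edge 1 -> 2 only: a DAG
W_cycle = np.array([[0.0, 1.2], [0.5, 0.0]])   # edges 1 -> 2 and 2 -> 1: a 2-cycle
print(h_logdet(W_dag))    # 0.0
print(h_logdet(W_cycle))  # > 0
```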
### A path-following approach
Given the exact differentiable characterization of a DAG, we are interested in solving the following optimization problem:
```math
\begin{array}{cl}
\min _{W \in \mathbb{R}^{d \times d}} & Q(W;\mathbf{X}) \\
\text { subject to } & h^{s}(W) = 0,
\end{array}
```
where $Q$ is a given score function (e.g., square loss) that depends on $W$ and the dataset $\mathbf{X}$. To solve the above constrained problem, we propose a path-following approach where we solve a few of the following unconstrained problems:
```math
\hat{W}^{(t+1)} = \arg\min_{W}\; \mu^{(t)} Q(W;\mathbf{X}) + h(W),
```
where $\mu^{(t)} \to 0$ as $t$ increases. Leveraging the properties of $h$, we show that, at the limit, the solution is a DAG. The trick to make this work is to **use the previous solution as a starting point when solving the current unconstrained problem**, as usually done in interior-point algorithms. Finally, we use a simple accelerated gradient descent method to solve each unconstrained problem.
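The following is a schematic sketch of this scheme for the least-squares score, using plain (non-accelerated) gradient descent with a fixed, illustrative step size and no l1 penalty. It only illustrates the warm-starting idea over a decreasing sequence of $\mu$ values, not the library's actual optimizer.

```python
import numpy as np

def grad_h(W, s=1.0):
    # ∇h^s(W) = 2 (sI - W∘W)^{-T} ∘ W   (see [1])
    d = W.shape[0]
    return 2.0 * np.linalg.inv(s * np.eye(d) - W * W).T * W

def grad_score(W, X):
    # gradient of the least-squares score Q(W;X) = 1/(2n) ||X - XW||_F^2
    return -X.T @ (X - X @ W) / X.shape[0]

def path_following(X, mus=(1.0, 0.1, 0.01, 0.001), lr=3e-4, inner_steps=20000, s=1.0):
    d = X.shape[1]
    W = np.zeros((d, d))                  # start from the empty graph
    for mu in mus:                        # central-path coefficient mu^(t) -> 0
        for _ in range(inner_steps):      # minimize mu * Q(W;X) + h(W) by gradient descent
            W -= lr * (mu * grad_score(W, X) + grad_h(W, s))
        # the minimizer for this mu becomes the warm start for the next, smaller mu
    return W                              # approaches a DAG as mu -> 0
```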
Let us give an illustration of how DAGMA works in a two-node graph (see Figure 1 in [[1][dagma]] for more details). Here $w_1$ (the x-axis) represents the edge weight from node 1 to node 2; while $w_2$ (y-axis) represents the edge weight from node 2 to node 1. Moreover, in this example, the ground-truth DAG corresponds to $w_1 = 1.2$ and $w_2 = 0$.
Below we show four plots, each illustrating the solution to an unconstrained problem for a different value of $\mu$. In the top-left plot, we have $\mu=1$ and solve the unconstrained problem starting at the empty graph (i.e., $w_1 = w_2 = 0$), which corresponds to the red point; after running gradient descent, we arrive at the cyan point ($w_1 = 1.06, w_2 = 0.24$). For the next unconstrained problem (top-right plot), we have $\mu = 0.1$ and initialize gradient descent at the previous solution, $w_1 = 1.06, w_2 = 0.24$, arriving at the cyan point $w_1 = 1.16, w_2 = 0.04$. Similarly, DAGMA solves for $\mu=0.01$ and $\mu=0.001$, and we can observe how the solution at the final iteration (bottom-right plot) is close to the ground-truth DAG $w_1 = 1.2, w_2 = 0$.
<img width="1350" alt="dagma_4iters" src="https://user-images.githubusercontent.com/6846921/200969570-8a3434d5-b3b8-4260-966b-6fe1e0303188.png">
## Requirements
- Python 3.7+
- `numpy`
- `scipy`
- `igraph`
- `tqdm`
- `torch`: Only used for nonlinear models.
## Contents
- `linear.py` - implementation of DAGMA for linear models with L1 regularization (supports L2 and logistic losses).
- `nonlinear.py` - implementation of DAGMA for nonlinear models using MLPs.
- `locally_connected.py` - special layer structure used for the MLP.
- `utils.py` - graph simulation, data simulation, and accuracy evaluation.
## Acknowledgments
We thank the authors of the [NOTEARS repo][notears-repo] for making their code available. Part of our code is based on their implementation, especially the `utils.py` file and some code from their implementation of nonlinear models.
[notears-repo]: https://github.com/xunzheng/notears
[example]: https://github.com/kevinsbello/dagma/blob/main/examples/dagma_test.ipynb