# GODM
[![PyPI version](https://badge.fury.io/py/godm.svg)](https://badge.fury.io/py/godm)
GODM is a data augmentation package for supervised graph outlier detection. It generates synthetic graph outliers with latent diffusion models. This is the official implementation of [Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models](https://arxiv.org/abs/2312.17679).
<p align="center">
<img src="modelfig.png" alt="model architecture"/>
</p>
## Installation
It is recommended to use **pip** for installation:
```
pip install godm
```
Alternatively, you can build from source by cloning this repository:
```
git clone https://github.com/kayzliu/godm.git
cd godm
pip install .
```
## Usage
```python
from pygod.utils import load_data
data = load_data('weibo') # load data
from godm import GODM # import GODM
godm = GODM(lr=0.004) # init. GODM
aug_data = godm(data) # augment data
detector(aug_data)         # train your outlier detector on the augmented data
```
The input data should be a [`torch_geometric.Data`](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Data.html#torch_geometric.data.Data) object with the following keys:
- `x`: node features,
- `edge_index`: edge index,
- `edge_time`: edge times (optional, name can be changed by `time_attr`),
- `edge_type`: edge types (optional, name can be changed by `type_attr`),
- `y`: node labels,
- `train_mask`: training node mask,
- `val_mask`: validation node mask,
- `test_mask`: testing node mask.
So far, no additional keys are allowed. We may support more keys via padding in the future.
## Parameters
- ```hid_dim``` (type: `int`, default: `None`): hidden dimension for VAE, i.e., latent embedding dimension. `None` means the largest power of 2 that is less than or equal to the feature dimension divided by two.
- ```diff_dim``` (type: `int`, default: `None`): hidden dimension for denoiser. `None` means twice `hid_dim`.
- ```vae_epochs``` (type: `int`, default: `100`): number of epochs for training VAE.
- ```diff_epochs``` (type: `int`, default: `100`): number of epochs for training diffusion model.
- ```patience``` (type: `int`, default: `50`): patience for early stopping.
- ```lr``` (type: `float`, default: `0.001`): learning rate.
- ```wd``` (type: `float`, default: `0.`): weight decay.
- ```batch_size``` (type: `int`, default: `2048`): batch size.
- ```threshold``` (type: `float`, default: `0.75`): threshold for edge generation.
- ```wx``` (type: `float`, default: `1.`): weight for node feature reconstruction loss.
- ```we``` (type: `float`, default: `0.5`): weight for edge reconstruction loss.
- ```beta``` (type: `float`, default: `0.001`): weight for KL divergence loss.
- ```wt``` (type: `float`, default: `1.`): weight for time prediction loss.
- ```time_attr``` (type: `str`, default: `edge_time`): attribute name for edge time.
- ```type_attr``` (type: `str`, default: `edge_type`): attribute name for edge type.
- ```wp``` (type: `float`, default: `0.3`): weight for node prediction loss.
- ```gen_nodes``` (type: `int`, default: `None`): number of nodes to generate. `None` means the same as the number of outliers in the original graph.
- ```sample_steps``` (type: `int`, default: `50`): number of steps for diffusion model sampling.
- ```device``` (type: `int`, default: `0`): GPU index, set to -1 for CPU.
- ```verbose``` (type: `bool`, default: `False`): verbose mode, enable for logging.
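As an illustration of the `hid_dim` default described above, the rule "largest power of 2 less than or equal to half the feature dimension" can be computed as follows. This is a sketch of the stated rule, and `default_hid_dim` is a hypothetical helper, not the package's internal code:

```python
def default_hid_dim(feat_dim: int) -> int:
    """Largest power of 2 <= feat_dim // 2 (the documented hid_dim default)."""
    half = feat_dim // 2
    p = 1
    while p * 2 <= half:
        p *= 2
    return p

# e.g. a 400-dimensional feature space yields a 128-dimensional latent space
print(default_hid_dim(400))  # -> 128
```

By the same reading of the docs, `diff_dim` would then default to `2 * default_hid_dim(feat_dim)`.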
## Cite Us:
Our [paper](https://arxiv.org/abs/2312.17679) is publicly available. If you use GODM in a scientific publication, we would appreciate your citation:
```bibtex
@article{liu2023data,
  title={Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models},
  author={Liu, Kay and Zhang, Hengrui and Hu, Ziqing and Wang, Fangxin and Yu, Philip S.},
  journal={arXiv preprint arXiv:2312.17679},
  year={2023}
}
```
or:
Liu, K., Zhang, H., Hu, Z., Wang, F., and Yu, P.S. 2023. Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models. arXiv preprint arXiv:2312.17679.