DBAnomTransformer


- **Name:** DBAnomTransformer
- **Version:** 0.1.14
- **Summary:** A collection of useful util functions
- **Upload time:** 2023-11-26 06:53:08
- **Requires Python:** >=3.6
- **Requirements:** wget, gdown, Pillow, pandas, scikit-learn, hkkang_utils

# Anomaly Detection and Explanation
We develop a deep learning model that detects and explains anomalies in multivariate time series data.

Our model is based on [Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy (ICLR'22)](https://openreview.net/forum?id=LzQQ89U1qm_). We train and evaluate the model on the [DBSherlock dataset](https://github.com/hyukkyukang/DBSherlock).

## Anomaly Transformer

Anomaly Transformer is a transformer-based model that detects anomalies in multivariate time series data. It is based on the assumption that normal data is highly correlated, while abnormal data is not. It uses a transformer encoder to learn the correlation between different time steps, and then uses a discriminator to distinguish normal from abnormal data based on the learned correlation. The paper introduces:

- An inherent distinguishable criterion as **Association Discrepancy** for detection.
- A new **Anomaly-Attention** mechanism to compute the association discrepancy.
- A **minimax strategy** to amplify the normal-abnormal distinguishability of the association discrepancy.

<p align="center">
<img src="./pics/structure.png" height="350" alt="Model structure" align="center" />
</p>

For more details, please refer to the [paper](https://openreview.net/forum?id=LzQQ89U1qm_).
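
As a rough illustration of the association discrepancy, the sketch below compares a Gaussian "prior association" over time-step distance with a softmax "series association" and measures their symmetrized KL divergence per time step. It is a simplified NumPy sketch, not the code of this package: it assumes a single layer, a fixed scalar `sigma` (the paper learns it per time step), and uses the raw window as both queries and keys. The function names are hypothetical.

```python
import numpy as np


def prior_association(length, sigma):
    """Row-normalized Gaussian kernel over the distance |i - j| between time steps."""
    idx = np.arange(length)
    dist = np.abs(idx[:, None] - idx[None, :])
    kernel = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum(axis=1, keepdims=True)


def series_association(queries, keys):
    """Softmax self-attention weights, one row per time step."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def association_discrepancy(prior, series, eps=1e-12):
    """Symmetrized KL divergence between the two associations, per time step."""
    log_ratio = np.log(prior + eps) - np.log(series + eps)
    kl_prior_series = np.sum(prior * log_ratio, axis=1)
    kl_series_prior = np.sum(series * -log_ratio, axis=1)
    return kl_prior_series + kl_series_prior


rng = np.random.default_rng(0)
window = rng.normal(size=(100, 8))  # 100 time steps, 8 hidden dimensions (arbitrary)
prior = prior_association(length=100, sigma=3.0)
series = series_association(window, window)
discrepancy = association_discrepancy(prior, series)
print(discrepancy.shape)  # (100,)
```

In the paper, this per-time-step value is combined with the reconstruction error to form the anomaly score; that step is omitted here.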

## Environment Setup
Start the Docker container with Docker Compose, then log in to the container:

```bash
docker compose up -d
```
Install the Python packages:
```bash
pip install -r requirements.txt
```

## Prepare Dataset
### Download
Download the DBSherlock dataset:
```bash
python scripts/dataset/download_datasets.py
```

Append the `--download_all` argument to download all datasets (i.e., SMD, SMAP, PSM, MSL, and DBSherlock):
```bash
python scripts/dataset/download_datasets.py --download_all
```

### Preprocess data

Convert the DBSherlock data from `.mat` files to `.json` files:
```bash
python src/DBAnomTransformer/data_factory/convert_dbsherlock.py \
    --input dataset/dbsherlock/tpcc_16w.mat \
    --out_dir dataset/dbsherlock/converted/ \
    --prefix tpcc_16w

python src/DBAnomTransformer/data_factory/convert_dbsherlock.py \
    --input dataset/dbsherlock/tpcc_500w.mat \
    --out_dir dataset/dbsherlock/converted/ \
    --prefix tpcc_500w

python src/DBAnomTransformer/data_factory/convert_dbsherlock.py \
    --input dataset/dbsherlock/tpce_3000.mat \
    --out_dir dataset/dbsherlock/converted/ \
    --prefix tpce_3000
```

Convert the DBSherlock data into training and validation data for Anomaly Transformer:
```bash
python src/DBAnomTransformer/data_factory/process.py \
    --input_path dataset/dbsherlock/converted/tpcc_16w_test.json \
    --output_path dataset/dbsherlock/processed/tpcc_16w/

python src/DBAnomTransformer/data_factory/process.py \
    --input_path dataset/dbsherlock/converted/tpcc_500w_test.json \
    --output_path dataset/dbsherlock/processed/tpcc_500w/

python src/DBAnomTransformer/data_factory/process.py \
    --input_path dataset/dbsherlock/converted/tpce_3000_test.json \
    --output_path dataset/dbsherlock/processed/tpce_3000/
```
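
The six commands above follow one pattern per dataset file, so they can also be generated in a loop. The sketch below builds the same convert and process commands from the three prefixes and, by default, only prints them. The helper names (`DATASETS`, `build_commands`, `preprocess_all`) are hypothetical and not part of the package; the script paths and flags are the ones shown above.

```python
import subprocess
import sys

DATASETS = ["tpcc_16w", "tpcc_500w", "tpce_3000"]
SCRIPT_DIR = "src/DBAnomTransformer/data_factory"


def build_commands(prefix):
    """Return the convert and process commands for one DBSherlock file."""
    convert = [
        sys.executable, f"{SCRIPT_DIR}/convert_dbsherlock.py",
        "--input", f"dataset/dbsherlock/{prefix}.mat",
        "--out_dir", "dataset/dbsherlock/converted/",
        "--prefix", prefix,
    ]
    process = [
        sys.executable, f"{SCRIPT_DIR}/process.py",
        "--input_path", f"dataset/dbsherlock/converted/{prefix}_test.json",
        "--output_path", f"dataset/dbsherlock/processed/{prefix}/",
    ]
    return [convert, process]


def preprocess_all(dry_run=True):
    """Run (or, with dry_run=True, only print) every command in order."""
    for prefix in DATASETS:
        for command in build_commands(prefix):
            if dry_run:
                print(" ".join(command))
            else:
                subprocess.run(command, check=True)  # stop at the first failure


preprocess_all(dry_run=True)
```

Call `preprocess_all(dry_run=False)` from the repository root to actually run the steps.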

## Reproducing Experiments
We provide the experiment scripts under the `./scripts` folder. You can reproduce the experiment results with the script below:
```bash
bash ./scripts/experiment/DBS.sh
```
Alternatively, run the commands below to train and evaluate the model step by step.

### Training
Train the model (the command below uses the EDA dataset):
```bash
python src/DBAnomTransformer/main.py \
    --dataset EDA \
    --dataset_path dataset/EDA/ \
    --mode train
```

### Evaluating
Evaluate the trained model on the test split of the same dataset:
```bash
python src/DBAnomTransformer/main.py \
    --dataset EDA \
    --dataset_path dataset/EDA/ \
    --mode test 
```
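
This section does not say which metrics the test mode reports. The upstream Anomaly Transformer code base applies a "point adjustment" step before computing precision, recall, and F1: if any time step inside a ground-truth anomaly segment is flagged, the whole segment counts as detected. The plain-Python sketch below illustrates that convention; whether `main.py` in this package does the same is an assumption, and the function names are hypothetical.

```python
def point_adjust(pred, truth):
    """Flag every step of a true anomaly segment if at least one step was predicted."""
    adjusted = list(pred)
    n = len(truth)
    i = 0
    while i < n:
        if truth[i] == 1:
            j = i
            while j < n and truth[j] == 1:
                j += 1  # segment is truth[i:j]
            if any(pred[k] == 1 for k in range(i, j)):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


def precision_recall_f1(pred, truth):
    """Point-wise precision, recall, and F1 for binary labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [0, 1, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 0, 1, 0, 0, 1, 0, 0, 0]
adjusted = point_adjust(pred, truth)
print(adjusted)  # [0, 1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall_f1(adjusted, truth))
```

Point adjustment raises recall for long anomaly segments, so adjusted and unadjusted scores should not be compared directly.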

## Inference
Install the package with pip:
```bash
pip install DBAnomTransformer
```
The example below trains a detector and then uses it to detect anomalies in dummy data shaped like the EDA or DBS dataset.
```python
import numpy as np
import pandas as pd
from omegaconf import OmegaConf

from DBAnomTransformer.config.utils import default_config
from DBAnomTransformer.detector import DBAnomDector

# dataset_name = "DBS"
dataset_name = "EDA"

# Create config
# EDA uses the package defaults, so DBAnomDector() needs no override below.
eda_config = default_config
# DBSherlock overrides the model size and the checkpoint, scaler, and stats paths.
dbsherlock_config = OmegaConf.create(
    {
        "model": {"num_anomaly_cause": 11, "num_feature": 200},
        "model_path": "checkpoints/DBS_checkpoint.pth",
        "scaler_path": "checkpoints/DBS_scaler.pkl",
        "stats_path": "checkpoints/DBS_stats.json",
    }
)


# Create dummy data: 130 rows, one column per feature
if dataset_name == "EDA":
    feature_num = 29
elif dataset_name == "DBS":
    feature_num = 200
else:
    raise ValueError(f"Unknown dataset_name: {dataset_name}")
dummy_data = np.random.rand(130, feature_num)
dummy_data = pd.DataFrame(dummy_data, columns=[f"attr_{i}" for i in range(feature_num)])


# Initialize and train model
if dataset_name == "EDA":
    detector = DBAnomDector()
    detector.train(dataset_path="dataset/EDA/")
elif dataset_name == "DBS":
    detector = DBAnomDector(override_config=dbsherlock_config)
    detector.train(
        dataset_path="dataset/dbsherlock/converted/tpcc_500w_test.json",
        dataset_name="DBS",
    )

# Run inference (detect anomaly)
anomaly_score, is_anomaly, anomaly_cause = detector.infer(data=dummy_data)
```
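
The example above returns `anomaly_score`, `is_anomaly`, and `anomaly_cause`, but this README does not document how the flags are derived from the scores. In the upstream Anomaly Transformer code, the threshold is a percentile of the anomaly scores chosen so that a fixed ratio of time steps is flagged. The NumPy sketch below illustrates that rule on synthetic scores; it is not necessarily what `DBAnomDector.infer` does, and `flag_by_ratio` and its `anomaly_ratio` parameter are hypothetical names.

```python
import numpy as np


def flag_by_ratio(scores, anomaly_ratio=1.0):
    """Flag roughly the top `anomaly_ratio` percent of scores as anomalous."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.percentile(scores, 100.0 - anomaly_ratio)
    return scores > threshold, threshold


rng = np.random.default_rng(0)
scores = rng.random(130)  # stand-in for per-time-step anomaly scores
scores[40:45] += 5.0      # inject five obviously high scores
flags, threshold = flag_by_ratio(scores, anomaly_ratio=4.0)
print(int(flags.sum()), "time steps flagged, threshold", round(float(threshold), 3))
```

A ratio-based threshold always flags some time steps, even in a window with no real anomaly, so the ratio needs to match the expected anomaly rate of the data.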

Note that the dataset folder should be organized as follows:
```text
dataset
├── EDA
│   ├── meta_data
│   │   ├── db_backup.csv
│   │   ├── index.csv
│   │   ├── ...
│   │   └── workload_spike.csv
│   ├── raw_data
│   │   ├── db_backup_1.csv
│   │   ├── db_backup_2.csv
│   │   ├── ...
│   │   ├── workload_spike_1.csv
│   │   ├── workload_spike_2.csv
│   │   ├── ...
```
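
Before training, it can help to confirm that the folder matches the tree above. The sketch below checks only what the tree shows, namely that `EDA/meta_data` and `EDA/raw_data` exist and contain CSV files. `check_eda_layout` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
import tempfile


def check_eda_layout(dataset_root):
    """Return a list of problems found under <dataset_root>/EDA (empty if none)."""
    eda_dir = Path(dataset_root) / "EDA"
    problems = []
    for sub in ("meta_data", "raw_data"):
        folder = eda_dir / sub
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
        elif not any(folder.glob("*.csv")):
            problems.append(f"no CSV files in: {folder}")
    return problems


# Usage on a throwaway directory that mimics the tree above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    (root / "EDA" / "meta_data").mkdir(parents=True)
    (root / "EDA" / "raw_data").mkdir(parents=True)
    (root / "EDA" / "meta_data" / "db_backup.csv").touch()
    problems = check_eda_layout(root)
    print(problems)  # reports raw_data as empty
```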

## Reference
This repository is based on [Anomaly Transformer](https://github.com/thuml/Anomaly-Transformer).

```bibtex
@inproceedings{
xu2022anomaly,
title={Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy},
author={Jiehui Xu and Haixu Wu and Jianmin Wang and Mingsheng Long},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=LzQQ89U1qm_}
}
```
            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "DBAnomTransformer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "Hyukkyu Kang <hyukkyukang@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/a9/81/4dba20c89a65beae7ccae75dc83e9ddc428f3b903a0b55f246143c03cdd8/dbanomtransformer-0.1.14.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A collection of useful util functions",
    "version": "0.1.14",
    "project_urls": {
        "Bug Tracker": "https://github.com/pshlego/Anomaly_Explanation/issues",
        "Homepage": "https://github.com/pshlego/Anomaly_Explanation"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b7de81aa19e756f6da0b7104550ff693aad9864b5763121d28932e4c5df10c28",
                "md5": "6e33e5baff4f1a6a96eeaf281fb857be",
                "sha256": "aa3aa568702c35a86c56df17a47b4e52e2380e4ebce920bbd9762189bc3f8919"
            },
            "downloads": -1,
            "filename": "dbanomtransformer-0.1.14-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6e33e5baff4f1a6a96eeaf281fb857be",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 50153,
            "upload_time": "2023-11-26T06:53:04",
            "upload_time_iso_8601": "2023-11-26T06:53:04.852723Z",
            "url": "https://files.pythonhosted.org/packages/b7/de/81aa19e756f6da0b7104550ff693aad9864b5763121d28932e4c5df10c28/dbanomtransformer-0.1.14-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a9814dba20c89a65beae7ccae75dc83e9ddc428f3b903a0b55f246143c03cdd8",
                "md5": "e541bf766028a99ca656ac3e82cd0176",
                "sha256": "578d5c49bc50b6402c40cd0c03a039bb3a050f3a555054e37b4c0772ccdb61d6"
            },
            "downloads": -1,
            "filename": "dbanomtransformer-0.1.14.tar.gz",
            "has_sig": false,
            "md5_digest": "e541bf766028a99ca656ac3e82cd0176",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 2726008,
            "upload_time": "2023-11-26T06:53:08",
            "upload_time_iso_8601": "2023-11-26T06:53:08.565311Z",
            "url": "https://files.pythonhosted.org/packages/a9/81/4dba20c89a65beae7ccae75dc83e9ddc428f3b903a0b55f246143c03cdd8/dbanomtransformer-0.1.14.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-26 06:53:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pshlego",
    "github_project": "Anomaly_Explanation",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "wget",
            "specs": []
        },
        {
            "name": "gdown",
            "specs": []
        },
        {
            "name": "Pillow",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": []
        },
        {
            "name": "hkkang_utils",
            "specs": []
        }
    ],
    "lcname": "dbanomtransformer"
}
        