| Field | Value |
| --- | --- |
| Name | contextual-anomaly-detector |
| Version | 1.0.1 |
| Summary | Contextual matrix profile for anomaly detection in building electrical loads |
| upload_time | 2024-11-20 16:06:23 |
| author | RobertoChiosa |
| requires_python | <4.0,>=3.11 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| license | None |
| keywords | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Contextual Matrix Profile Calculation Tool
The Matrix Profile is an algorithm for discovering motifs and discords in time series data. By computing the (z-normalized) Euclidean distance between every subsequence of a time series and its nearest neighbor, it provides insights into potential anomalies and repetitive patterns. In the field of building energy management it can be used to detect anomalies in electrical load time series.

This tool is a Python implementation of the Matrix Profile algorithm that uses contextual information (such as external air temperature) to identify abnormal patterns in electrical load subsequences that start in predefined sub-daily time windows, as shown in the following figure.
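The distance computation at the core of the algorithm can be illustrated with a short brute-force sketch in NumPy (for illustration only; `znorm` and `matrix_profile` are hypothetical helpers, and real implementations such as STUMPY use far faster algorithms than this quadratic loop):

```python
import numpy as np

def znorm(x):
    # z-normalize a subsequence; constant subsequences map to zeros
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def matrix_profile(ts, m):
    # For each length-m subsequence, the distance to its nearest neighbor,
    # skipping trivial matches that overlap the query (exclusion zone).
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    profile = np.full(n, np.inf)
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= m:
                profile[i] = min(profile[i], np.linalg.norm(subs[i] - subs[j]))
    return profile

# A periodic load with one corrupted sample: subsequences covering the
# spike have no close match elsewhere, so their profile values stand out.
ts = np.array([0, 1, 2, 1] * 5, dtype=float)
ts[9] = 10.0
mp = matrix_profile(ts, m=4)
```

Subsequences that repeat exactly get a profile value of zero, while the subsequences overlapping the spike are the only ones with a large value; discord discovery amounts to locating the maxima of this profile.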

**Table of Contents**
* [Usage](#usage)
    * [Data format](#data-format)
    * [Run locally](#run-locally)
    * [Run with Docker](#run-with-docker)
* [Cite](#cite)
* [Contributors](#contributors)
* [References](#references)
* [License](#license)
## Usage
The tool comes with a CLI that helps you execute the script with the desired arguments:

```console
$ python -m src.cmp.main -h

Matrix profile

positional arguments:
  input_file     Path to file
  variable_name  Variable name
  output_file    Path to the output file

options:
  -h, --help     show this help message and exit
```
The arguments to pass to the script are the following:

* `input_file`: The input dataset, given as a local path or an HTTP URL. When a URL is given, the tool downloads the dataset from it; since it is expected to be a pre-signed URL, no authentication is needed and the dataset can be downloaded directly.
* `variable_name`: The variable name to be used for the analysis (i.e., the column of the csv that contains the electrical load under analysis).
* `output_file`: The local path to the output HTML report. The platform can then take that HTML report and upload it to the object storage service for the user to review later.
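The `input_file` handling described above could be sketched as follows (a hypothetical helper, not the package's actual code; only standard-library calls are used):

```python
import os
import urllib.request

def resolve_input(input_file: str, dest: str = "downloaded.csv") -> str:
    # If input_file is an HTTP(S) URL (e.g. a pre-signed URL that needs
    # no authentication headers), download it to a local path; otherwise
    # treat it as a local file path.
    if input_file.startswith(("http://", "https://")):
        urllib.request.urlretrieve(input_file, dest)
        return dest
    if not os.path.exists(input_file):
        raise FileNotFoundError(input_file)
    return input_file
```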
You can run the main script from the console using either a local file or data downloaded from an external URL. This repository comes with a sample dataset ([data.csv](src/cmp/data/data.csv)) that you can use to generate a report by passing its local path as the `input_file` argument.
### Data format
The tool requires the user to provide a csv input file containing the electrical power time series for a specific building, meter, or energy system (e.g., a whole-building electrical power time series). The `csv` uses a wide table format as follows:
```csv
timestamp,column_1,temp
2019-01-01 00:00:00,116.4,-0.6
2019-01-01 00:15:00,125.6,-0.9
2019-01-01 00:30:00,119.2,-1.2
```
The csv must have the following columns:

- `timestamp` [case sensitive]: The timestamp of the observation in the format `YYYY-MM-DD HH:MM:SS`, expected as a UTC timezone string. It is internally transformed by the tool into the index of the dataframe.
- `temp` [case sensitive]: The external air temperature in degrees Celsius. This column is required to perform thermal-sensitivity analysis on the electrical load.
- `column_1`: The dataframe may additionally have `N` arbitrary columns that refer to electrical load time series. The user specifies which column to analyze through the `variable_name` argument.
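A loading step that enforces this format might look like the following sketch (using pandas; `load_data` is a hypothetical name, and the required column names come from the specification above):

```python
import io
import pandas as pd

REQUIRED = {"timestamp", "temp"}

def load_data(csv_source, variable_name: str) -> pd.DataFrame:
    # Read the wide-format csv, check the required columns, and use the
    # UTC timestamp as the dataframe index (hypothetical helper mirroring
    # the format described above).
    df = pd.read_csv(csv_source)
    missing = (REQUIRED | {variable_name}) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    return df.set_index("timestamp")

sample = io.StringIO(
    "timestamp,column_1,temp\n"
    "2019-01-01 00:00:00,116.4,-0.6\n"
    "2019-01-01 00:15:00,125.6,-0.9\n"
)
df = load_data(sample, "column_1")
```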
### Run locally
Create a virtual environment, activate it, and install the dependencies:
- Makefile
```bash
make setup
```
- Linux:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install
```
- Windows:
```bash
python -m venv venv
venv\Scripts\activate
pip install poetry
poetry install
```
Now you can run the script from the console by passing the desired arguments. In the following we pass the sample
dataset [`data.csv`](src/cmp/data/data.csv) as input file and the variable `Total_Power` as the variable name to be used
for the analysis. The output file will be saved in the [`results`](src/cmp/results) folder.
```console
$ python -m src.cmp.main src/cmp/data/data.csv Total_Power src/cmp/results/reports/report.html
2024-08-13 12:45:42,821 [INFO](src.cmp.utils) ⬇️ Downloading file from <src/cmp/data/data.csv>
2024-08-13 12:45:43,070 [INFO](src.cmp.utils) 📊 Data processed successfully
*********************
CONTEXT 1 : Subsequences of 05:45 h (m = 23) that start in [00:00,01:00) (ctx_from00_00_to01_00_m05_45)
99.997% 0.0 sec
- Cluster 1 (1.660 s) -> 1 anomalies
- Cluster 2 (0.372 s) -> 3 anomalies
- Cluster 3 (0.389 s) -> 4 anomalies
- Cluster 4 (0.593 s) -> 5 anomalies
- Cluster 5 (-) -> no anomalies green
[...]
2024-08-13 12:46:27,187 [INFO](__main__) TOTAL 0 min 44 s
2024-08-13 12:46:32,349 [INFO](src.cmp.utils) 🎉 Report generated successfully on src/cmp/results/reports/report.html
```
At the end of the execution you can find the report at the path specified by the `output_file` argument; in this case it is in the [`results`](src/cmp/results) folder.
### Run with Docker
Build the docker image.
- Makefile
```bash
make docker-build
```
- Linux:
```bash
docker build -t cmp .
```
Run the docker image with the same arguments as before:
- Makefile
```bash
make docker-run
```
- Linux:
```bash
docker run cmp data/data.csv Total_Power results/reports/report.html
```
At the end of the execution you can find the results in the [`results`](src/cmp/results) folder inside the docker
container.
## Cite
You can cite this work either through [this Bibtex file](./docs/ref.bib) or with the following plain-text citation:

> Chiosa, Roberto, et al. "Towards a self-tuned data analytics-based process for an automatic context-aware detection
> and diagnosis of anomalies in building energy consumption timeseries." Energy and Buildings 270 (2022): 112302.
## Contributors
- Author [Roberto Chiosa](https://github.com/RobertoChiosa)
## References
- Series Distance Matrix repository (https://github.com/predict-idlab/seriesdistancematrix)
- Stumpy Package (https://stumpy.readthedocs.io/en/latest/)
## License
This code is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.