## PyCatcher
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/aseemanand/pycatcher/blob/main/LICENSE) [![PyPI Downloads](https://static.pepy.tech/badge/pycatcher)](https://pepy.tech/projects/pycatcher) [![PyPI Downloads](https://static.pepy.tech/badge/pycatcher/month)](https://pepy.tech/projects/pycatcher) [![PyPI Downloads](https://static.pepy.tech/badge/pycatcher/week)](https://pepy.tech/projects/pycatcher) ![PYPI version](https://img.shields.io/pypi/v/pycatcher.svg) ![PYPI - Python Version](https://img.shields.io/pypi/pyversions/pycatcher.svg)
## Outlier Detection for Time-series Data
This package identifies outliers in a given time-series dataset in a few simple steps. It supports daily, weekly,
monthly, and quarterly time-series data.
- [Highlights](https://aseemanand.github.io/pycatcher/highlights/)
- [Outlier Detection Functions](https://aseemanand.github.io/pycatcher/outlier_detection_functions/)
- [Diagnostic Functions](https://aseemanand.github.io/pycatcher/diagnostic_functions/)
### Installation
```bash
pip install pycatcher
```
### Basic Requirements
* PyCatcher expects a Pandas DataFrame as input for its outlier detection methods. It can convert a Spark DataFrame
to a Pandas DataFrame at the data processing stage.
* To detect outliers using Seasonal Decomposition algorithms, the first column in the DataFrame must be a time period
column (date in 'YYYY-MM-DD', month in 'YYYY-MM', or year in 'YYYY' format) and the last column must be a numeric
column (sum or total count for the time period).
* To detect outliers using Interquartile Range (IQR) and Moving Average algorithms, the last column must be a numeric column.
* At present, PyCatcher does not depend on labeled observations (ground truth). Outliers are detected solely through
underlying algorithms (for example, seasonal-trend decomposition and dispersion methods like MAD or Z-Score).
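
The layout rules above can be checked before any data reaches PyCatcher. The following is a minimal sketch using only the standard library; `is_valid_period` and `check_rows` are hypothetical helpers written for illustration and are not part of the PyCatcher API. It assumes each row is a tuple whose first element is the period string and whose last element is the numeric value.

```python
from datetime import datetime
from numbers import Real

# Period formats listed in the requirements above.
PERIOD_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def is_valid_period(text):
    """Return True if text strictly matches one of the accepted formats."""
    for fmt in PERIOD_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Round-trip to reject lenient matches such as '2024-1-1'.
        if parsed.strftime(fmt) == text:
            return True
    return False


def check_rows(rows):
    """Hypothetical helper: first column is a period, last column is numeric."""
    for row in rows:
        period, value = row[0], row[-1]
        if not is_valid_period(str(period)):
            return False
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            return False
    return True


rows = [("2024-01-01", "store_a", 120), ("2024-01-02", "store_a", 135.5)]
print(check_rows(rows))
```

Rows that pass this check can then be loaded into a Pandas DataFrame with the period column first and the numeric column last.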
<hr style="border:1.25px solid gray">
### Summary of features
PyCatcher provides an efficient solution for detecting anomalies in time-series data using various statistical methods.
Below are the available techniques for anomaly detection, each optimized for different data characteristics.
### **1. Seasonal-Decomposition Based Anomaly Detection**
Seasonal decomposition algorithms (Classical, STL, MSTL) require at least 2 years of data. For shorter series,
use the simpler methods (Interquartile Range (IQR) or Moving Average) to detect outliers.
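
The two-year rule can be expressed as a small pre-check. This is only a sketch: `suggest_method_family` is a hypothetical helper, not a PyCatcher function, and reading "2 years" as 730 days between the first and last observation is an assumption made here for illustration.

```python
from datetime import date

# Assumption: "2 years of data" is read as a span of at least 730 days.
MIN_SEASONAL_DAYS = 2 * 365


def suggest_method_family(first_day, last_day):
    """Suggest a method family based on the span covered by the series."""
    span_days = (last_day - first_day).days
    if span_days >= MIN_SEASONAL_DAYS:
        return "seasonal decomposition"
    return "IQR or moving average"


print(suggest_method_family(date(2022, 1, 1), date(2024, 1, 1)))
print(suggest_method_family(date(2024, 1, 1), date(2024, 12, 31)))
```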
#### **Detect Outliers Using Classical Seasonal Decomposition**
For datasets with at least two years of data, PyCatcher automatically determines whether the data follows
an additive or multiplicative model to detect anomalies.
- **Method**: `detect_outliers_classic(df)`
- **Output**: DataFrame of detected anomalies or a message indicating no anomalies.
#### **Detect Today's Outliers**
Quickly identify if there are any anomalies specifically for the current date.
- **Method**: `detect_outliers_today_classic(df)`
- **Output**: Anomaly details for today or a message indicating no outliers.
#### **Detect the Latest Anomalies**
Retrieve the most recent anomalies identified in your time-series data.
- **Method**: `detect_outliers_latest_classic(df)`
- **Output**: Details of the latest detected anomalies.
#### **Visualize Outliers with Seasonal Decomposition**
Show outliers in your data through classical seasonal decomposition.
- **Method**: `build_outliers_plot_classic(df)`
- **Output**: Outlier plot generated using classical seasonal decomposition.
#### **Visualize Seasonal Decomposition**
Understand seasonality in your data by visualizing classical seasonal decomposition.
- **Method**: `build_seasonal_plot_classic(df)`
- **Output**: Seasonal plots displaying additive or multiplicative trends.
#### **Visualize Monthly Patterns**
Show a month-wise box plot of the data.
- **Method**: `build_monthwise_plot(df)`
- **Output**: Month-wise box plots showing spread and skewness of data.
#### **Detect Outliers Using Seasonal-Trend Decomposition using LOESS (STL)**
Use the Seasonal-Trend Decomposition method (STL) to detect anomalies.
- **Method**: `detect_outliers_stl(df)`
- **Output**: Rows flagged as outliers using STL.
#### **Detect Today's Outliers**
Quickly identify if there are any anomalies specifically for the current date.
- **Method**: `detect_outliers_today_stl(df)`
- **Output**: Anomaly details for today or a message indicating no outliers.
#### **Detect the Latest Anomalies**
Retrieve the most recent anomalies identified in your time-series data.
- **Method**: `detect_outliers_latest_stl(df)`
- **Output**: Details of the latest detected anomalies.
#### **Visualize STL Outliers**
Show outliers using the Seasonal-Trend Decomposition using LOESS (STL).
- **Method**: `build_outliers_plot_stl(df)`
- **Output**: Outlier plot generated using STL.
#### **Visualize Seasonal Decomposition using STL**
Understand seasonality in your data by visualizing Seasonal-Trend Decomposition using LOESS (STL).
- **Method**: `build_seasonal_plot_stl(df)`
- **Output**: Seasonal plot to decompose a time series into a trend component, seasonal components,
and a residual component.
#### **Detect Outliers Using Multiple Seasonal-Trend Decomposition using LOESS (MSTL)**
Use the Multiple Seasonal-Trend Decomposition method (MSTL) to detect anomalies.
- **Method**: `detect_outliers_mstl(df)`
- **Output**: Rows flagged as outliers using MSTL.
#### **Detect Today's Outliers**
Quickly identify if there are any anomalies specifically for the current date.
- **Method**: `detect_outliers_today_mstl(df)`
- **Output**: Anomaly details for today or a message indicating no outliers.
#### **Detect the Latest Anomalies**
Retrieve the most recent anomalies identified in your time-series data.
- **Method**: `detect_outliers_latest_mstl(df)`
- **Output**: Details of the latest detected anomalies.
#### **Visualize MSTL Outliers**
Show outliers using the Multiple Seasonal-Trend Decomposition using LOESS (MSTL).
- **Method**: `build_outliers_plot_mstl(df)`
- **Output**: Outlier plot generated using MSTL.
#### **Visualize Multiple Seasonal Decomposition**
Understand seasonality in your data by visualizing Multiple Seasonal-Trend Decomposition using LOESS (MSTL).
- **Method**: `build_seasonal_plot_mstl(df)`
- **Output**: Seasonal plot to decompose a time series into a trend component, multiple seasonal components,
and a residual component.
---
### **2. Detect Outliers Using ESD (Extreme Studentized Deviate)**
Detect anomalies in time-series data using the ESD algorithm.
- **Method**: `detect_outliers_esd(df)`
- **Output**: Rows flagged as outliers using the Generalized ESD or Seasonal ESD algorithm.
#### **Visualize ESD Outliers**
Show outliers using the Generalized ESD or Seasonal ESD algorithm.
- **Method**: `build_outliers_plot_esd(df)`
- **Output**: Outlier plot generated using Generalized ESD or Seasonal ESD algorithm.
---
### **3. Detect Outliers Using Moving Average**
Detect anomalies in time-series data using the Moving Average method.
- **Method**: `detect_outliers_moving_average(df)`
- **Output**: Rows flagged as outliers using Moving Average and Z-score algorithm.
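
To illustrate the general idea of combining a moving average with a Z-score, here is a minimal standard-library sketch. It is not PyCatcher's implementation: the trailing window, the `window` and `threshold` parameters, and the function name `moving_average_outliers` are all assumptions chosen for this example.

```python
from statistics import mean, stdev


def moving_average_outliers(values, window=5, threshold=3.0):
    """Return indices whose value is far from the trailing window mean.

    Each point is compared with the mean and standard deviation of the
    `window` values that precede it (the point itself is excluded).
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat window: the Z-score is undefined, so skip
        z_score = abs(values[i] - mu) / sigma
        if z_score > threshold:
            flagged.append(i)
    return flagged


series = [10, 11, 10, 12, 11, 10, 11, 40, 11, 10, 12, 11]
print(moving_average_outliers(series))  # → [7]
```

Excluding the current point from its own window keeps a large spike from inflating the statistics it is judged against; other designs (for example, a centered window) are equally plausible.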
#### **Visualize Moving Average Outliers**
Show outliers using the Moving Average and Z-score algorithm.
- **Method**: `build_outliers_plot_moving_average(df)`
- **Output**: Outlier plot generated using Moving Average method.
---
### **4. IQR-Based Anomaly Detection**
#### **Detect Outliers Using Interquartile Range (IQR)**
For datasets spanning less than two years, the IQR method is employed.
- **Method**: `find_outliers_iqr(df)`
- **Output**: Rows flagged as outliers based on IQR.
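
For readers unfamiliar with the IQR rule, the sketch below shows the textbook version using only the standard library. The function name `iqr_outliers` and the multiplier `k=1.5` (Tukey's conventional fences) are assumptions for illustration; the source does not state which multiplier or quantile method PyCatcher uses.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # n=4 yields the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [12, 13, 12, 14, 13, 15, 14, 13, 60]
print(iqr_outliers(data))  # → [60]
```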
#### **Visualize IQR Plot**
Build an IQR plot for a given dataframe (for less than 2 years of data).
- **Method**: `build_iqr_plot(df)`
- **Output**: IQR plot for the time-series data.
<hr style="border:1.25px solid gray">
### Example Usage
To see an example of how to use the `pycatcher` package for outlier detection in time-series data, check out the [Example Notebook](https://github.com/aseemanand/pycatcher/blob/main/notebooks/Example%20Notebook.ipynb).
The notebook provides step-by-step guidance and demonstrates the key features of the library.