pyod


Namepyod JSON
Version 2.0.2 PyPI version JSON
download
home_pagehttps://github.com/yzhao062/pyod
SummaryA Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)
upload_time2024-09-06 03:24:51
maintainerNone
docs_urlNone
authorYue Zhao
requires_pythonNone
licenseNone
keywords outlier detection anomaly detection outlier ensembles data mining neural networks
VCS
bugtrack_url
requirements joblib matplotlib numpy numba scipy scikit-learn
Travis-CI No Travis.
coveralls test coverage No coveralls.
            Python Outlier Detection (PyOD)
===============================

**Deployment & Documentation & Stats & License**

|badge_pypi| |badge_anaconda| |badge_docs| |badge_stars| |badge_forks| |badge_downloads| |badge_testing| |badge_coverage| |badge_maintainability| |badge_license| |badge_benchmark|

.. |badge_pypi| image:: https://img.shields.io/pypi/v/pyod.svg?color=brightgreen
   :target: https://pypi.org/project/pyod/
   :alt: PyPI version

.. |badge_anaconda| image:: https://anaconda.org/conda-forge/pyod/badges/version.svg
   :target: https://anaconda.org/conda-forge/pyod
   :alt: Anaconda version

.. |badge_docs| image:: https://readthedocs.org/projects/pyod/badge/?version=latest
   :target: https://pyod.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation status

.. |badge_stars| image:: https://img.shields.io/github/stars/yzhao062/pyod.svg
   :target: https://github.com/yzhao062/pyod/stargazers
   :alt: GitHub stars

.. |badge_forks| image:: https://img.shields.io/github/forks/yzhao062/pyod.svg?color=blue
   :target: https://github.com/yzhao062/pyod/network
   :alt: GitHub forks

.. |badge_downloads| image:: https://pepy.tech/badge/pyod
   :target: https://pepy.tech/project/pyod
   :alt: Downloads

.. |badge_testing| image:: https://github.com/yzhao062/pyod/actions/workflows/testing.yml/badge.svg
   :target: https://github.com/yzhao062/pyod/actions/workflows/testing.yml
   :alt: Testing


.. |badge_coverage| image:: https://coveralls.io/repos/github/yzhao062/pyod/badge.svg
   :target: https://coveralls.io/github/yzhao062/pyod
   :alt: Coverage Status

.. |badge_maintainability| image:: https://api.codeclimate.com/v1/badges/bdc3d8d0454274c753c4/maintainability
   :target: https://codeclimate.com/github/yzhao062/Pyod/maintainability
   :alt: Maintainability

.. |badge_license| image:: https://img.shields.io/github/license/yzhao062/pyod.svg
   :target: https://github.com/yzhao062/pyod/blob/master/LICENSE
   :alt: License

.. |badge_benchmark| image:: https://img.shields.io/badge/ADBench-benchmark_results-pink
   :target: https://github.com/Minqi824/ADBench
   :alt: Benchmark


-----


Read Me First
^^^^^^^^^^^^^

Welcome to PyOD, a comprehensive but easy-to-use Python library for detecting anomalies in multivariate data. Whether you're tackling a small-scale project or large datasets, PyOD offers a range of algorithms to suit your needs.

* **For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.

* **For graph outlier detection**, please use `PyGOD <https://pygod.org/>`_.

* **Performance Comparison & Datasets**: We have a 45-page, comprehensive `anomaly detection benchmark paper <https://openreview.net/forum?id=foA_SFQ9zo0>`_. The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

* **Learn more about anomaly detection** at `Anomaly Detection Resources <https://github.com/yzhao062/anomaly-detection-resources>`_

* **PyOD on Distributed Systems**: you can also run `PyOD on databricks <https://www.databricks.com/blog/2023/03/13/unsupervised-outlier-detection-databricks.html>`_.

----

About PyOD
^^^^^^^^^^

PyOD, established in 2017, has become a go-to **Python library** for **detecting anomalous/outlying objects** in multivariate data. This exciting yet challenging field is commonly referred to as `Outlier Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 50 detection algorithms, from classical LOF (SIGMOD 2000) to the cutting-edge ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic research projects and commercial products with more than `22 million downloads <https://pepy.tech/project/pyod>`_. It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including `Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_, `KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and `Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.

**PyOD is featured for**:

* **Unified, User-Friendly Interface** across various algorithms.
* **Wide Range of Models**, from classic techniques to the latest deep learning methods in **PyTorch**.
* **High Performance & Efficiency**, leveraging `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_ for JIT compilation and parallel processing.
* **Fast Training & Prediction**, achieved through the SUOD framework [#Zhao2021SUOD]_.

**Outlier Detection with 5 Lines of Code**:

.. code-block:: python

    # Example: Training an ECOD detector
    from pyod.models.ecod import ECOD
    clf = ECOD()
    clf.fit(X_train)
    y_train_scores = clf.decision_scores_  # Outlier scores for training data
    y_test_scores = clf.decision_function(X_test)  # Outlier scores for test data


**Selecting the Right Algorithm:** Unsure where to start? Consider these robust and interpretable options:

- `ECOD <https://github.com/yzhao062/pyod/blob/master/examples/ecod_example.py>`_: Example of using ECOD for outlier detection
- `Isolation Forest <https://github.com/yzhao062/pyod/blob/master/examples/iforest_example.py>`_: Example of using Isolation Forest for outlier detection

Alternatively, explore `MetaOD <https://github.com/yzhao062/MetaOD>`_ for a data-driven approach.

**Citing PyOD**:

`PyOD paper <http://www.jmlr.org/papers/volume20/19-011/19-011.pdf>`_ is published in `Journal of Machine Learning Research (JMLR) <http://www.jmlr.org/>`_ (MLOSS track). If you use PyOD in a scientific publication, we would appreciate citations to the following paper::

    @article{zhao2019pyod,
        author  = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
        title   = {PyOD: A Python Toolbox for Scalable Outlier Detection},
        journal = {Journal of Machine Learning Research},
        year    = {2019},
        volume  = {20},
        number  = {96},
        pages   = {1-7},
        url     = {http://jmlr.org/papers/v20/19-011.html}
    }

or::

    Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of machine learning research (JMLR), 20(96), pp.1-7.

For a broader perspective on anomaly detection, see our NeurIPS papers `ADBench: Anomaly Detection Benchmark Paper <https://arxiv.org/abs/2206.09426>`_ and `ADGym: Design Choices for Deep Anomaly Detection <https://arxiv.org/abs/2309.15376>`_::

    @article{han2022adbench,
        title={Adbench: Anomaly detection benchmark},
        author={Han, Songqiao and Hu, Xiyang and Huang, Hailiang and Jiang, Minqi and Zhao, Yue},
        journal={Advances in Neural Information Processing Systems},
        volume={35},
        pages={32142--32159},
        year={2022}
    }

    @article{jiang2023adgym,
        title={ADGym: Design Choices for Deep Anomaly Detection},
        author={Jiang, Minqi and Hou, Chaochuan and Zheng, Ao and Han, Songqiao and Huang, Hailiang and Wen, Qingsong and Hu, Xiyang and Zhao, Yue},
        journal={Advances in Neural Information Processing Systems},
        volume={36},
        year={2023}
    }


**Table of Contents**:

* `Installation <#installation>`_
* `API Cheatsheet & Reference <#api-cheatsheet--reference>`_
* `ADBench Benchmark and Datasets <#adbench-benchmark-and-datasets>`_
* `Model Save & Load <#model-save--load>`_
* `Fast Train with SUOD <#fast-train-with-suod>`_
* `Thresholding Outlier Scores <#thresholding-outlier-scores>`_
* `Implemented Algorithms <#implemented-algorithms>`_
* `Quick Start for Outlier Detection <#quick-start-for-outlier-detection>`_
* `How to Contribute <#how-to-contribute>`_
* `Inclusion Criteria <#inclusion-criteria>`_

----

Installation
^^^^^^^^^^^^

PyOD is designed for easy installation using either **pip** or **conda**. We recommend using the latest version of PyOD due to frequent updates and enhancements:

.. code-block:: bash

   pip install pyod            # normal install
   pip install --upgrade pyod  # or update if needed

.. code-block:: bash

   conda install -c conda-forge pyod

Alternatively, you can clone and run the setup.py file:

.. code-block:: bash

   git clone https://github.com/yzhao062/pyod.git
   cd pyod
   pip install .

**Required Dependencies**:

* Python 3.8 or higher
* joblib
* matplotlib
* numpy>=1.19
* numba>=0.51
* scipy>=1.5.1
* scikit_learn>=0.22.0

**Optional Dependencies (see details below)**:

* combo (optional, required for models/combination.py and FeatureBagging)
* pytorch (optional, required for AutoEncoder, and other deep learning models)
* suod (optional, required for running SUOD model)
* xgboost (optional, required for XGBOD)
* pythresh (optional, required for thresholding)

----


API Cheatsheet & Reference
^^^^^^^^^^^^^^^^^^^^^^^^^^

The full API Reference is available at `PyOD Documentation <https://pyod.readthedocs.io/en/latest/pyod.html>`_. Below is a quick cheatsheet for all detectors:

* **fit(X)**: Fit the detector. The parameter y is ignored in unsupervised methods.
* **decision_function(X)**: Predict raw anomaly scores for X using the fitted detector.
* **predict(X)**: Determine whether a sample is an outlier or not as binary labels using the fitted detector.
* **predict_proba(X)**: Estimate the probability of a sample being an outlier using the fitted detector.
* **predict_confidence(X)**: Assess the model's confidence on a per-sample basis (applicable in predict and predict_proba) [#Perini2020Quantifying]_.

**Key Attributes of a fitted model**:

* **decision_scores_**: Outlier scores of the training data. Higher scores typically indicate more abnormal behavior. Outliers usually have higher scores.
* **labels_**: Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.


----


ADBench Benchmark and Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We just released a 45-page, the most comprehensive `ADBench: Anomaly Detection Benchmark <https://arxiv.org/abs/2206.09426>`_ [#Han2022ADBench]_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

The organization of **ADBench** is provided below:

.. image:: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true
   :target: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true
   :alt: benchmark-fig


For a simpler visualization, we make **the comparison of selected models** via
`compare_all_models.py <https://github.com/yzhao062/pyod/blob/master/examples/compare_all_models.py>`_\.

.. image:: https://github.com/yzhao062/pyod/blob/development/examples/ALL.png?raw=true
   :target: https://github.com/yzhao062/pyod/blob/development/examples/ALL.png?raw=true
   :alt: Comparison_of_All



----

Model Save & Load
^^^^^^^^^^^^^^^^^

PyOD takes a similar approach of sklearn regarding model persistence.
See `model persistence <https://scikit-learn.org/stable/modules/model_persistence.html>`_ for clarification.

In short, we recommend to use joblib or pickle for saving and loading PyOD models.
See `"examples/save_load_model_example.py" <https://github.com/yzhao062/pyod/blob/master/examples/save_load_model_example.py>`_ for an example.
In short, it is simple as below:

.. code-block:: python

    from joblib import dump, load

    # save the model
    dump(clf, 'clf.joblib')
    # load the model
    clf = load('clf.joblib')

It is known that there are challenges in saving neural network models.
Check `#328 <https://github.com/yzhao062/pyod/issues/328#issuecomment-917192704>`_
and `#88 <https://github.com/yzhao062/pyod/issues/88#issuecomment-615343139>`_
for temporary workaround.


----


Fast Train with SUOD
^^^^^^^^^^^^^^^^^^^^

**Fast training and prediction**: it is possible to train and predict with
a large number of detection models in PyOD by leveraging SUOD framework [#Zhao2021SUOD]_.
See  `SUOD Paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/21-mlsys-suod.pdf>`_
and  `SUOD example <https://github.com/yzhao062/pyod/blob/master/examples/suod_example.py>`_.


.. code-block:: python

    from pyod.models.suod import SUOD

    # initialized a group of outlier detectors for acceleration
    detector_list = [LOF(n_neighbors=15), LOF(n_neighbors=20),
                     LOF(n_neighbors=25), LOF(n_neighbors=35),
                     COPOD(), IForest(n_estimators=100),
                     IForest(n_estimators=200)]

    # decide the number of parallel process, and the combination method
    # then clf can be used as any outlier detection model
    clf = SUOD(base_estimators=detector_list, n_jobs=2, combination='average',
               verbose=False)

----

Thresholding Outlier Scores
^^^^^^^^^^^^^^^^^^^^^^^^^^^

A more data-based approach can be taken when setting the contamination level. By using a thresholding method, guessing an arbitrary value can be replaced with tested techniques for separating inliers and outliers. Refer to `PyThresh <https://github.com/KulikDM/pythresh>`_ for a more in-depth look at thresholding.

.. code-block:: python

    from pyod.models.knn import KNN
    from pyod.models.thresholds import FILTER

    # Set the outlier detection and thresholding methods
    clf = KNN(contamination=FILTER())


See supported thresholding methods in `thresholding <https://github.com/yzhao062/pyod/blob/master/docs/thresholding.rst>`_.

----



Implemented Algorithms
^^^^^^^^^^^^^^^^^^^^^^

PyOD toolkit consists of four major functional groups:

**(i) Individual Detection Algorithms** :

===================  ==================  ======================================================================================================  =====  ========================================
Type                 Abbr                Algorithm                                                                                               Year   Ref
===================  ==================  ======================================================================================================  =====  ========================================
Probabilistic        ECOD                Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions                        2022   [#Li2021ECOD]_
Probabilistic        ABOD                Angle-Based Outlier Detection                                                                           2008   [#Kriegel2008Angle]_
Probabilistic        FastABOD            Fast Angle-Based Outlier Detection using approximation                                                  2008   [#Kriegel2008Angle]_
Probabilistic        COPOD               COPOD: Copula-Based Outlier Detection                                                                   2020   [#Li2020COPOD]_
Probabilistic        MAD                 Median Absolute Deviation (MAD)                                                                         1993   [#Iglewicz1993How]_
Probabilistic        SOS                 Stochastic Outlier Selection                                                                            2012   [#Janssens2012Stochastic]_
Probabilistic        QMCD                Quasi-Monte Carlo Discrepancy outlier detection                                                         2001   [#Fang2001Wrap]_
Probabilistic        KDE                 Outlier Detection with Kernel Density Functions                                                         2007   [#Latecki2007Outlier]_
Probabilistic        Sampling            Rapid distance-based outlier detection via sampling                                                     2013   [#Sugiyama2013Rapid]_
Probabilistic        GMM                 Probabilistic Mixture Modeling for Outlier Analysis                                                            [#Aggarwal2015Outlier]_ [Ch.2]
Linear Model         PCA                 Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes)   2003   [#Shyu2003A]_
Linear Model         KPCA                Kernel Principal Component Analysis                                                                     2007   [#Hoffmann2007Kernel]_
Linear Model         MCD                 Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores)                    1999   [#Hardin2004Outlier]_ [#Rousseeuw1999A]_
Linear Model         CD                  Use Cook's distance for outlier detection                                                               1977   [#Cook1977Detection]_
Linear Model         OCSVM               One-Class Support Vector Machines                                                                       2001   [#Scholkopf2001Estimating]_
Linear Model         LMDD                Deviation-based Outlier Detection (LMDD)                                                                1996   [#Arning1996A]_
Proximity-Based      LOF                 Local Outlier Factor                                                                                    2000   [#Breunig2000LOF]_
Proximity-Based      COF                 Connectivity-Based Outlier Factor                                                                       2002   [#Tang2002Enhancing]_
Proximity-Based      (Incremental) COF   Memory Efficient Connectivity-Based Outlier Factor (slower but reduce storage complexity)               2002   [#Tang2002Enhancing]_
Proximity-Based      CBLOF               Clustering-Based Local Outlier Factor                                                                   2003   [#He2003Discovering]_
Proximity-Based      LOCI                LOCI: Fast outlier detection using the local correlation integral                                       2003   [#Papadimitriou2003LOCI]_
Proximity-Based      HBOS                Histogram-based Outlier Score                                                                           2012   [#Goldstein2012Histogram]_
Proximity-Based      kNN                 k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)                 2000   [#Ramaswamy2000Efficient]_
Proximity-Based      AvgKNN              Average kNN (use the average distance to k nearest neighbors as the outlier score)                      2002   [#Angiulli2002Fast]_
Proximity-Based      MedKNN              Median kNN (use the median distance to k nearest neighbors as the outlier score)                        2002   [#Angiulli2002Fast]_
Proximity-Based      SOD                 Subspace Outlier Detection                                                                              2009   [#Kriegel2009Outlier]_
Proximity-Based      ROD                 Rotation-based Outlier Detection                                                                        2020   [#Almardeny2020A]_
Outlier Ensembles    IForest             Isolation Forest                                                                                        2008   [#Liu2008Isolation]_
Outlier Ensembles    INNE                Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles                                      2018   [#Bandaragoda2018Isolation]_
Outlier Ensembles    DIF                 Deep Isolation Forest for Anomaly Detection                                                             2023   [#Xu2023Deep]_
Outlier Ensembles    FB                  Feature Bagging                                                                                         2005   [#Lazarevic2005Feature]_
Outlier Ensembles    LSCP                LSCP: Locally Selective Combination of Parallel Outlier Ensembles                                       2019   [#Zhao2019LSCP]_
Outlier Ensembles    XGBOD               Extreme Boosting Based Outlier Detection **(Supervised)**                                               2018   [#Zhao2018XGBOD]_
Outlier Ensembles    LODA                Lightweight On-line Detector of Anomalies                                                               2016   [#Pevny2016Loda]_
Outlier Ensembles    SUOD                SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection **(Acceleration)**          2021   [#Zhao2021SUOD]_
Neural Networks      AutoEncoder         Fully connected AutoEncoder (use reconstruction error as the outlier score)                                    [#Aggarwal2015Outlier]_ [Ch.3]
Neural Networks      VAE                 Variational AutoEncoder (use reconstruction error as the outlier score)                                 2013   [#Kingma2013Auto]_
Neural Networks      Beta-VAE            Variational AutoEncoder (all customized loss term by varying gamma and capacity)                        2018   [#Burgess2018Understanding]_
Neural Networks      SO_GAAL             Single-Objective Generative Adversarial Active Learning                                                 2019   [#Liu2019Generative]_
Neural Networks      MO_GAAL             Multiple-Objective Generative Adversarial Active Learning                                               2019   [#Liu2019Generative]_
Neural Networks      DeepSVDD            Deep One-Class Classification                                                                           2018   [#Ruff2018Deep]_
Neural Networks      AnoGAN              Anomaly Detection with Generative Adversarial Networks                                                  2017   [#Schlegl2017Unsupervised]_
Neural Networks      ALAD                Adversarially learned anomaly detection                                                                 2018   [#Zenati2018Adversarially]_
Neural Networks      AE1SVM              Autoencoder-based One-class Support Vector Machine                                                      2019   [#Nguyen2019scalable]_
Neural Networks      DevNet              Deep Anomaly Detection with Deviation Networks                                                          2019   [#Pang2019Deep]_
Graph-based          R-Graph             Outlier detection by R-graph                                                                            2017   [#You2017Provable]_
Graph-based          LUNAR               LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks                               2022   [#Goodge2022Lunar]_
===================  ==================  ======================================================================================================  =====  ========================================


**(ii) Outlier Ensembles & Outlier Detector Combination Frameworks**:

===================  ================  =====================================================================================================  =====  ========================================
Type                 Abbr              Algorithm                                                                                              Year   Ref
===================  ================  =====================================================================================================  =====  ========================================
Outlier Ensembles    FB                Feature Bagging                                                                                        2005   [#Lazarevic2005Feature]_
Outlier Ensembles    LSCP              LSCP: Locally Selective Combination of Parallel Outlier Ensembles                                      2019   [#Zhao2019LSCP]_
Outlier Ensembles    XGBOD             Extreme Boosting Based Outlier Detection **(Supervised)**                                              2018   [#Zhao2018XGBOD]_
Outlier Ensembles    LODA              Lightweight On-line Detector of Anomalies                                                              2016   [#Pevny2016Loda]_
Outlier Ensembles    SUOD              SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection **(Acceleration)**         2021   [#Zhao2021SUOD]_
Outlier Ensembles    INNE              Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles                                     2018   [#Bandaragoda2018Isolation]_
Combination          Average           Simple combination by averaging the scores                                                             2015   [#Aggarwal2015Theoretical]_
Combination          Weighted Average  Simple combination by averaging the scores with detector weights                                       2015   [#Aggarwal2015Theoretical]_
Combination          Maximization      Simple combination by taking the maximum scores                                                        2015   [#Aggarwal2015Theoretical]_
Combination          AOM               Average of Maximum                                                                                     2015   [#Aggarwal2015Theoretical]_
Combination          MOA               Maximization of Average                                                                                2015   [#Aggarwal2015Theoretical]_
Combination          Median            Simple combination by taking the median of the scores                                                  2015   [#Aggarwal2015Theoretical]_
Combination          majority Vote     Simple combination by taking the majority vote of the labels (weights can be used)                     2015   [#Aggarwal2015Theoretical]_
===================  ================  =====================================================================================================  =====  ========================================


**(iii) Utility Functions**:

===================  ======================  =====================================================================================================================================================  ======================================================================================================================================
Type                 Name                    Function                                                                                                                                               Documentation
===================  ======================  =====================================================================================================================================================  ======================================================================================================================================
Data                 generate_data           Synthesized data generation; normal data is generated by a multivariate Gaussian and outliers are generated by a uniform distribution                  `generate_data <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.data.generate_data>`_
Data                 generate_data_clusters  Synthesized data generation in clusters; more complex data patterns can be created with multiple clusters                                              `generate_data_clusters <https://pyod.readthedocs.io/en/latest/pyod.utils.html#pyod.utils.data.generate_data_clusters>`_
Stat                 wpearsonr               Calculate the weighted Pearson correlation of two samples                                                                                              `wpearsonr <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.stat_models.wpearsonr>`_
Utility              get_label_n             Turn raw outlier scores into binary labels by assign 1 to top n outlier scores                                                                         `get_label_n <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.utility.get_label_n>`_
Utility              precision_n_scores      calculate precision @ rank n                                                                                                                           `precision_n_scores <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.utility.precision_n_scores>`_
===================  ======================  =====================================================================================================================================================  ======================================================================================================================================

----

Quick Start for Outlier Detection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyOD has been well acknowledged by the machine learning community with a few featured posts and tutorials.

**Analytics Vidhya**: `An Awesome Tutorial to Learn Outlier Detection in Python using PyOD Library <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_

**KDnuggets**: `Intuitive Visualization of Outlier Detection Methods <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, `An Overview of Outlier Detection Methods from PyOD <https://www.kdnuggets.com/2019/06/overview-outlier-detection-methods-pyod.html>`_

**Towards Data Science**: `Anomaly Detection for Dummies <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_

`"examples/knn_example.py" <https://github.com/yzhao062/pyod/blob/master/examples/knn_example.py>`_
demonstrates the basic API of using kNN detector. **It is noted that the API across all other algorithms are consistent/similar**.

More detailed instructions for running examples can be found in `examples directory <https://github.com/yzhao062/pyod/blob/master/examples>`_.


#. Initialize a kNN detector, fit the model, and make the prediction.

   .. code-block:: python


       from pyod.models.knn import KNN   # kNN detector

       # train kNN detector
       clf_name = 'KNN'
       clf = KNN()
       clf.fit(X_train)

       # get the prediction label and outlier scores of the training data
       y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
       y_train_scores = clf.decision_scores_  # raw outlier scores

       # get the prediction on the test data
       y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)
       y_test_scores = clf.decision_function(X_test)  # outlier scores

       # it is possible to get the prediction confidence as well
       y_test_pred, y_test_pred_confidence = clf.predict(X_test, return_confidence=True)  # outlier labels (0 or 1) and confidence in the range of [0,1]

#. Evaluate the prediction by ROC and Precision @ Rank n (p@n).

   .. code-block:: python

       from pyod.utils.data import evaluate_print
       
       # evaluate and print the results
       print("\nOn Training Data:")
       evaluate_print(clf_name, y_train, y_train_scores)
       print("\nOn Test Data:")
       evaluate_print(clf_name, y_test, y_test_scores)


#. See a sample output & visualization.


   .. code-block:: python


       On Training Data:
       KNN ROC:1.0, precision @ rank n:1.0

       On Test Data:
       KNN ROC:0.9989, precision @ rank n:0.9

   .. code-block:: python


       visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
           y_test_pred, show_figure=True, save_figure=False)

Visualization (\ `knn_figure <https://raw.githubusercontent.com/yzhao062/pyod/master/examples/KNN.png>`_\ ):

.. image:: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/KNN.png
   :target: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/KNN.png
   :alt: kNN example figure

----

Reference
^^^^^^^^^


.. [#Aggarwal2015Outlier] Aggarwal, C.C., 2015. Outlier analysis. In Data mining (pp. 237-263). Springer, Cham.

.. [#Aggarwal2015Theoretical] Aggarwal, C.C. and Sathe, S., 2015. Theoretical foundations and algorithms for outlier ensembles.\ *ACM SIGKDD Explorations Newsletter*\ , 17(1), pp.24-47.

.. [#Aggarwal2017Outlier] Aggarwal, C.C. and Sathe, S., 2017. Outlier ensembles: An introduction. Springer.

.. [#Almardeny2020A] Almardeny, Y., Boujnah, N. and Cleary, F., 2020. A Novel Outlier Detection Method for Multivariate Data. *IEEE Transactions on Knowledge and Data Engineering*.

.. [#Angiulli2002Fast] Angiulli, F. and Pizzuti, C., 2002, August. Fast outlier detection in high dimensional spaces. In *European Conference on Principles of Data Mining and Knowledge Discovery* pp. 15-27.

.. [#Arning1996A] Arning, A., Agrawal, R. and Raghavan, P., 1996, August. A Linear Method for Deviation Detection in Large Databases. In *KDD* (Vol. 1141, No. 50, pp. 972-981).

.. [#Bandaragoda2018Isolation] Bandaragoda, T. R., Ting, K. M., Albrecht, D., Liu, F. T., Zhu, Y., and Wells, J. R., 2018, Isolation-based anomaly detection using nearest-neighbor ensembles. *Computational Intelligence*\ , 34(4), pp. 968-998.

.. [#Breunig2000LOF] Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. *ACM Sigmod Record*\ , 29(2), pp. 93-104.

.. [#Burgess2018Understanding] Burgess, Christopher P., et al. "Understanding disentangling in beta-VAE." arXiv preprint arXiv:1804.03599 (2018).

.. [#Cook1977Detection] Cook, R.D., 1977. Detection of influential observation in linear regression. Technometrics, 19(1), pp.15-18.

.. [#Fang2001Wrap] Fang, K.T. and Ma, C.X., 2001. Wrap-around L2-discrepancy of random sampling, Latin hypercube and uniform designs. Journal of complexity, 17(4), pp.608-624.

.. [#Goldstein2012Histogram] Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In *KI-2012: Poster and Demo Track*\ , pp.59-63.

.. [#Goodge2022Lunar] Goodge, A., Hooi, B., Ng, S.K. and Ng, W.S., 2022, June. Lunar: Unifying local outlier detection methods via graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence.

.. [#Gopalan2019PIDForest] Gopalan, P., Sharan, V. and Wieder, U., 2019. PIDForest: Anomaly Detection via Partial Identification. In Advances in Neural Information Processing Systems, pp. 15783-15793.

.. [#Han2022ADBench] Han, S., Hu, X., Huang, H., Jiang, M. and Zhao, Y., 2022. ADBench: Anomaly Detection Benchmark. arXiv preprint arXiv:2206.09426.

.. [#Hardin2004Outlier] Hardin, J. and Rocke, D.M., 2004. Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. *Computational Statistics & Data Analysis*\ , 44(4), pp.625-638.

.. [#He2003Discovering] He, Z., Xu, X. and Deng, S., 2003. Discovering cluster-based local outliers. *Pattern Recognition Letters*\ , 24(9-10), pp.1641-1650.

.. [#Hoffmann2007Kernel] Hoffmann, H., 2007. Kernel PCA for novelty detection. Pattern recognition, 40(3), pp.863-874.

.. [#Iglewicz1993How] Iglewicz, B. and Hoaglin, D.C., 1993. How to detect and handle outliers (Vol. 16). Asq Press.

.. [#Janssens2012Stochastic] Janssens, J.H.M., Huszár, F., Postma, E.O. and van den Herik, H.J., 2012. Stochastic outlier selection. Technical report TiCC TR 2012-001, Tilburg University, Tilburg Center for Cognition and Communication, Tilburg, The Netherlands.

.. [#Kingma2013Auto] Kingma, D.P. and Welling, M., 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.

.. [#Kriegel2008Angle] Kriegel, H.P. and Zimek, A., 2008, August. Angle-based outlier detection in high-dimensional data. In *KDD '08*\ , pp. 444-452. ACM.

.. [#Kriegel2009Outlier] Kriegel, H.P., Kröger, P., Schubert, E. and Zimek, A., 2009, April. Outlier detection in axis-parallel subspaces of high dimensional data. In *Pacific-Asia Conference on Knowledge Discovery and Data Mining*\ , pp. 831-838. Springer, Berlin, Heidelberg.

.. [#Latecki2007Outlier] Latecki, L.J., Lazarevic, A. and Pokrajac, D., 2007, July. Outlier detection with kernel density functions. In International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 61-75). Springer, Berlin, Heidelberg.

.. [#Lazarevic2005Feature] Lazarevic, A. and Kumar, V., 2005, August. Feature bagging for outlier detection. In *KDD '05*. 2005.

.. [#Li2019MADGAN] Li, D., Chen, D., Jin, B., Shi, L., Goh, J. and Ng, S.K., 2019, September. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In *International Conference on Artificial Neural Networks* (pp. 703-716). Springer, Cham.

.. [#Li2020COPOD] Li, Z., Zhao, Y., Botta, N., Ionescu, C. and Hu, X. COPOD: Copula-Based Outlier Detection. *IEEE International Conference on Data Mining (ICDM)*, 2020.

.. [#Li2021ECOD] Li, Z., Zhao, Y., Hu, X., Botta, N., Ionescu, C. and Chen, H. G. ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. *IEEE Transactions on Knowledge and Data Engineering (TKDE)*, 2022.

.. [#Liu2008Isolation] Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In *International Conference on Data Mining*\ , pp. 413-422. IEEE.

.. [#Liu2019Generative] Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M. and He, X., 2019. Generative adversarial active learning for unsupervised outlier detection. *IEEE Transactions on Knowledge and Data Engineering*.

.. [#Nguyen2019scalable] Nguyen, M.N. and Vien, N.A., 2019. Scalable and interpretable one-class svms with deep learning and random fourier features. In *Machine Learning and Knowledge Discovery in Databases: European Conference*, ECML PKDD, 2018.

.. [#Pang2019Deep] Pang, Guansong, Chunhua Shen, and Anton Van Den Hengel. "Deep anomaly detection with deviation networks." In *KDD*, pp. 353-362. 2019.

.. [#Papadimitriou2003LOCI] Papadimitriou, S., Kitagawa, H., Gibbons, P.B. and Faloutsos, C., 2003, March. LOCI: Fast outlier detection using the local correlation integral. In *ICDE '03*, pp. 315-326. IEEE.

.. [#Pevny2016Loda] Pevný, T., 2016. Loda: Lightweight on-line detector of anomalies. *Machine Learning*, 102(2), pp.275-304.

.. [#Perini2020Quantifying] Perini, L., Vercruyssen, V., Davis, J. Quantifying the confidence of anomaly detectors in their example-wise predictions. In *Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD)*, 2020.

.. [#Ramaswamy2000Efficient] Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. *ACM Sigmod Record*\ , 29(2), pp. 427-438.

.. [#Rousseeuw1999A] Rousseeuw, P.J. and Driessen, K.V., 1999. A fast algorithm for the minimum covariance determinant estimator. *Technometrics*\ , 41(3), pp.212-223.

.. [#Ruff2018Deep] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E. and Kloft, M., 2018, July. Deep one-class classification. In *International conference on machine learning* (pp. 4393-4402). PMLR.

.. [#Schlegl2017Unsupervised] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U. and Langs, G., 2017, June. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International conference on information processing in medical imaging (pp. 146-157). Springer, Cham.

.. [#Scholkopf2001Estimating] Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C., 2001. Estimating the support of a high-dimensional distribution. *Neural Computation*, 13(7), pp.1443-1471.

.. [#Shyu2003A] Shyu, M.L., Chen, S.C., Sarinnapakorn, K. and Chang, L., 2003. A novel anomaly detection scheme based on principal component classifier. *MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING*.

.. [#Sugiyama2013Rapid] Sugiyama, M. and Borgwardt, K., 2013. Rapid distance-based outlier detection via sampling. Advances in neural information processing systems, 26.

.. [#Tang2002Enhancing] Tang, J., Chen, Z., Fu, A.W.C. and Cheung, D.W., 2002, May. Enhancing effectiveness of outlier detections for low density patterns. In *Pacific-Asia Conference on Knowledge Discovery and Data Mining*, pp. 535-548. Springer, Berlin, Heidelberg.

.. [#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.

.. [#Xu2023Deep] Xu, H., Pang, G., Wang, Y., Wang, Y., 2023. Deep isolation forest for anomaly detection. *IEEE Transactions on Knowledge and Data Engineering*.

.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.

.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.

.. [#Zhao2018XGBOD] Zhao, Y. and Hryniewicki, M.K. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. *IEEE International Joint Conference on Neural Networks*\ , 2018.

.. [#Zhao2019LSCP] Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In *Proceedings of the 2019 SIAM International Conference on Data Mining (SDM)*, pp. 585-593. Society for Industrial and Applied Mathematics.

.. [#Zhao2021SUOD] Zhao, Y., Hu, X., Cheng, C., Wang, C., Wan, C., Wang, W., Yang, J., Bai, H., Li, Z., Xiao, C., Wang, Y., Qiao, Z., Sun, J. and Akoglu, L. (2021). SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection. *Conference on Machine Learning and Systems (MLSys)*.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yzhao062/pyod",
    "name": "pyod",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "outlier detection, anomaly detection, outlier ensembles, data mining, neural networks",
    "author": "Yue Zhao",
    "author_email": "yzhao062@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ef/86/80ff0719a4a24e84e1e6bfc0149c21e40fc9dc061f0c476e169a7b4eea15/pyod-2.0.2.tar.gz",
    "platform": null,
    "description": "Python Outlier Detection (PyOD)\r\n===============================\r\n\r\n**Deployment & Documentation & Stats & License**\r\n\r\n|badge_pypi| |badge_anaconda| |badge_docs| |badge_stars| |badge_forks| |badge_downloads| |badge_testing| |badge_coverage| |badge_maintainability| |badge_license| |badge_benchmark|\r\n\r\n.. |badge_pypi| image:: https://img.shields.io/pypi/v/pyod.svg?color=brightgreen\r\n   :target: https://pypi.org/project/pyod/\r\n   :alt: PyPI version\r\n\r\n.. |badge_anaconda| image:: https://anaconda.org/conda-forge/pyod/badges/version.svg\r\n   :target: https://anaconda.org/conda-forge/pyod\r\n   :alt: Anaconda version\r\n\r\n.. |badge_docs| image:: https://readthedocs.org/projects/pyod/badge/?version=latest\r\n   :target: https://pyod.readthedocs.io/en/latest/?badge=latest\r\n   :alt: Documentation status\r\n\r\n.. |badge_stars| image:: https://img.shields.io/github/stars/yzhao062/pyod.svg\r\n   :target: https://github.com/yzhao062/pyod/stargazers\r\n   :alt: GitHub stars\r\n\r\n.. |badge_forks| image:: https://img.shields.io/github/forks/yzhao062/pyod.svg?color=blue\r\n   :target: https://github.com/yzhao062/pyod/network\r\n   :alt: GitHub forks\r\n\r\n.. |badge_downloads| image:: https://pepy.tech/badge/pyod\r\n   :target: https://pepy.tech/project/pyod\r\n   :alt: Downloads\r\n\r\n.. |badge_testing| image:: https://github.com/yzhao062/pyod/actions/workflows/testing.yml/badge.svg\r\n   :target: https://github.com/yzhao062/pyod/actions/workflows/testing.yml\r\n   :alt: Testing\r\n\r\n\r\n.. |badge_coverage| image:: https://coveralls.io/repos/github/yzhao062/pyod/badge.svg\r\n   :target: https://coveralls.io/github/yzhao062/pyod\r\n   :alt: Coverage Status\r\n\r\n.. |badge_maintainability| image:: https://api.codeclimate.com/v1/badges/bdc3d8d0454274c753c4/maintainability\r\n   :target: https://codeclimate.com/github/yzhao062/Pyod/maintainability\r\n   :alt: Maintainability\r\n\r\n.. |badge_license| image:: https://img.shields.io/github/license/yzhao062/pyod.svg\r\n   :target: https://github.com/yzhao062/pyod/blob/master/LICENSE\r\n   :alt: License\r\n\r\n.. |badge_benchmark| image:: https://img.shields.io/badge/ADBench-benchmark_results-pink\r\n   :target: https://github.com/Minqi824/ADBench\r\n   :alt: Benchmark\r\n\r\n\r\n-----\r\n\r\n\r\nRead Me First\r\n^^^^^^^^^^^^^\r\n\r\nWelcome to PyOD, a comprehensive but easy-to-use Python library for detecting anomalies in multivariate data. Whether you're tackling a small-scale project or large datasets, PyOD offers a range of algorithms to suit your needs.\r\n\r\n* **For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.\r\n\r\n* **For graph outlier detection**, please use `PyGOD <https://pygod.org/>`_.\r\n\r\n* **Performance Comparison & Datasets**: We have a 45-page, comprehensive `anomaly detection benchmark paper <https://openreview.net/forum?id=foA_SFQ9zo0>`_. The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.\r\n\r\n* **Learn more about anomaly detection** at `Anomaly Detection Resources <https://github.com/yzhao062/anomaly-detection-resources>`_\r\n\r\n* **PyOD on Distributed Systems**: you can also run `PyOD on databricks <https://www.databricks.com/blog/2023/03/13/unsupervised-outlier-detection-databricks.html>`_.\r\n\r\n----\r\n\r\nAbout PyOD\r\n^^^^^^^^^^\r\n\r\nPyOD, established in 2017, has become a go-to **Python library** for **detecting anomalous/outlying objects** in multivariate data. This exciting yet challenging field is commonly referred to as `Outlier Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.\r\n\r\nPyOD includes more than 50 detection algorithms, from classical LOF (SIGMOD 2000) to the cutting-edge ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic research projects and commercial products with more than `22 million downloads <https://pepy.tech/project/pyod>`_. It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including `Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_, `KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and `Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.\r\n\r\n**PyOD is featured for**:\r\n\r\n* **Unified, User-Friendly Interface** across various algorithms.\r\n* **Wide Range of Models**, from classic techniques to the latest deep learning methods in **PyTorch**.\r\n* **High Performance & Efficiency**, leveraging `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_ for JIT compilation and parallel processing.\r\n* **Fast Training & Prediction**, achieved through the SUOD framework [#Zhao2021SUOD]_.\r\n\r\n**Outlier Detection with 5 Lines of Code**:\r\n\r\n.. code-block:: python\r\n\r\n    # Example: Training an ECOD detector\r\n    from pyod.models.ecod import ECOD\r\n    clf = ECOD()\r\n    clf.fit(X_train)\r\n    y_train_scores = clf.decision_scores_  # Outlier scores for training data\r\n    y_test_scores = clf.decision_function(X_test)  # Outlier scores for test data\r\n\r\n\r\n**Selecting the Right Algorithm:** Unsure where to start? Consider these robust and interpretable options:\r\n\r\n- `ECOD <https://github.com/yzhao062/pyod/blob/master/examples/ecod_example.py>`_: Example of using ECOD for outlier detection\r\n- `Isolation Forest <https://github.com/yzhao062/pyod/blob/master/examples/iforest_example.py>`_: Example of using Isolation Forest for outlier detection\r\n\r\nAlternatively, explore `MetaOD <https://github.com/yzhao062/MetaOD>`_ for a data-driven approach.\r\n\r\n**Citing PyOD**:\r\n\r\n`PyOD paper <http://www.jmlr.org/papers/volume20/19-011/19-011.pdf>`_ is published in `Journal of Machine Learning Research (JMLR) <http://www.jmlr.org/>`_ (MLOSS track). If you use PyOD in a scientific publication, we would appreciate citations to the following paper::\r\n\r\n    @article{zhao2019pyod,\r\n        author  = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},\r\n        title   = {PyOD: A Python Toolbox for Scalable Outlier Detection},\r\n        journal = {Journal of Machine Learning Research},\r\n        year    = {2019},\r\n        volume  = {20},\r\n        number  = {96},\r\n        pages   = {1-7},\r\n        url     = {http://jmlr.org/papers/v20/19-011.html}\r\n    }\r\n\r\nor::\r\n\r\n    Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of machine learning research (JMLR), 20(96), pp.1-7.\r\n\r\nFor a broader perspective on anomaly detection, see our NeurIPS papers `ADBench: Anomaly Detection Benchmark Paper <https://arxiv.org/abs/2206.09426>`_ and `ADGym: Design Choices for Deep Anomaly Detection <https://arxiv.org/abs/2309.15376>`_::\r\n\r\n    @article{han2022adbench,\r\n        title={Adbench: Anomaly detection benchmark},\r\n        author={Han, Songqiao and Hu, Xiyang and Huang, Hailiang and Jiang, Minqi and Zhao, Yue},\r\n        journal={Advances in Neural Information Processing Systems},\r\n        volume={35},\r\n        pages={32142--32159},\r\n        year={2022}\r\n    }\r\n\r\n    @article{jiang2023adgym,\r\n        title={ADGym: Design Choices for Deep Anomaly Detection},\r\n        author={Jiang, Minqi and Hou, Chaochuan and Zheng, Ao and Han, Songqiao and Huang, Hailiang and Wen, Qingsong and Hu, Xiyang and Zhao, Yue},\r\n        journal={Advances in Neural Information Processing Systems},\r\n        volume={36},\r\n        year={2023}\r\n    }\r\n\r\n\r\n**Table of Contents**:\r\n\r\n* `Installation <#installation>`_\r\n* `API Cheatsheet & Reference <#api-cheatsheet--reference>`_\r\n* `ADBench Benchmark and Datasets <#adbench-benchmark-and-datasets>`_\r\n* `Model Save & Load <#model-save--load>`_\r\n* `Fast Train with SUOD <#fast-train-with-suod>`_\r\n* `Thresholding Outlier Scores <#thresholding-outlier-scores>`_\r\n* `Implemented Algorithms <#implemented-algorithms>`_\r\n* `Quick Start for Outlier Detection <#quick-start-for-outlier-detection>`_\r\n* `How to Contribute <#how-to-contribute>`_\r\n* `Inclusion Criteria <#inclusion-criteria>`_\r\n\r\n----\r\n\r\nInstallation\r\n^^^^^^^^^^^^\r\n\r\nPyOD is designed for easy installation using either **pip** or **conda**. We recommend using the latest version of PyOD due to frequent updates and enhancements:\r\n\r\n.. code-block:: bash\r\n\r\n   pip install pyod            # normal install\r\n   pip install --upgrade pyod  # or update if needed\r\n\r\n.. code-block:: bash\r\n\r\n   conda install -c conda-forge pyod\r\n\r\nAlternatively, you can clone and run the setup.py file:\r\n\r\n.. code-block:: bash\r\n\r\n   git clone https://github.com/yzhao062/pyod.git\r\n   cd pyod\r\n   pip install .\r\n\r\n**Required Dependencies**:\r\n\r\n* Python 3.8 or higher\r\n* joblib\r\n* matplotlib\r\n* numpy>=1.19\r\n* numba>=0.51\r\n* scipy>=1.5.1\r\n* scikit_learn>=0.22.0\r\n\r\n**Optional Dependencies (see details below)**:\r\n\r\n* combo (optional, required for models/combination.py and FeatureBagging)\r\n* pytorch (optional, required for AutoEncoder, and other deep learning models)\r\n* suod (optional, required for running SUOD model)\r\n* xgboost (optional, required for XGBOD)\r\n* pythresh (optional, required for thresholding)\r\n\r\n----\r\n\r\n\r\nAPI Cheatsheet & Reference\r\n^^^^^^^^^^^^^^^^^^^^^^^^^^\r\n\r\nThe full API Reference is available at `PyOD Documentation <https://pyod.readthedocs.io/en/latest/pyod.html>`_. Below is a quick cheatsheet for all detectors:\r\n\r\n* **fit(X)**: Fit the detector. The parameter y is ignored in unsupervised methods.\r\n* **decision_function(X)**: Predict raw anomaly scores for X using the fitted detector.\r\n* **predict(X)**: Determine whether a sample is an outlier or not as binary labels using the fitted detector.\r\n* **predict_proba(X)**: Estimate the probability of a sample being an outlier using the fitted detector.\r\n* **predict_confidence(X)**: Assess the model's confidence on a per-sample basis (applicable in predict and predict_proba) [#Perini2020Quantifying]_.\r\n\r\n**Key Attributes of a fitted model**:\r\n\r\n* **decision_scores_**: Outlier scores of the training data. Higher scores typically indicate more abnormal behavior. Outliers usually have higher scores.\r\n* **labels_**: Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.\r\n\r\n\r\n----\r\n\r\n\r\nADBench Benchmark and Datasets\r\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\r\n\r\nWe just released a 45-page, the most comprehensive `ADBench: Anomaly Detection Benchmark <https://arxiv.org/abs/2206.09426>`_ [#Han2022ADBench]_.\r\nThe fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.\r\n\r\nThe organization of **ADBench** is provided below:\r\n\r\n.. image:: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true\r\n   :target: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true\r\n   :alt: benchmark-fig\r\n\r\n\r\nFor a simpler visualization, we make **the comparison of selected models** via\r\n`compare_all_models.py <https://github.com/yzhao062/pyod/blob/master/examples/compare_all_models.py>`_\\.\r\n\r\n.. image:: https://github.com/yzhao062/pyod/blob/development/examples/ALL.png?raw=true\r\n   :target: https://github.com/yzhao062/pyod/blob/development/examples/ALL.png?raw=true\r\n   :alt: Comparison_of_All\r\n\r\n\r\n\r\n----\r\n\r\nModel Save & Load\r\n^^^^^^^^^^^^^^^^^\r\n\r\nPyOD takes a similar approach of sklearn regarding model persistence.\r\nSee `model persistence <https://scikit-learn.org/stable/modules/model_persistence.html>`_ for clarification.\r\n\r\nIn short, we recommend to use joblib or pickle for saving and loading PyOD models.\r\nSee `\"examples/save_load_model_example.py\" <https://github.com/yzhao062/pyod/blob/master/examples/save_load_model_example.py>`_ for an example.\r\nIn short, it is simple as below:\r\n\r\n.. code-block:: python\r\n\r\n    from joblib import dump, load\r\n\r\n    # save the model\r\n    dump(clf, 'clf.joblib')\r\n    # load the model\r\n    clf = load('clf.joblib')\r\n\r\nIt is known that there are challenges in saving neural network models.\r\nCheck `#328 <https://github.com/yzhao062/pyod/issues/328#issuecomment-917192704>`_\r\nand `#88 <https://github.com/yzhao062/pyod/issues/88#issuecomment-615343139>`_\r\nfor temporary workaround.\r\n\r\n\r\n----\r\n\r\n\r\nFast Train with SUOD\r\n^^^^^^^^^^^^^^^^^^^^\r\n\r\n**Fast training and prediction**: it is possible to train and predict with\r\na large number of detection models in PyOD by leveraging SUOD framework [#Zhao2021SUOD]_.\r\nSee  `SUOD Paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/21-mlsys-suod.pdf>`_\r\nand  `SUOD example <https://github.com/yzhao062/pyod/blob/master/examples/suod_example.py>`_.\r\n\r\n\r\n.. code-block:: python\r\n\r\n    from pyod.models.suod import SUOD\r\n\r\n    # initialized a group of outlier detectors for acceleration\r\n    detector_list = [LOF(n_neighbors=15), LOF(n_neighbors=20),\r\n                     LOF(n_neighbors=25), LOF(n_neighbors=35),\r\n                     COPOD(), IForest(n_estimators=100),\r\n                     IForest(n_estimators=200)]\r\n\r\n    # decide the number of parallel process, and the combination method\r\n    # then clf can be used as any outlier detection model\r\n    clf = SUOD(base_estimators=detector_list, n_jobs=2, combination='average',\r\n               verbose=False)\r\n\r\n----\r\n\r\nThresholding Outlier Scores\r\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\r\n\r\nA more data-based approach can be taken when setting the contamination level. By using a thresholding method, guessing an arbitrary value can be replaced with tested techniques for separating inliers and outliers. Refer to `PyThresh <https://github.com/KulikDM/pythresh>`_ for a more in-depth look at thresholding.\r\n\r\n.. code-block:: python\r\n\r\n    from pyod.models.knn import KNN\r\n    from pyod.models.thresholds import FILTER\r\n\r\n    # Set the outlier detection and thresholding methods\r\n    clf = KNN(contamination=FILTER())\r\n\r\n\r\nSee supported thresholding methods in `thresholding <https://github.com/yzhao062/pyod/blob/master/docs/thresholding.rst>`_.\r\n\r\n----\r\n\r\n\r\n\r\nImplemented Algorithms\r\n^^^^^^^^^^^^^^^^^^^^^^\r\n\r\nPyOD toolkit consists of four major functional groups:\r\n\r\n**(i) Individual Detection Algorithms** :\r\n\r\n===================  ==================  ======================================================================================================  =====  ========================================\r\nType                 Abbr                Algorithm                                                                                               Year   Ref\r\n===================  ==================  ======================================================================================================  =====  ========================================\r\nProbabilistic        ECOD                Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions                        2022   [#Li2021ECOD]_\r\nProbabilistic        ABOD                Angle-Based Outlier Detection                                                                           2008   [#Kriegel2008Angle]_\r\nProbabilistic        FastABOD            Fast Angle-Based Outlier Detection using approximation                                                  2008   [#Kriegel2008Angle]_\r\nProbabilistic        COPOD               COPOD: Copula-Based Outlier Detection                                                                   2020   [#Li2020COPOD]_\r\nProbabilistic        MAD                 Median Absolute Deviation (MAD)                                                                         1993   [#Iglewicz1993How]_\r\nProbabilistic        SOS                 Stochastic Outlier Selection                                                                            2012   [#Janssens2012Stochastic]_\r\nProbabilistic        QMCD                Quasi-Monte Carlo Discrepancy outlier detection                                                         2001   [#Fang2001Wrap]_\r\nProbabilistic        KDE                 Outlier Detection with Kernel Density Functions                                                         2007   [#Latecki2007Outlier]_\r\nProbabilistic        Sampling            Rapid distance-based outlier detection via sampling                                                     2013   [#Sugiyama2013Rapid]_\r\nProbabilistic        GMM                 Probabilistic Mixture Modeling for Outlier Analysis                                                            [#Aggarwal2015Outlier]_ [Ch.2]\r\nLinear Model         PCA                 Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes)   2003   [#Shyu2003A]_\r\nLinear Model         KPCA                Kernel Principal Component Analysis                                                                     2007   [#Hoffmann2007Kernel]_\r\nLinear Model         MCD                 Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores)                    1999   [#Hardin2004Outlier]_ [#Rousseeuw1999A]_\r\nLinear Model         CD                  Use Cook's distance for outlier detection                                                               1977   [#Cook1977Detection]_\r\nLinear Model         OCSVM               One-Class Support Vector Machines                                                                       2001   [#Scholkopf2001Estimating]_\r\nLinear Model         LMDD                Deviation-based Outlier Detection (LMDD)                                                                1996   [#Arning1996A]_\r\nProximity-Based      LOF                 Local Outlier Factor                                                                                    2000   [#Breunig2000LOF]_\r\nProximity-Based      COF                 Connectivity-Based Outlier Factor                                                                       2002   [#Tang2002Enhancing]_\r\nProximity-Based      (Incremental) COF   Memory Efficient Connectivity-Based Outlier Factor (slower but reduce storage complexity)               2002   [#Tang2002Enhancing]_\r\nProximity-Based      CBLOF               Clustering-Based Local Outlier Factor                                                                   2003   [#He2003Discovering]_\r\nProximity-Based      LOCI                LOCI: Fast outlier detection using the local correlation integral                                       2003   [#Papadimitriou2003LOCI]_\r\nProximity-Based      HBOS                Histogram-based Outlier Score                                                                           2012   [#Goldstein2012Histogram]_\r\nProximity-Based      kNN                 k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)                 2000   [#Ramaswamy2000Efficient]_\r\nProximity-Based      AvgKNN              Average kNN (use the average distance to k nearest neighbors as the outlier score)                      2002   [#Angiulli2002Fast]_\r\nProximity-Based      MedKNN              Median kNN (use the median distance to k nearest neighbors as the outlier score)                        2002   [#Angiulli2002Fast]_\r\nProximity-Based      SOD                 Subspace Outlier Detection                                                                              2009   [#Kriegel2009Outlier]_\r\nProximity-Based      ROD                 Rotation-based Outlier Detection                                                                        2020   [#Almardeny2020A]_\r\nOutlier Ensembles    IForest             Isolation Forest                                                                                        2008   [#Liu2008Isolation]_\r\nOutlier Ensembles    INNE                Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles                                      2018   [#Bandaragoda2018Isolation]_\r\nOutlier Ensembles    DIF                 Deep Isolation Forest for Anomaly Detection                                                             2023   [#Xu2023Deep]_\r\nOutlier Ensembles    FB                  Feature Bagging                                                                                         2005   [#Lazarevic2005Feature]_\r\nOutlier Ensembles    LSCP                LSCP: Locally Selective Combination of Parallel Outlier Ensembles                                       2019   [#Zhao2019LSCP]_\r\nOutlier Ensembles    XGBOD               Extreme Boosting Based Outlier Detection **(Supervised)**                                               2018   [#Zhao2018XGBOD]_\r\nOutlier Ensembles    LODA                Lightweight On-line Detector of Anomalies                                                               2016   [#Pevny2016Loda]_\r\nOutlier Ensembles    SUOD                SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection **(Acceleration)**          2021   [#Zhao2021SUOD]_\r\nNeural Networks      AutoEncoder         Fully connected AutoEncoder (use reconstruction error as the outlier score)                                    [#Aggarwal2015Outlier]_ [Ch.3]\r\nNeural Networks      VAE                 Variational AutoEncoder (use reconstruction error as the outlier score)                                 2013   [#Kingma2013Auto]_\r\nNeural Networks      Beta-VAE            Variational AutoEncoder (all customized loss term by varying gamma and capacity)                        2018   [#Burgess2018Understanding]_\r\nNeural Networks      SO_GAAL             Single-Objective Generative Adversarial Active Learning                                                 2019   [#Liu2019Generative]_\r\nNeural Networks      MO_GAAL             Multiple-Objective Generative Adversarial Active Learning                                               2019   [#Liu2019Generative]_\r\nNeural Networks      DeepSVDD            Deep One-Class Classification                                                                           2018   [#Ruff2018Deep]_\r\nNeural Networks      AnoGAN              Anomaly Detection with Generative Adversarial Networks                                                  2017   [#Schlegl2017Unsupervised]_\r\nNeural Networks      ALAD                Adversarially learned anomaly detection                                                                 2018   [#Zenati2018Adversarially]_\r\nNeural Networks      AE1SVM              Autoencoder-based One-class Support Vector Machine                                                      2019   [#Nguyen2019scalable]_\r\nNeural Networks      DevNet              Deep Anomaly Detection with Deviation Networks                                                          2019   [#Pang2019Deep]_\r\nGraph-based          R-Graph             Outlier detection by R-graph                                                                            2017   [#You2017Provable]_\r\nGraph-based          LUNAR               LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks                               2022   [#Goodge2022Lunar]_\r\n===================  ==================  ======================================================================================================  =====  ========================================\r\n\r\n\r\n**(ii) Outlier Ensembles & Outlier Detector Combination Frameworks**:\r\n\r\n===================  ================  =====================================================================================================  =====  ========================================\r\nType                 Abbr              Algorithm                                                                                              Year   Ref\r\n===================  ================  =====================================================================================================  =====  ========================================\r\nOutlier Ensembles    FB                Feature Bagging                                                                                        2005   [#Lazarevic2005Feature]_\r\nOutlier Ensembles    LSCP              LSCP: Locally Selective Combination of Parallel Outlier Ensembles                                      2019   [#Zhao2019LSCP]_\r\nOutlier Ensembles    XGBOD             Extreme Boosting Based Outlier Detection **(Supervised)**                                              2018   [#Zhao2018XGBOD]_\r\nOutlier Ensembles    LODA              Lightweight On-line Detector of Anomalies                                                              2016   [#Pevny2016Loda]_\r\nOutlier Ensembles    SUOD              SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection **(Acceleration)**         2021   [#Zhao2021SUOD]_\r\nOutlier Ensembles    INNE              Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles                                     2018   [#Bandaragoda2018Isolation]_\r\nCombination          Average           Simple combination by averaging the scores                                                             2015   [#Aggarwal2015Theoretical]_\r\nCombination          Weighted Average  Simple combination by averaging the scores with detector weights                                       2015   [#Aggarwal2015Theoretical]_\r\nCombination          Maximization      Simple combination by taking the maximum scores                                                        2015   [#Aggarwal2015Theoretical]_\r\nCombination          AOM               Average of Maximum                                                                                     2015   [#Aggarwal2015Theoretical]_\r\nCombination          MOA               Maximization of Average                                                                                2015   [#Aggarwal2015Theoretical]_\r\nCombination          Median            Simple combination by taking the median of the scores                                                  2015   [#Aggarwal2015Theoretical]_\r\nCombination          majority Vote     Simple combination by taking the majority vote of the labels (weights can be used)                     2015   [#Aggarwal2015Theoretical]_\r\n===================  ================  =====================================================================================================  =====  ========================================\r\n\r\n\r\n**(iii) Utility Functions**:\r\n\r\n===================  ======================  =====================================================================================================================================================  ======================================================================================================================================\r\nType                 Name                    Function                                                                                                                                               Documentation\r\n===================  ======================  =====================================================================================================================================================  ======================================================================================================================================\r\nData                 generate_data           Synthesized data generation; normal data is generated by a multivariate Gaussian and outliers are generated by a uniform distribution                  `generate_data <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.data.generate_data>`_\r\nData                 generate_data_clusters  Synthesized data generation in clusters; more complex data patterns can be created with multiple clusters                                              `generate_data_clusters <https://pyod.readthedocs.io/en/latest/pyod.utils.html#pyod.utils.data.generate_data_clusters>`_\r\nStat                 wpearsonr               Calculate the weighted Pearson correlation of two samples                                                                                              `wpearsonr <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.stat_models.wpearsonr>`_\r\nUtility              get_label_n             Turn raw outlier scores into binary labels by assign 1 to top n outlier scores                                                                         `get_label_n <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.utility.get_label_n>`_\r\nUtility              precision_n_scores      calculate precision @ rank n                                                                                                                           `precision_n_scores <https://pyod.readthedocs.io/en/latest/pyod.utils.html#module-pyod.utils.utility.precision_n_scores>`_\r\n===================  ======================  =====================================================================================================================================================  ======================================================================================================================================\r\n\r\n----\r\n\r\nQuick Start for Outlier Detection\r\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\r\n\r\nPyOD has been well acknowledged by the machine learning community with a few featured posts and tutorials.\r\n\r\n**Analytics Vidhya**: `An Awesome Tutorial to Learn Outlier Detection in Python using PyOD Library <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_\r\n\r\n**KDnuggets**: `Intuitive Visualization of Outlier Detection Methods <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, `An Overview of Outlier Detection Methods from PyOD <https://www.kdnuggets.com/2019/06/overview-outlier-detection-methods-pyod.html>`_\r\n\r\n**Towards Data Science**: `Anomaly Detection for Dummies <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_\r\n\r\n`\"examples/knn_example.py\" <https://github.com/yzhao062/pyod/blob/master/examples/knn_example.py>`_\r\ndemonstrates the basic API of using kNN detector. **It is noted that the API across all other algorithms are consistent/similar**.\r\n\r\nMore detailed instructions for running examples can be found in `examples directory <https://github.com/yzhao062/pyod/blob/master/examples>`_.\r\n\r\n\r\n#. Initialize a kNN detector, fit the model, and make the prediction.\r\n\r\n   .. code-block:: python\r\n\r\n\r\n       from pyod.models.knn import KNN   # kNN detector\r\n\r\n       # train kNN detector\r\n       clf_name = 'KNN'\r\n       clf = KNN()\r\n       clf.fit(X_train)\r\n\r\n       # get the prediction label and outlier scores of the training data\r\n       y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)\r\n       y_train_scores = clf.decision_scores_  # raw outlier scores\r\n\r\n       # get the prediction on the test data\r\n       y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)\r\n       y_test_scores = clf.decision_function(X_test)  # outlier scores\r\n\r\n       # it is possible to get the prediction confidence as well\r\n       y_test_pred, y_test_pred_confidence = clf.predict(X_test, return_confidence=True)  # outlier labels (0 or 1) and confidence in the range of [0,1]\r\n\r\n#. Evaluate the prediction by ROC and Precision @ Rank n (p@n).\r\n\r\n   .. code-block:: python\r\n\r\n       from pyod.utils.data import evaluate_print\r\n       \r\n       # evaluate and print the results\r\n       print(\"\\nOn Training Data:\")\r\n       evaluate_print(clf_name, y_train, y_train_scores)\r\n       print(\"\\nOn Test Data:\")\r\n       evaluate_print(clf_name, y_test, y_test_scores)\r\n\r\n\r\n#. See a sample output & visualization.\r\n\r\n\r\n   .. code-block:: python\r\n\r\n\r\n       On Training Data:\r\n       KNN ROC:1.0, precision @ rank n:1.0\r\n\r\n       On Test Data:\r\n       KNN ROC:0.9989, precision @ rank n:0.9\r\n\r\n   .. code-block:: python\r\n\r\n\r\n       visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,\r\n           y_test_pred, show_figure=True, save_figure=False)\r\n\r\nVisualization (\\ `knn_figure <https://raw.githubusercontent.com/yzhao062/pyod/master/examples/KNN.png>`_\\ ):\r\n\r\n.. image:: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/KNN.png\r\n   :target: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/KNN.png\r\n   :alt: kNN example figure\r\n\r\n----\r\n\r\nReference\r\n^^^^^^^^^\r\n\r\n\r\n.. [#Aggarwal2015Outlier] Aggarwal, C.C., 2015. Outlier analysis. In Data mining (pp. 237-263). Springer, Cham.\r\n\r\n.. [#Aggarwal2015Theoretical] Aggarwal, C.C. and Sathe, S., 2015. Theoretical foundations and algorithms for outlier ensembles.\\ *ACM SIGKDD Explorations Newsletter*\\ , 17(1), pp.24-47.\r\n\r\n.. [#Aggarwal2017Outlier] Aggarwal, C.C. and Sathe, S., 2017. Outlier ensembles: An introduction. Springer.\r\n\r\n.. [#Almardeny2020A] Almardeny, Y., Boujnah, N. and Cleary, F., 2020. A Novel Outlier Detection Method for Multivariate Data. *IEEE Transactions on Knowledge and Data Engineering*.\r\n\r\n.. [#Angiulli2002Fast] Angiulli, F. and Pizzuti, C., 2002, August. Fast outlier detection in high dimensional spaces. In *European Conference on Principles of Data Mining and Knowledge Discovery* pp. 15-27.\r\n\r\n.. [#Arning1996A] Arning, A., Agrawal, R. and Raghavan, P., 1996, August. A Linear Method for Deviation Detection in Large Databases. In *KDD* (Vol. 1141, No. 50, pp. 972-981).\r\n\r\n.. [#Bandaragoda2018Isolation] Bandaragoda, T. R., Ting, K. M., Albrecht, D., Liu, F. T., Zhu, Y., and Wells, J. R., 2018, Isolation-based anomaly detection using nearest-neighbor ensembles. *Computational Intelligence*\\ , 34(4), pp. 968-998.\r\n\r\n.. [#Breunig2000LOF] Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. *ACM Sigmod Record*\\ , 29(2), pp. 93-104.\r\n\r\n.. [#Burgess2018Understanding] Burgess, Christopher P., et al. \"Understanding disentangling in beta-VAE.\" arXiv preprint arXiv:1804.03599 (2018).\r\n\r\n.. [#Cook1977Detection] Cook, R.D., 1977. Detection of influential observation in linear regression. Technometrics, 19(1), pp.15-18.\r\n\r\n.. [#Fang2001Wrap] Fang, K.T. and Ma, C.X., 2001. Wrap-around L2-discrepancy of random sampling, Latin hypercube and uniform designs. Journal of complexity, 17(4), pp.608-624.\r\n\r\n.. [#Goldstein2012Histogram] Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In *KI-2012: Poster and Demo Track*\\ , pp.59-63.\r\n\r\n.. [#Goodge2022Lunar] Goodge, A., Hooi, B., Ng, S.K. and Ng, W.S., 2022, June. Lunar: Unifying local outlier detection methods via graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence.\r\n\r\n.. [#Gopalan2019PIDForest] Gopalan, P., Sharan, V. and Wieder, U., 2019. PIDForest: Anomaly Detection via Partial Identification. In Advances in Neural Information Processing Systems, pp. 15783-15793.\r\n\r\n.. [#Han2022ADBench] Han, S., Hu, X., Huang, H., Jiang, M. and Zhao, Y., 2022. ADBench: Anomaly Detection Benchmark. arXiv preprint arXiv:2206.09426.\r\n\r\n.. [#Hardin2004Outlier] Hardin, J. and Rocke, D.M., 2004. Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. *Computational Statistics & Data Analysis*\\ , 44(4), pp.625-638.\r\n\r\n.. [#He2003Discovering] He, Z., Xu, X. and Deng, S., 2003. Discovering cluster-based local outliers. *Pattern Recognition Letters*\\ , 24(9-10), pp.1641-1650.\r\n\r\n.. [#Hoffmann2007Kernel] Hoffmann, H., 2007. Kernel PCA for novelty detection. Pattern recognition, 40(3), pp.863-874.\r\n\r\n.. [#Iglewicz1993How] Iglewicz, B. and Hoaglin, D.C., 1993. How to detect and handle outliers (Vol. 16). Asq Press.\r\n\r\n.. [#Janssens2012Stochastic] Janssens, J.H.M., Husz\u00e1r, F., Postma, E.O. and van den Herik, H.J., 2012. Stochastic outlier selection. Technical report TiCC TR 2012-001, Tilburg University, Tilburg Center for Cognition and Communication, Tilburg, The Netherlands.\r\n\r\n.. [#Kingma2013Auto] Kingma, D.P. and Welling, M., 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.\r\n\r\n.. [#Kriegel2008Angle] Kriegel, H.P. and Zimek, A., 2008, August. Angle-based outlier detection in high-dimensional data. In *KDD '08*\\ , pp. 444-452. ACM.\r\n\r\n.. [#Kriegel2009Outlier] Kriegel, H.P., Kr\u00f6ger, P., Schubert, E. and Zimek, A., 2009, April. Outlier detection in axis-parallel subspaces of high dimensional data. In *Pacific-Asia Conference on Knowledge Discovery and Data Mining*\\ , pp. 831-838. Springer, Berlin, Heidelberg.\r\n\r\n.. [#Latecki2007Outlier] Latecki, L.J., Lazarevic, A. and Pokrajac, D., 2007, July. Outlier detection with kernel density functions. In International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 61-75). Springer, Berlin, Heidelberg.\r\n\r\n.. [#Lazarevic2005Feature] Lazarevic, A. and Kumar, V., 2005, August. Feature bagging for outlier detection. In *KDD '05*. 2005.\r\n\r\n.. [#Li2019MADGAN] Li, D., Chen, D., Jin, B., Shi, L., Goh, J. and Ng, S.K., 2019, September. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In *International Conference on Artificial Neural Networks* (pp. 703-716). Springer, Cham.\r\n\r\n.. [#Li2020COPOD] Li, Z., Zhao, Y., Botta, N., Ionescu, C. and Hu, X. COPOD: Copula-Based Outlier Detection. *IEEE International Conference on Data Mining (ICDM)*, 2020.\r\n\r\n.. [#Li2021ECOD] Li, Z., Zhao, Y., Hu, X., Botta, N., Ionescu, C. and Chen, H. G. ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. *IEEE Transactions on Knowledge and Data Engineering (TKDE)*, 2022.\r\n\r\n.. [#Liu2008Isolation] Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In *International Conference on Data Mining*\\ , pp. 413-422. IEEE.\r\n\r\n.. [#Liu2019Generative] Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M. and He, X., 2019. Generative adversarial active learning for unsupervised outlier detection. *IEEE Transactions on Knowledge and Data Engineering*.\r\n\r\n.. [#Nguyen2019scalable] Nguyen, M.N. and Vien, N.A., 2019. Scalable and interpretable one-class svms with deep learning and random fourier features. In *Machine Learning and Knowledge Discovery in Databases: European Conference*, ECML PKDD, 2018.\r\n\r\n.. [#Pang2019Deep] Pang, Guansong, Chunhua Shen, and Anton Van Den Hengel. \"Deep anomaly detection with deviation networks.\" In *KDD*, pp. 353-362. 2019.\r\n\r\n.. [#Papadimitriou2003LOCI] Papadimitriou, S., Kitagawa, H., Gibbons, P.B. and Faloutsos, C., 2003, March. LOCI: Fast outlier detection using the local correlation integral. In *ICDE '03*, pp. 315-326. IEEE.\r\n\r\n.. [#Pevny2016Loda] Pevn\u00fd, T., 2016. Loda: Lightweight on-line detector of anomalies. *Machine Learning*, 102(2), pp.275-304.\r\n\r\n.. [#Perini2020Quantifying] Perini, L., Vercruyssen, V., Davis, J. Quantifying the confidence of anomaly detectors in their example-wise predictions. In *Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD)*, 2020.\r\n\r\n.. [#Ramaswamy2000Efficient] Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. *ACM Sigmod Record*\\ , 29(2), pp. 427-438.\r\n\r\n.. [#Rousseeuw1999A] Rousseeuw, P.J. and Driessen, K.V., 1999. A fast algorithm for the minimum covariance determinant estimator. *Technometrics*\\ , 41(3), pp.212-223.\r\n\r\n.. [#Ruff2018Deep] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., M\u00fcller, E. and Kloft, M., 2018, July. Deep one-class classification. In *International conference on machine learning* (pp. 4393-4402). PMLR.\r\n\r\n.. [#Schlegl2017Unsupervised] Schlegl, T., Seeb\u00f6ck, P., Waldstein, S.M., Schmidt-Erfurth, U. and Langs, G., 2017, June. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International conference on information processing in medical imaging (pp. 146-157). Springer, Cham.\r\n\r\n.. [#Scholkopf2001Estimating] Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C., 2001. Estimating the support of a high-dimensional distribution. *Neural Computation*, 13(7), pp.1443-1471.\r\n\r\n.. [#Shyu2003A] Shyu, M.L., Chen, S.C., Sarinnapakorn, K. and Chang, L., 2003. A novel anomaly detection scheme based on principal component classifier. *MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING*.\r\n\r\n.. [#Sugiyama2013Rapid] Sugiyama, M. and Borgwardt, K., 2013. Rapid distance-based outlier detection via sampling. Advances in neural information processing systems, 26.\r\n\r\n.. [#Tang2002Enhancing] Tang, J., Chen, Z., Fu, A.W.C. and Cheung, D.W., 2002, May. Enhancing effectiveness of outlier detections for low density patterns. In *Pacific-Asia Conference on Knowledge Discovery and Data Mining*, pp. 535-548. Springer, Berlin, Heidelberg.\r\n\r\n.. [#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.\r\n\r\n.. [#Xu2023Deep] Xu, H., Pang, G., Wang, Y., Wang, Y., 2023. Deep isolation forest for anomaly detection. *IEEE Transactions on Knowledge and Data Engineering*.\r\n\r\n.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.\r\n\r\n.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.\r\n\r\n.. [#Zhao2018XGBOD] Zhao, Y. and Hryniewicki, M.K. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. *IEEE International Joint Conference on Neural Networks*\\ , 2018.\r\n\r\n.. [#Zhao2019LSCP] Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In *Proceedings of the 2019 SIAM International Conference on Data Mining (SDM)*, pp. 585-593. Society for Industrial and Applied Mathematics.\r\n\r\n.. [#Zhao2021SUOD] Zhao, Y., Hu, X., Cheng, C., Wang, C., Wan, C., Wang, W., Yang, J., Bai, H., Li, Z., Xiao, C., Wang, Y., Qiao, Z., Sun, J. and Akoglu, L. (2021). SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection. *Conference on Machine Learning and Systems (MLSys)*.\r\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)",
    "version": "2.0.2",
    "project_urls": {
        "Download": "https://github.com/yzhao062/pyod/archive/master.zip",
        "Homepage": "https://github.com/yzhao062/pyod"
    },
    "split_keywords": [
        "outlier detection",
        " anomaly detection",
        " outlier ensembles",
        " data mining",
        " neural networks"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ef8680ff0719a4a24e84e1e6bfc0149c21e40fc9dc061f0c476e169a7b4eea15",
                "md5": "4848b8b83d229c909db207e437ae1699",
                "sha256": "e05b7a457ac9bd8b0cc41754fdafe3aefcc1f61270f33d38f43112676d9db2b8"
            },
            "downloads": -1,
            "filename": "pyod-2.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "4848b8b83d229c909db207e437ae1699",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 165811,
            "upload_time": "2024-09-06T03:24:51",
            "upload_time_iso_8601": "2024-09-06T03:24:51.612229Z",
            "url": "https://files.pythonhosted.org/packages/ef/86/80ff0719a4a24e84e1e6bfc0149c21e40fc9dc061f0c476e169a7b4eea15/pyod-2.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-06 03:24:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yzhao062",
    "github_project": "pyod",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "joblib",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.19"
                ]
            ]
        },
        {
            "name": "numba",
            "specs": [
                [
                    ">=",
                    "0.51"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.5.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "0.22.0"
                ]
            ]
        }
    ],
    "lcname": "pyod"
}
        
Elapsed time: 0.29658s