:Name: pythresh
:Version: 0.3.8
:Home page: https://github.com/KulikDM/pythresh
:Summary: A Python Toolbox for Outlier Detection Thresholding
:Upload time: 2024-12-16 05:06:42
:Author: D Kulik
:Keywords: outlier detection, anomaly detection, thresholding, cutoff, contamination level, data science, machine learning
:Requirements: numpy, pyod, scikit-learn, scipy

##################################################
 Python Outlier Detection Thresholding (PyThresh)
##################################################

**Deployment, Stats, & License**

|badge_pypi| |badge_anaconda| |badge_docs| |badge_testing|
|badge_coverage| |badge_maintainability| |badge_stars| |badge_downloads|
|badge_versions| |badge_licence| |badge_citation|

.. |badge_pypi| image:: https://img.shields.io/pypi/v/pythresh.svg?color=brightgreen&logo=pypi&logoColor=white
   :alt: PyPI version
   :target: https://pypi.org/project/pythresh/

.. |badge_anaconda| image:: https://img.shields.io/conda/vn/conda-forge/pythresh?color=brightgreen&logo=conda-forge&logoColor=white
   :alt: Anaconda version
   :target: https://anaconda.org/conda-forge/pythresh

.. |badge_docs| image:: https://img.shields.io/readthedocs/pythresh.svg?version=latest&logo=read-the-docs&logoColor=white
   :alt: Documentation status
   :target: http://pythresh.readthedocs.io/?badge=latest

.. |badge_testing| image:: https://github.com/KulikDM/pythresh/actions/workflows/python-package.yml/badge.svg
   :alt: testing
   :target: https://github.com/KulikDM/pythresh/actions/workflows/python-package.yml

.. |badge_coverage| image:: https://codecov.io/gh/KulikDM/pythresh/branch/main/graph/badge.svg?token=8ZAPXTLW9Y
   :alt: Codecov
   :target: https://codecov.io/gh/KulikDM/pythresh

.. |badge_maintainability| image:: https://api.codeclimate.com/v1/badges/3e2de42b48701c731ef6/maintainability
   :alt: Maintainability
   :target: https://codeclimate.com/github/KulikDM/pythresh/maintainability

.. |badge_stars| image:: https://img.shields.io/github/stars/KulikDM/pythresh.svg?logo=github&logoColor=white&style=flat
   :alt: GitHub stars
   :target: https://github.com/KulikDM/pythresh/stargazers

.. |badge_downloads| image:: https://img.shields.io/badge/dynamic/xml?url=https%3A%2F%2Fstatic.pepy.tech%2Fbadge%2Fpythresh&query=%2F%2F*%5Blocal-name()%20%3D%20%27text%27%5D%5Blast()%5D&logo=data%3Aimage%2Fsvg%2Bxml%3Bbase64%2CPHN2ZyBzdHlsZT0iZW5hYmxlLWJhY2tncm91bmQ6bmV3IDAgMCAyNCAyNDsiIHZlcnNpb249IjEuMSIgdmlld0JveD0iMCAwIDI0IDI0IiB4bWw6c3BhY2U9InByZXNlcnZlIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIj48ZyBpZD0iaW5mbyIvPjxnIGlkPSJpY29ucyI%2BPGcgaWQ9InNhdmUiPjxwYXRoIGQ9Ik0xMS4yLDE2LjZjMC40LDAuNSwxLjIsMC41LDEuNiwwbDYtNi4zQzE5LjMsOS44LDE4LjgsOSwxOCw5aC00YzAsMCwwLjItNC42LDAtN2MtMC4xLTEuMS0wLjktMi0yLTJjLTEuMSwwLTEuOSwwLjktMiwyICAgIGMtMC4yLDIuMywwLDcsMCw3SDZjLTAuOCwwLTEuMywwLjgtMC44LDEuNEwxMS4yLDE2LjZ6IiBmaWxsPSIjZWJlYmViIi8%2BPHBhdGggZD0iTTE5LDE5SDVjLTEuMSwwLTIsMC45LTIsMnYwYzAsMC42LDAuNCwxLDEsMWgxNmMwLjYsMCwxLTAuNCwxLTF2MEMyMSwxOS45LDIwLjEsMTksMTksMTl6IiBmaWxsPSIjZWJlYmViIi8%2BPC9nPjwvZz48L3N2Zz4%3D&label=downloads
   :alt: Downloads
   :target: https://pepy.tech/project/pythresh

.. |badge_versions| image:: https://img.shields.io/pypi/pyversions/pythresh.svg?logo=python&logoColor=white
   :alt: Python versions
   :target: https://pypi.org/project/pythresh/

.. |badge_licence| image:: https://img.shields.io/github/license/KulikDM/pythresh.svg?logo=
   :alt: License
   :target: https://github.com/KulikDM/pythresh/blob/master/LICENSE

.. |badge_citation| image:: https://zenodo.org/badge/497683169.svg
   :alt: Zenodo DOI
   :target: https://zenodo.org/badge/latestdoi/497683169

----

PyThresh is a comprehensive and scalable **Python toolkit** for
**thresholding outlier detection likelihood scores** in
univariate/multivariate data. It was written to work in tandem with
PyOD and has similar syntax and data structures, but it is not limited
to that library: PyThresh is meant to threshold the likelihood scores
generated by an outlier detector. Thresholding these scores replaces the
need to set a contamination level or to guess beforehand how many
outliers the dataset contains. The methods are non-parametric and rely
on statistics rather than on user input or guesswork. For thresholding
to be applied correctly, the outlier detection likelihood scores must
follow one rule: the higher the score, the higher the probability that
the point is an outlier in the dataset. All threshold functions return a
binary array in which inliers are labeled 0 and outliers are labeled 1.

PyThresh includes more than 30 thresholding algorithms. These algorithms
range from using simple statistical analysis like the Z-score to more
complex mathematical methods that involve graph theory and topology.
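
To make the idea concrete, the following is a minimal sketch of what a
simple Z-score style thresholder does conceptually. It is not
PyThresh's implementation: the helper name ``zscore_labels`` and the
``cutoff`` value are assumptions made for illustration only. It follows
the rule stated above (higher score means more likely an outlier) and
returns a binary array of 0s and 1s.

.. code:: python

   import numpy as np


   def zscore_labels(scores, cutoff=1.0):
       """Hypothetical helper: label scores lying more than `cutoff`
       standard deviations above the mean as outliers (1), else 0."""
       scores = np.asarray(scores, dtype=float)
       std = scores.std()
       if std == 0:
           # all scores identical: nothing stands out
           return np.zeros(scores.shape, dtype=int)
       z = (scores - scores.mean()) / std
       return (z > cutoff).astype(int)


   # illustrative scores: one value is clearly larger than the rest
   scores = np.array([0.10, 0.12, 0.09, 0.11, 0.95, 0.13])
   labels = zscore_labels(scores)
   print(labels)

Only the unusually large score is labeled 1 here; no contamination
level had to be supplied by the user.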

************************
 Documentation & Citing
************************

Visit `PyThresh Docs
<https://pythresh.readthedocs.io/en/latest/?badge=latest>`_ for full
documentation or see below for a quickstart installation and usage
example.

To cite this work, visit `PyThresh Citation
<https://zenodo.org/badge/latestdoi/497683169>`_.

----

**Outlier Detection Thresholding with 7 Lines of Code**:

.. code:: python

   # train the KNN detector
   from pyod.models.knn import KNN
   from pythresh.thresholds.filter import FILTER

   clf = KNN()
   clf.fit(X_train)

   # get outlier scores
   decision_scores = clf.decision_scores_  # raw outlier scores on the train data

   # get outlier labels
   thres = FILTER()
   labels = thres.eval(decision_scores)

Or, using multiple outlier detection score sets:

.. code:: python

   # train multiple detectors
   import numpy as np
   from pyod.models.knn import KNN
   from pyod.models.pca import PCA
   from pyod.models.iforest import IForest
   from pythresh.thresholds.filter import FILTER

   clfs = [KNN(), IForest(), PCA()]

   # get outlier scores for each detector
   scores = [clf.fit(X_train).decision_scores_ for clf in clfs]

   # stack into a 2D array of shape (n_samples, n_detectors)
   scores = np.vstack(scores).T

   # get outlier labels
   thres = FILTER()
   labels = thres.eval(scores)
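
The stacking step above determines the layout of the array passed to
``eval``. The short sketch below uses made-up score vectors (the
variable names and random values are assumptions standing in for real
detector output) to show that ``np.vstack(...).T`` yields one row per
sample and one column per detector.

.. code:: python

   import numpy as np

   rng = np.random.default_rng(0)
   n_samples = 5

   # stand-ins for the decision_scores_ of three fitted detectors
   knn_scores = rng.random(n_samples)
   iforest_scores = rng.random(n_samples)
   pca_scores = rng.random(n_samples)

   # vstack gives one row per detector: shape (3, 5)
   stacked = np.vstack([knn_scores, iforest_scores, pca_scores])

   # transposing gives one row per sample: shape (5, 3)
   scores = stacked.T
   print(stacked.shape, scores.shape)

``np.column_stack`` on the same list produces the same
``(n_samples, n_detectors)`` array in a single call.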

**************
 Installation
**************

It is recommended to use **pip** or **conda** for installation:

.. code:: bash

   pip install pythresh            # normal install
   pip install --upgrade pythresh  # or update if needed

.. code:: bash

   conda install -c conda-forge pythresh

Alternatively, you can get the version with the latest updates by
cloning the repository and installing it from source:

.. code:: bash

   git clone https://github.com/KulikDM/pythresh.git
   cd pythresh
   pip install .

Or install the latest archive directly with **pip**:

.. code:: bash

   pip install https://github.com/KulikDM/pythresh/archive/main.zip

**Required Dependencies**:

-  numpy>=1.13
-  pyod
-  scipy>=1.3.1
-  scikit-learn>=0.20.0

**Optional Dependencies**:

-  pyclustering (used in the CLUST thresholder)
-  ruptures (used in the CPD thresholder)
-  scikit-lego (used in the META thresholder)
-  joblib>=0.14.1 (used in the META thresholder and in RANK)
-  pandas (used in the META thresholder)
-  torch (used in the VAE thresholder)
-  tqdm (used in the VAE thresholder)
-  xgboost>=2.0.0 (used in RANK)
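
To see which of the optional dependencies are present in the current
environment, a small check along the following lines may help. This is
a sketch using only the standard library; the mapping and the helper
name ``installed_optional_deps`` are assumptions for illustration and
are not part of PyThresh. Note that scikit-lego is imported as
``sklego``, and that this check does not verify the minimum versions
listed above.

.. code:: python

   from importlib.util import find_spec

   # package name as listed above -> top-level import name
   OPTIONAL_MODULES = {
       "pyclustering": "pyclustering",
       "ruptures": "ruptures",
       "scikit-lego": "sklego",
       "joblib": "joblib",
       "pandas": "pandas",
       "torch": "torch",
       "tqdm": "tqdm",
       "xgboost": "xgboost",
   }


   def installed_optional_deps():
       """Hypothetical helper: map each optional package to True if its
       module can be found in the current environment, else False."""
       return {
           package: find_spec(module) is not None
           for package, module in OPTIONAL_MODULES.items()
       }


   for package, present in installed_optional_deps().items():
       print(f"{package}: {'installed' if present else 'missing'}")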

****************
 API Cheatsheet
****************

-  **eval(score)**: evaluate a single set or multiple sets of outlier
   detection likelihood scores.

Key attributes of a thresholder:

-  **thresh_**: Return the threshold value that separates inliers from
   outliers. Outliers are considered all values above this threshold
   value. Note the threshold value has been derived from likelihood
   scores normalized between 0 and 1.

-  **confidence_interval_**: Return the lower and upper confidence
   interval of the contamination level. Only applies to the COMB
   thresholder.

-  **dscores_**: 1D array of the TruncatedSVD decomposed decision scores
   if multiple outlier detector score sets are passed.

-  **mixture_**: fitted mixture model class of the selected model used
   for thresholding. Only applies to MIXMOD. Attributes include:
   components, weights, params. Functions include: fit, loglikelihood,
   pdf, and posterior.
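
Because ``thresh_`` is expressed on the normalized 0 to 1 scale, it
cannot be compared directly with raw detector scores. The sketch below
shows one way to map such a value back to raw score units. It assumes
min-max normalization, which the description above does not state
explicitly, and the helper name ``threshold_to_raw`` and the example
numbers are hypothetical.

.. code:: python

   import numpy as np


   def threshold_to_raw(scores, thresh):
       """Hypothetical helper: convert a threshold on the 0-1 scale back
       to raw score units, assuming min-max normalization was used."""
       scores = np.asarray(scores, dtype=float)
       lo, hi = scores.min(), scores.max()
       return lo + thresh * (hi - lo)


   raw_scores = np.array([2.0, 4.0, 3.0, 12.0, 2.5])
   thresh = 0.5  # stand-in for a value read from a thresholder's thresh_

   cut = threshold_to_raw(raw_scores, thresh)

   # outliers are the values above the threshold
   labels = (raw_scores > cut).astype(int)
   print(cut, labels)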

************************
 External Feature Cases
************************

**Towards Data Science**: `Thresholding Outlier Detection Scores with
PyThresh
<https://towardsdatascience.com/thresholding-outlier-detection-scores-with-pythresh-f26299d14fa>`_

**Towards Data Science**: `When Outliers are Significant: Weighted
Linear Regression
<https://towardsdatascience.com/when-outliers-are-significant-weighted-linear-regression-bcdc8389ab10>`_

**ArXiv**: `Estimating the Contamination Factor's Distribution in
Unsupervised Anomaly Detection. <https://arxiv.org/abs/2210.10487>`_

***********************************
 Available Thresholding Algorithms
***********************************

+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Abbr      | Description                               | References         | Documentation                                                                                                                                          |
+===========+===========================================+====================+========================================================================================================================================================+
| AUCP      | Area Under Curve Percentage               | [#aucp1]_          | `pythresh.thresholds.aucp module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.aucp>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| BOOT      | Bootstrapping                             | [#boot1]_          | `pythresh.thresholds.boot module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.boot>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| CHAU      | Chauvenet's Criterion                     | [#chau1]_          | `pythresh.thresholds.chau module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.chau>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| CLF       | Trained Linear Classifier                 | [#clf1]_           | `pythresh.thresholds.clf module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.clf>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| CLUST     | Clustering Based                          | [#clust1]_         | `pythresh.thresholds.clust module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.clust>`_              |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| CPD       | Change Point Detection                    | [#cpd1]_           | `pythresh.thresholds.cpd module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.cpd>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| DECOMP    | Decomposition                             | [#decomp1]_        | `pythresh.thresholds.decomp module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.decomp>`_            |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| DSN       | Distance Shift from Normal                | [#dsn1]_           | `pythresh.thresholds.dsn module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.dsn>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| EB        | Elliptical Boundary                       | [#eb1]_            | `pythresh.thresholds.eb module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.eb>`_                    |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| FGD       | Fixed Gradient Descent                    | [#fgd1]_           | `pythresh.thresholds.fgd module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.fgd>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| FILTER    | Filtering Based                           | [#filter1]_        | `pythresh.thresholds.filter module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.filter>`_            |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| FWFM      | Full Width at Full Minimum                | [#fwfm1]_          | `pythresh.thresholds.fwfm module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.fwfm>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| GAMGMM    | Bayesian Gamma GMM                        | [#gamgmm1]_        | `pythresh.thresholds.gamgmm module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.gamgmm>`_            |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| GESD      | Generalized Extreme Studentized Deviate   | [#gesd1]_          | `pythresh.thresholds.gesd module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.gesd>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| HIST      | Histogram Based                           | [#hist1]_          | `pythresh.thresholds.hist module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.hist>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| IQR       | Inter-Quartile Region                     | [#iqr1]_           | `pythresh.thresholds.iqr module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.iqr>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| KARCH     | Karcher mean (Riemannian Center of Mass)  | [#karch1]_         | `pythresh.thresholds.karch module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.karch>`_              |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| MAD       | Median Absolute Deviation                 | [#mad1]_           | `pythresh.thresholds.mad module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.mad>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| MCST      | Monte Carlo Shapiro Tests                 | [#mcst1]_          | `pythresh.thresholds.mcst module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.mcst>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| META      | Meta-model Trained Classifier             | [#meta1]_          | `pythresh.thresholds.meta module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.meta>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| MIXMOD    | Normal & Non-Normal Mixture Models        | [#mixmod1]_        | `pythresh.thresholds.mixmod module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.mixmod>`_            |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| MOLL      | Friedrichs' Mollifier                     | [#moll1]_          | `pythresh.thresholds.moll module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.moll>`_                |
|           |                                           | [#moll2]_          |                                                                                                                                                        |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| MTT       | Modified Thompson Tau Test                | [#mtt1]_           | `pythresh.thresholds.mtt module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.mtt>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| OCSVM     | One-Class Support Vector Machine          | [#ocsvm]_          | `pythresh.thresholds.ocsvm module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#pythresh-thresholds-ocsvm-module>`_              |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| QMCD      | Quasi-Monte Carlo Discrepancy             | [#qmcd1]_          | `pythresh.thresholds.qmcd module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.qmcd>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| REGR      | Regression Based                          | [#regr1]_          | `pythresh.thresholds.regr module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.regr>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| VAE       | Variational Autoencoder                   | [#vae1]_           | `pythresh.thresholds.vae module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.vae>`_                  |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| WIND      | Topological Winding Number                | [#wind1]_          | `pythresh.thresholds.wind module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.wind>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| YJ        | Yeo-Johnson Transformation                | [#yj1]_            | `pythresh.thresholds.yj module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.yj>`_                    |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| ZSCORE    | Z-score                                   | [#zscore1]_        | `pythresh.thresholds.zscore module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.zscore>`_            |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| COMB      | Thresholder Combination                   | None               | `pythresh.thresholds.comb module <https://pythresh.readthedocs.io/en/latest/pythresh.thresholds.html#module-pythresh.thresholds.comb>`_                |
+-----------+-------------------------------------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+

******************************************
 Implementations, Benchmarks, & Utilities
******************************************

**A comparison of the implemented models and their general
implementation** is available below.

Additional `benchmarking
<https://pythresh.readthedocs.io/en/latest/benchmark.html>`_ has been
done on all the thresholders. It found that the ``MIXMOD`` thresholder
performed best, while the ``CLF`` thresholder had the smallest
uncertainty about its mean and was the most robust (its least accurate
prediction was the best among the thresholders). However, for
interpretability and general performance, the ``MIXMOD``, ``FILTER``,
and ``META`` thresholders are good fits.

Further utilities are available to assist in selecting the best outlier
detection and thresholding methods (`ranking
<https://pythresh.readthedocs.io/en/latest/ranking.html>`_) and in
determining the confidence in the selected thresholding method
(`thresholding confidence
<https://pythresh.readthedocs.io/en/latest/confidence.html>`_).

----

For Jupyter Notebooks, please navigate to `notebooks
<https://github.com/KulikDM/pythresh/tree/main/notebooks>`_.

A quick look at the performance of all the thresholders can be found in
**"/notebooks/Compare All Models.ipynb"**.

.. image:: https://raw.githubusercontent.com/KulikDM/pythresh/main/imgs/All.png
   :target: https://raw.githubusercontent.com/KulikDM/pythresh/main/imgs/All.png
   :alt: Comparison_of_All

----

**************
 Contributing
**************

Anyone is welcome to contribute to PyThresh:

-  Please share your ideas and ask questions by opening an issue.

-  To contribute, first check the Issue list for the "help wanted" tag
   and comment on the one that you are interested in. The issue will
   then be assigned to you.

-  If the bug, feature, or documentation change is novel (not in the
   Issue list), you can either log a new issue or create a pull request
   for the new changes.

-  To start, fork the main branch and add your
   improvement/modification/fix.

-  To make sure the code has the same style and standard, please refer
   to qmcd.py as an example.

-  Create a pull request to the **main branch** and follow the pull
   request template `PR template
   <https://github.com/KulikDM/pythresh/blob/main/.github/PULL_REQUEST_TEMPLATE.md>`_

-  Please make sure that all code changes are accompanied by new or
   updated test functions. Automatic tests will be triggered, and all
   tests must pass before the pull request can be merged.

----

************
 References
************

**Please note:** not all of the references' exact methods have been
employed in PyThresh. Rather, the references serve to demonstrate the
validity of the threshold types available in PyThresh.

.. [#aucp1]

   `A Robust AUC Maximization Framework With Simultaneous Outlier Detection
   and Feature Selection for Positive-Unlabeled Classification
   <https://arxiv.org/abs/1803.06604>`_

.. [#boot1]

   `An evaluation of bootstrap methods for outlier detection in least
   squares regression
   <https://www.researchgate.net/publication/24083638_An_evaluation_of_bootstrap_methods_for_outlier_detection_in_least_squares_regression>`_

.. [#chau1]

   `Chauvenet's Test in the Classical Theory of Errors
   <https://epubs.siam.org/doi/10.1137/1119078>`_

.. [#clf1]

   `Linear Models for Outlier Detection
   <https://link.springer.com/chapter/10.1007/978-3-319-47578-3_3>`_

.. [#clust1]

   `Cluster Analysis for Outlier Detection
   <https://www.researchgate.net/publication/224990195_Cluster_Analysis_for_Outlier_Detection>`_

.. [#cpd1]

   `Changepoint Detection in the Presence of Outliers
   <https://arxiv.org/abs/1609.07363>`_

.. [#decomp1]

   `Influence functions and outlier detection under the common principal
   components model: A robust approach
   <https://www.researchgate.net/publication/5207186_Influence_functions_and_outlier_detection_under_the_common_principal_components_model_A_robust_approach>`_

.. [#dsn1]

   `Fast and Exact Outlier Detection in Metric Spaces: A Proximity
   Graph-based Approach <https://arxiv.org/abs/2110.08959>`_

.. [#eb1]

   `Elliptical Insights: Understanding Statistical Methods through
   Elliptical Geometry <https://arxiv.org/abs/1302.4881>`_

.. [#fgd1]

   `Iterative gradient descent for outlier detection
   <https://www.worldscientific.com/doi/10.1142/S0219691321500041>`_

.. [#filter1]

   `Filtering Approaches for Dealing with Noise in Anomaly Detection
   <https://ieeexplore.ieee.org/document/9029258/>`_

.. [#fwfm1]

   `Sparse Auto-Regressive: Robust Estimation of AR Parameters
   <https://arxiv.org/abs/1306.3317>`_

.. [#gamgmm1]

   `Estimating the Contamination Factor's Distribution in Unsupervised
   Anomaly Detection <https://proceedings.mlr.press/v202/perini23a.html>`_

.. [#gesd1]

   `An adjusted Grubbs' and generalized extreme studentized deviation
   <https://www.degruyter.com/document/doi/10.1515/dema-2021-0041/html?lang=en>`_

.. [#hist1]

   `Effective Histogram Thresholding Techniques for Natural Images Using
   Segmentation
   <http://www.joig.net/uploadfile/2015/0116/20150116042320548.pdf>`_

.. [#iqr1]

   `A new non-parametric detector of univariate outliers for distributions
   with unbounded support <https://arxiv.org/abs/1509.02473>`_

.. [#karch1]

   `Riemannian center of mass and mollifier smoothing
   <https://www.jstor.org/stable/41059320>`_

.. [#mad1]

   `Periodicity Detection of Outlier Sequences Using Constraint Based
   Pattern Tree with MAD <https://arxiv.org/abs/1507.01685>`_

.. [#mcst1]

   `Testing normality in the presence of outliers
   <https://www.researchgate.net/publication/24065017_Testing_normality_in_the_presence_of_outliers>`_

.. [#meta1]

   `Automating Outlier Detection via Meta-Learning
   <https://arxiv.org/abs/2009.10606>`_

.. [#mixmod1]

   `Application of Mixture Models to Threshold Anomaly Scores
   <https://studenttheses.uu.nl/bitstream/handle/20.500.12932/45591/Masterthesis%20%284%29.pdf?sequence=1&isAllowed=y>`_

.. [#moll1]

   `Riemannian center of mass and mollifier smoothing
   <https://www.jstor.org/stable/41059320>`_

.. [#moll2]

   `Using the mollifier method to characterize datasets and models: The
   case of the Universal Soil Loss Equation
   <https://www.researchgate.net/publication/286670128_Using_the_mollifier_method_to_characterize_datasets_and_models_The_case_of_the_Universal_Soil_Loss_Equation>`_

.. [#mtt1]

   `Towards a More Reliable Interpretation of Machine Learning Outputs for
   Safety-Critical Systems using Feature Importance Fusion
   <https://arxiv.org/abs/2009.05501>`_

.. [#ocsvm]

   `Rule extraction in unsupervised anomaly detection for model
   explainability: Application to OneClass SVM
   <https://arxiv.org/abs/1911.09315>`_

.. [#qmcd1]

   `Deterministic and quasi-random sampling of optimized Gaussian mixture
   distributions for vibronic Monte Carlo
   <https://arxiv.org/abs/1912.11594>`_

.. [#regr1]

   `Linear Models for Outlier Detection
   <https://link.springer.com/chapter/10.1007/978-3-319-47578-3_3>`_

.. [#vae1]

   `Likelihood Regret: An Out-of-Distribution Detection Score For
   Variational Auto-encoder <https://arxiv.org/abs/2003.02977>`_

.. [#wind1]

   `Robust Inside-Outside Segmentation Using Generalized Winding Numbers
   <https://www.researchgate.net/publication/262165781_Robust_Inside-Outside_Segmentation_Using_Generalized_Winding_Numbers>`_

.. [#yj1]

   `Transforming variables to central normality
   <https://arxiv.org/abs/2005.07946>`_

.. [#zscore1]

   `Multiple outlier detection tests for parametric models
   <https://arxiv.org/abs/1910.10426>`_

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/KulikDM/pythresh",
    "name": "pythresh",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "outlier detection, anomaly detection, thresholding, cutoff, contamintion level, data science, machine learning",
    "author": "D Kulik",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/82/70/1fcf08b53a5aed6bbc7e3058ae71ad788d349c28b530e05a16ca78fd8a73/pythresh-0.3.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python Toolbox for Outlier Detection Thresholding",
    "version": "0.3.8",
    "project_urls": {
        "Documentation": "https://pythresh.readthedocs.io/en/latest/",
        "Download": "https://github.com/KulikDM/pythresh/archive/master.zip",
        "Homepage": "https://github.com/KulikDM/pythresh"
    },
    "split_keywords": [
        "outlier detection",
        " anomaly detection",
        " thresholding",
        " cutoff",
        " contamintion level",
        " data science",
        " machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "82701fcf08b53a5aed6bbc7e3058ae71ad788d349c28b530e05a16ca78fd8a73",
                "md5": "d3b0d8a03d93b0889807ff08f4b607e9",
                "sha256": "90a61563388337c379edb7f5c6ad5d270d4b450beb045e6c13965cdf105994cd"
            },
            "downloads": -1,
            "filename": "pythresh-0.3.8.tar.gz",
            "has_sig": false,
            "md5_digest": "d3b0d8a03d93b0889807ff08f4b607e9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 557107,
            "upload_time": "2024-12-16T05:06:42",
            "upload_time_iso_8601": "2024-12-16T05:06:42.161818Z",
            "url": "https://files.pythonhosted.org/packages/82/70/1fcf08b53a5aed6bbc7e3058ae71ad788d349c28b530e05a16ca78fd8a73/pythresh-0.3.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-16 05:06:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "KulikDM",
    "github_project": "pythresh",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.13"
                ]
            ]
        },
        {
            "name": "pyod",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "0.20.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.3.1"
                ]
            ]
        }
    ],
    "lcname": "pythresh"
}
        