driftlens


| Field | Value |
| --- | --- |
| Name | driftlens |
| Version | 0.1.4 |
| Home page | https://github.com/grecosalvatore/drift-lens |
| Summary | DriftLens: an Unsupervised Drift Detection framework |
| Upload time | 2024-08-13 21:19:25 |
| Author | Salvatore Greco |
| Requires Python | >=3.9 |

<div align="center">
  <img src="https://github.com/grecosalvatore/drift-lens/raw/main/docs/_static/images/Drift_Lens_Logo.png" width="300"/>
  <h4>Unsupervised Concept Drift Detection <br> from Deep Learning
Representations on Unstructured Data in Real-time</h4>
</div>
<br/>

[![Documentation Status](https://readthedocs.org/projects/driftlens/badge/?version=latest)](https://driftlens.readthedocs.io/en/latest/?version=latest)
[![Version](https://img.shields.io/pypi/v/driftlens?color=blue)](https://pypi.org/project/driftlens)
[![License](https://img.shields.io/github/license/grecosalvatore/drift-lens)](https://github.com/grecosalvatore/drift-lens/blob/main/LICENSE)
[![arxiv preprint](https://img.shields.io/badge/arXiv-2406.17813-b31b1b.svg)](https://arxiv.org/abs/2406.17813)
[![Downloads](https://static.pepy.tech/badge/driftlens)](https://pepy.tech/project/driftlens)

*DriftLens* is an **unsupervised drift detection** framework for deep learning classifiers on unstructured data.

The *DriftLens* methodology and its evaluation are currently **under review**.

The preliminary idea was first proposed in the paper: 
[Drift Lens: Real-time unsupervised Concept Drift detection by evaluating per-label embedding distributions](https://ieeexplore.ieee.org/document/9679880) **(Greco et al., 2021)**

*DriftLens* has also been implemented as a web application tool ([GitHub](https://github.com/grecosalvatore/DriftLensDemo)).

## Table of Contents
- [Installation](#installation)
- [Example of usage](#example-of-usage)
- [DriftLens Methodology](#driftlens-methodology)
- [Experiments Reproducibility](#experiments-reproducibility)
- [References](#references)
- [Authors](#authors)

## Installation
DriftLens is available on PyPI and can be installed with pip for Python >= 3.9.
```bash
# Install latest stable version
pip install driftlens

# Alternatively, install latest development version
pip install git+https://github.com/grecosalvatore/drift-lens
```

## Example of usage
```python
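from driftlens.driftlens import DriftLens

# The snippet assumes the following inputs are already available:
#   E_train, Y_predicted_train: embeddings and predicted labels of the baseline (e.g., training) dataset
#   E_test, Y_predicted_test: embeddings and predicted labels of the threshold dataset
#   E_windows, Y_predicted_windows: embeddings and predicted labels of the new data stream, split into windows
#   training_label_list: list of the labels the model was trained on
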
# DriftLens parameters
batch_n_pc = 150 # Number of principal components to reduce per-batch embeddings
per_label_n_pc = 75 # Number of principal components to reduce per-label embeddings
window_size = 1000 # Window size for drift detection
threshold_number_of_estimation_samples = 1000 # Number of sampled windows to estimate the threshold values

# Initialize DriftLens
dl = DriftLens()

# Estimate the baseline (offline phase)
baseline = dl.estimate_baseline(E=E_train,
                                Y=Y_predicted_train,
                                label_list=training_label_list,
                                batch_n_pc=batch_n_pc,
                                per_label_n_pc=per_label_n_pc)

# Estimate the threshold values with DriftLens (offline phase)
per_batch_distances_sorted, per_label_distances_sorted = dl.random_sampling_threshold_estimation(
                                                            label_list=training_label_list,
                                                            E=E_test,
                                                            Y=Y_predicted_test,
                                                            batch_n_pc=batch_n_pc,
                                                            per_label_n_pc=per_label_n_pc,
                                                            window_size=window_size,
                                                            n_samples=threshold_number_of_estimation_samples,
                                                            flag_shuffle=True,
                                                            flag_replacement=True)

# Compute the window distribution distances (Frechet Distance) with DriftLens (online phase)
dl_distance = dl.compute_window_distribution_distances(E_windows[0], Y_predicted_windows[0])
```

## DriftLens Methodology
<div align="center">
  <img src="docs/_static/images/drift-lens-architecture.png" width="600"/>
  <h4>DriftLens Methodology.</h4>
</div>
<br/>

DriftLens is an unsupervised drift detection technique based on distribution distances within the embedding representations generated by deep learning models.
The methodology comprises an *offline* and an *online* phase.


In the *offline* phase, DriftLens takes as input a historical dataset (i.e., the baseline and threshold datasets), then:

1) Estimates the reference distributions from the baseline dataset (e.g., the training dataset). The reference
distributions, called the **baseline**, represent the distribution of features (i.e., embeddings) that the model learned during training (i.e., they represent the absence of drift).
2) Estimates threshold distance values from the threshold dataset to discriminate between drift and no-drift conditions.
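
One way to picture step 2 is random window sampling: draw many windows from the threshold dataset, measure how far each one is from the baseline, and derive a threshold from the resulting no-drift distances. The sketch below illustrates that idea and is not the library's implementation. The names `estimate_threshold` and `mean_shift_distance`, the stand-in distance (Euclidean distance between mean embeddings), and the choice of the 99th percentile are all assumptions made for the example.

```python
import numpy as np


def mean_shift_distance(baseline_E, window_E):
    """Stand-in distance (assumption): Euclidean distance between mean embeddings."""
    return float(np.linalg.norm(baseline_E.mean(axis=0) - window_E.mean(axis=0)))


def estimate_threshold(baseline_E, threshold_E, distance_fn, window_size,
                       n_samples, percentile=99.0, seed=0):
    """Sample windows from threshold_E and return (threshold, sorted distances)."""
    rng = np.random.default_rng(seed)
    distances = []
    for _ in range(n_samples):
        # Draw one window by sampling rows with replacement
        idx = rng.integers(0, len(threshold_E), size=window_size)
        distances.append(distance_fn(baseline_E, threshold_E[idx]))
    distances = np.sort(np.asarray(distances))[::-1]  # descending order
    # Hypothetical rule: use a high percentile of the no-drift distances
    return float(np.percentile(distances, percentile)), distances


rng = np.random.default_rng(42)
E_baseline = rng.normal(size=(2000, 8))   # toy "training" embeddings
E_threshold = rng.normal(size=(1000, 8))  # toy held-out embeddings, same distribution

threshold, sorted_distances = estimate_threshold(
    E_baseline, E_threshold, mean_shift_distance, window_size=200, n_samples=100)
print(f"threshold={threshold:.3f}, max sampled distance={sorted_distances[0]:.3f}")
```

Passing the distance as a callable keeps the sampling logic independent of the metric, so a Frechet-style distance could be swapped in without changing the loop.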

In the *online* phase, the new data stream is processed in windows of fixed size. For each window, DriftLens:

3) Estimates the distributions of the new data window.
4) Computes the distribution distances with respect to the reference distributions.
5) Evaluates the distances against the threshold values. If a distance exceeds its threshold, drift is predicted.
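
The online steps can be sketched as a loop over fixed-size windows, assuming a baseline and a threshold were already obtained offline. This is a simplified illustration rather than the DriftLens code: `iter_windows` and `detect_drift` are hypothetical names, and the distance used here (Euclidean distance between the window mean and the baseline mean) is a stand-in for the distribution distance.

```python
import numpy as np


def iter_windows(E_stream, window_size):
    """Yield consecutive, non-overlapping windows; a trailing partial window is dropped."""
    for start in range(0, len(E_stream) - window_size + 1, window_size):
        yield E_stream[start:start + window_size]


def detect_drift(E_stream, baseline_mean, threshold, window_size):
    """Return one (distance, drift_flag) pair per window."""
    results = []
    for window in iter_windows(E_stream, window_size):
        # Stand-in distance (assumption): distance between mean embeddings
        distance = float(np.linalg.norm(window.mean(axis=0) - baseline_mean))
        # Drift is predicted when the distance exceeds the threshold
        results.append((distance, distance > threshold))
    return results


rng = np.random.default_rng(1)
no_drift = rng.normal(loc=0.0, size=(500, 8))
drifted = rng.normal(loc=2.0, size=(500, 8))  # simulated drift: shifted mean
stream = np.vstack([no_drift, drifted])

results = detect_drift(stream, baseline_mean=np.zeros(8), threshold=1.0, window_size=250)
print([flag for _, flag in results])  # [False, False, True, True]
```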

In both phases, the distributions are estimated as multivariate normal distributions by computing the mean and the covariance of the embedding vectors.
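
As a minimal sketch of this estimation step (not the library's internal code), the mean vector and covariance matrix of a set of embeddings can be computed with NumPy. The function names `estimate_gaussian` and `estimate_per_label_gaussians` are hypothetical and chosen only for illustration; the per-label variant assumes the predicted labels are used to group the embeddings.

```python
import numpy as np


def estimate_gaussian(E):
    """Return the mean vector and covariance matrix of embeddings E (n_samples, n_dims)."""
    E = np.asarray(E, dtype=float)
    mean = E.mean(axis=0)
    # rowvar=False: each row is a sample, each column an embedding dimension
    cov = np.cov(E, rowvar=False)
    return mean, cov


def estimate_per_label_gaussians(E, Y, label_list):
    """Estimate one Gaussian per predicted label (hypothetical helper)."""
    E, Y = np.asarray(E, dtype=float), np.asarray(Y)
    return {label: estimate_gaussian(E[Y == label]) for label in label_list}


# Toy data: 200 embeddings of dimension 4 with random predicted labels 0/1
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 4))
Y = rng.integers(0, 2, size=200)

mean, cov = estimate_gaussian(E)
per_label = estimate_per_label_gaussians(E, Y, label_list=[0, 1])
print(mean.shape, cov.shape)  # (4,) (4, 4)
```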

DriftLens uses the Frechet Distance to measure the distance between the reference (i.e., baseline) distributions and the new window distributions.
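
For two multivariate normal distributions with means `mu1`, `mu2` and covariances `S1`, `S2`, the squared Frechet distance has the closed form `||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))`. The sketch below computes it with NumPy. It is illustrative only: the function name `frechet_distance` is an assumption, and whether the library applies a square root or any normalisation on top of this value is not stated here.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Frechet distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    # Tr((cov1 cov2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov1 @ cov2, which are real and non-negative for
    # positive semi-definite inputs; clip small negative round-off.
    eigvals = np.linalg.eigvals(cov1 @ cov2).real
    sqrt_trace = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(mean_term + np.trace(cov1) + np.trace(cov2) - 2.0 * sqrt_trace)


identity = np.eye(3)
same = frechet_distance(np.zeros(3), identity, np.zeros(3), identity)
shifted = frechet_distance(np.zeros(3), identity, np.ones(3), identity)
print(round(same, 6), round(shifted, 6))  # 0.0 3.0
```

Identical distributions give a distance of zero, and a pure mean shift contributes only the squared Euclidean distance between the means.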

## Experiments Reproducibility
Instructions and scripts for reproducing the experimental evaluation are located in the [experiments folder](experiments/README.md).

## References
If you use DriftLens, please cite the following papers:

1) The DriftLens methodology and evaluation are currently **under review**. The pre-print is available at:
```bibtex
@misc{greco2024unsupervisedconceptdriftdetection,
      title={Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time}, 
      author={Salvatore Greco and Bartolomeo Vacchetti and Daniele Apiletti and Tania Cerquitelli},
      year={2024},
      eprint={2406.17813},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.17813}, 
}
```

2) Preliminary idea 
```bibtex
@INPROCEEDINGS{driftlens,
  author={Greco, Salvatore and Cerquitelli, Tania},
  booktitle={2021 International Conference on Data Mining Workshops (ICDMW)}, 
  title={Drift Lens: Real-time unsupervised Concept Drift detection by evaluating per-label embedding distributions}, 
  year={2021},
  volume={},
  number={},
  pages={341-349},
  doi={10.1109/ICDMW53433.2021.00049}
  }
```

3) Webapp tool
```bibtex
@inproceedings{greco2024driftlens,
  title={DriftLens: A Concept Drift Detection Tool},
  author={Greco, Salvatore and Vacchetti, Bartolomeo and Apiletti, Daniele and Cerquitelli, Tania and others},
  booktitle={Advances in Database Technology},
  volume={27},
  pages={806--809},
  year={2024},
  organization={Open proceedings}
}
```

## Authors

- **Salvatore Greco**, *Politecnico di Torino* - [Homepage](https://grecosalvatore.github.io/) - [GitHub](https://github.com/grecosalvatore) - [Twitter](https://twitter.com/_salvatoregreco)
- **Bartolomeo Vacchetti**, *Politecnico di Torino* 
- **Daniele Apiletti**, *Politecnico di Torino* - [Homepage](https://www.polito.it/en/staff?p=daniele.apiletti)
- **Tania Cerquitelli**, *Politecnico di Torino* - [Homepage](https://dbdmg.polito.it/dbdmg_web/people/tania-cerquitelli/)

            
