# Bayesian Histogram-based Anomaly Detection (BHAD)
Python implementation of the BHAD algorithm as presented in [Vosseler, A. (2023): BHAD: Explainable anomaly detection using Bayesian histograms](https://www.researchgate.net/publication/364265660_BHAD_Fast_unsupervised_anomaly_detection_using_Bayesian_histograms). The ***bhad* package** follows scikit-learn's standard API for [outlier detection](https://scikit-learn.org/stable/modules/outlier_detection.html).
<!--- The *bhad* package has been presented on *PyCon DE & PyData Berlin 2023*, you can watch the presentation [here](https://vimeo.com/user/171811262/folder/15825490). -->
## Installation
```bash
pip install bhad
```
## Usage
1. Preprocess the input data: discretize continuous features and, optionally, conduct Bayesian model selection.
2. Train the model on the discretized data.

For convenience, these two steps can optionally be wrapped in a scikit-learn pipeline:
```python
from bhad.model import BHAD
from bhad.utils import Discretize
from sklearn.pipeline import Pipeline
num_cols = [....]  # names of numeric features
cat_cols = [....]  # categorical features

pipe = Pipeline(steps=[
    ('discrete', Discretize(nbins=None)),
    ('model', BHAD(contamination=0.01, num_features=num_cols, cat_features=cat_cols)),
])
```
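To make the discretization in step 1 concrete, here is a minimal standard-library sketch of equal-width binning for a single numeric feature. It is only an illustration under stated assumptions: the helper `equal_width_bins` and the sample `ages` are hypothetical, the number of bins is fixed by hand, and the package's `Discretize` step may choose and place its bins differently.

```python
from bisect import bisect_right


def equal_width_bins(values, nbins):
    """Map each numeric value to a bin index in 0..nbins-1 (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information: everything lands in bin 0
        return [0] * len(values)
    width = (hi - lo) / nbins
    # Interior bin edges only; a value equal to an edge falls into the upper bin
    edges = [lo + width * k for k in range(1, nbins)]
    return [bisect_right(edges, v) for v in values]


# Illustrative data, not taken from the package or its notebooks
ages = [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 80.0]
age_bins = equal_width_bins(ages, nbins=4)
print(age_bins)  # [1, 1, 1, 1, 2, 0, 1, 3]
```

After this step every feature is discrete, which is the form of input that step 2 expects.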
Get binary model decisions for a given dataset:
```python
y_pred = pipe.fit_predict(X = dataset)
```
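The sketch below illustrates, in a simplified way, how a `contamination` rate can turn histogram-based scores into binary decisions. It is not the BHAD algorithm itself: the functions `histogram_scores` and `decide`, the pseudo-count `alpha`, and the toy rows are all assumptions made for illustration. The labels follow scikit-learn's outlier-detection convention (`-1` for outliers, `1` for inliers); that *bhad* uses the same encoding is assumed here from its stated adherence to that API.

```python
import math
from collections import Counter


def histogram_scores(rows, alpha=1.0):
    """Sum of log smoothed relative frequencies per row (hypothetical helper).

    rows:  list of tuples of discrete values, one entry per feature.
    alpha: pseudo-count added to each observed category (an assumption).
    Lower scores mean rarer feature values, i.e. more anomalous rows.
    """
    n, n_feat = len(rows), len(rows[0])
    counts = [Counter(row[j] for row in rows) for j in range(n_feat)]
    scores = []
    for row in rows:
        s = 0.0
        for j, value in enumerate(row):
            k = len(counts[j])  # number of distinct categories in feature j
            s += math.log((counts[j][value] + alpha) / (n + alpha * k))
        scores.append(s)
    return scores


def decide(scores, contamination):
    """Label the lowest-scoring fraction of rows -1 and the rest 1.

    Rows tied with the threshold score are also labelled -1.
    """
    n_out = max(1, round(contamination * len(scores)))
    threshold = sorted(scores)[n_out - 1]
    return [-1 if s <= threshold else 1 for s in scores]


# Nine identical rows and one row with rare values in both features
rows = [(0, "a")] * 9 + [(3, "b")]
labels = decide(histogram_scores(rows), contamination=0.1)
print(labels)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, -1]
```

A larger `contamination` value lowers the bar for flagging a row, so it should reflect the share of anomalies expected in the data.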
Get a global model explanation as well as explanations for individual observations:
```python
from bhad.explainer import Explainer
local_expl = Explainer(pipe.named_steps['model'], pipe.named_steps['discrete']).fit()
local_expl.get_explanation(nof_feat_expl = 5, append = False) # individual explanations
local_expl.global_feat_imp # global explanation
```
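As a rough intuition for what an individual explanation can convey, the following sketch ranks the features of one observation by how rare its value is in the data and keeps the top few. This is a conceptual illustration only, not how `Explainer` is implemented: the function `rank_rare_features`, the feature names, and the rows are hypothetical, and `top_k` merely mimics the role that the name `nof_feat_expl` suggests.

```python
from collections import Counter


def rank_rare_features(rows, names, index, top_k):
    """Return the names of the top_k rarest feature values of rows[index]."""
    n = len(rows)
    freqs = []
    for j, name in enumerate(names):
        counts = Counter(row[j] for row in rows)
        # Relative frequency of this observation's value within feature j
        freqs.append((counts[rows[index][j]] / n, name))
    # Rarest values first
    return [name for _, name in sorted(freqs)[:top_k]]


# Illustrative discretized data, not taken from the package notebooks
names = ["age_bin", "fare_bin", "sex"]
rows = [(1, 0, "m"), (1, 0, "f"), (1, 0, "m"), (1, 0, "f"), (3, 0, "m")]
top = rank_rare_features(rows, names, index=4, top_k=2)
print(top)  # ['age_bin', 'sex']
```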
A detailed toy example using synthetic data for anomaly detection can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Toy_Example.ipynb), and an example using the Titanic dataset to illustrate model explainability can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Titanic_Example.ipynb).
Raw data
{
"_id": null,
"home_page": "https://github.com/AVoss84/bhad",
"name": "bhad",
"maintainer": "Alexander Vosseler",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "bayesian-inference,anomaly-detection,unsupervised-learning,explainability",
"author": "Alexander Vosseler",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/cd/79/f91da89721d5b2d7e7af36221d659869408cef7539fbee8cf1cc2a1a817a/bhad-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Bayesian Histogram-based Anomaly Detection",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/AVoss84/bhad"
},
"split_keywords": [
"bayesian-inference",
"anomaly-detection",
"unsupervised-learning",
"explainability"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a2f77015c5b11cc5b7ba1d7b50f91517cd3f591c9f7b94db5a45f093f6cfec60",
"md5": "51525d4f7ec94168be93323917f10ed8",
"sha256": "0bddc670d5630507c23bf911f2794f8841bde7bf58e8830778bcf3bc61f3e6a8"
},
"downloads": -1,
"filename": "bhad-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "51525d4f7ec94168be93323917f10ed8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 19327,
"upload_time": "2023-05-30T09:22:51",
"upload_time_iso_8601": "2023-05-30T09:22:51.476264Z",
"url": "https://files.pythonhosted.org/packages/a2/f7/7015c5b11cc5b7ba1d7b50f91517cd3f591c9f7b94db5a45f093f6cfec60/bhad-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cd79f91da89721d5b2d7e7af36221d659869408cef7539fbee8cf1cc2a1a817a",
"md5": "e928d73ccac9c35289a77bc865b660b3",
"sha256": "b6854fffc58f12322c979d1d019b8a52b9613824df1748622456b01481e964f8"
},
"downloads": -1,
"filename": "bhad-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "e928d73ccac9c35289a77bc865b660b3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 19136,
"upload_time": "2023-05-30T09:22:53",
"upload_time_iso_8601": "2023-05-30T09:22:53.927460Z",
"url": "https://files.pythonhosted.org/packages/cd/79/f91da89721d5b2d7e7af36221d659869408cef7539fbee8cf1cc2a1a817a/bhad-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-30 09:22:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AVoss84",
"github_project": "bhad",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "bhad"
}