# 🔥 *Bayesian Histogram Anomaly Detection (BHAD)* 🔥
Python implementation of the *Bayesian Histogram-based Anomaly Detection (BHAD)* algorithm; see [Vosseler, A. (2022): Unsupervised Insurance Fraud Prediction Based on Anomaly Detector Ensembles](https://www.researchgate.net/publication/361463552_Unsupervised_Insurance_Fraud_Prediction_Based_on_Anomaly_Detector_Ensembles) and [Vosseler, A. (2023): BHAD: Explainable anomaly detection using Bayesian histograms](https://www.researchgate.net/publication/364265660_BHAD_Explainable_anomaly_detection_using_Bayesian_histograms). The package was presented at *PyCon DE & PyData Berlin 2023* ([watch talk here](https://www.youtube.com/watch?v=_8zfgPTD-d8&list=PLGVZCDnMOq0peDguAzds7kVmBr8avp46K&index=8)) and at the *42nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering* ([MaxEnt 2023](https://www.mdpi.com/2673-9984/9/1/1)), held at the Max Planck Institute for Plasma Physics in Garching, Germany.
## Package installation
We use [*uv*](https://github.com/astral-sh/uv) as the package manager because of its speed and stability; the same steps also work with *pip* and *venv* on Python 3.12 or later:
```bash
# curl -LsSf https://astral.sh/uv/install.sh | sh # Optional: install uv for the first time
uv venv .env_bhad --python 3.12 # create the usual virtual environment
source .env_bhad/bin/activate
```
For local development only, install the dependencies and then the package in editable mode from the repository root:
```bash
uv pip install -r pyproject.toml
uv pip install -e .
```
Alternatively, install the released package directly from PyPI:
```bash
uv pip install bhad
```
## Model usage
1. Preprocess the input data: discretize the continuous features, optionally choosing the number of bins by Bayesian model selection.
2. Train the model on the discretized data.
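
The Bayesian choice of the bin count in step 1 can be illustrated with Knuth's rule, a standard MAP criterion for equal-width histograms under a Jeffreys Dirichlet(1/2) prior on the bin probabilities. This is a sketch under that assumption only: `bhad`'s `Discretize` may use a different prior or search strategy, and `knuth_log_posterior`, `map_number_of_bins` and `max_bins` are hypothetical names, not part of the package.

```python
import math
from collections.abc import Sequence


def knuth_log_posterior(data: Sequence[float], m: int) -> float:
    """Unnormalised log posterior of m equal-width bins (Knuth's rule, Dirichlet(1/2) prior)."""
    n = len(data)
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / m
    counts = [0] * m
    for x in data:
        # Clamp the maximum value into the last bin.
        counts[min(int((x - lo) / width), m - 1)] += 1
    return (
        n * math.log(m)
        + math.lgamma(m / 2)
        - m * math.lgamma(0.5)
        - math.lgamma(n + m / 2)
        + sum(math.lgamma(c + 0.5) for c in counts)
    )


def map_number_of_bins(data: Sequence[float], max_bins: int = 50) -> int:
    """Return the bin count with the highest posterior (MAP estimate), searched over 1..max_bins."""
    if min(data) == max(data):
        return 1  # a constant feature needs only a single bin
    return max(range(1, max_bins + 1), key=lambda m: knuth_log_posterior(data, m))
```

For a bimodal sample, such as two well-separated clusters, the criterion prefers several bins (leaving empty bins between the clusters) over a single bin, because the gain in likelihood outweighs the complexity penalty built into the Gamma terms.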

For convenience, both steps can be combined in a scikit-learn `Pipeline` (*optional*):
```python
from sklearn.pipeline import Pipeline
from bhad.model import BHAD
from bhad.utils import Discretize

num_cols = [...]  # names of the numeric features
cat_cols = [...]  # names of the categorical features

# Setting nbins=None infers the Bayes-optimal number of bins
# (the only hyperparameter) via the MAP estimate
pipe = Pipeline(steps=[
    ('discrete', Discretize(nbins=None)),
    ('model', BHAD(contamination=0.01, num_features=num_cols, cat_features=cat_cols)),
])
```
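
To make the idea behind the model step more concrete, here is a from-scratch sketch in the spirit of the BHAD papers: one Bayesian histogram per discretized feature with a symmetric Dirichlet prior, combined under a feature-independence assumption, where a low posterior predictive probability means a high anomaly score. This is our simplified reading, not the package's implementation; the prior strength `alpha`, the catch-all bin for unseen values, and the names `FeatureHistogram`, `fit_histograms` and `anomaly_score` are all assumptions for illustration.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence
from dataclasses import dataclass


@dataclass
class FeatureHistogram:
    """Posterior predictive bin probabilities for one discretized feature."""
    probs: dict[Hashable, float]
    unseen: float  # probability mass reserved for bins never observed in training

    def prob(self, value: Hashable) -> float:
        return self.probs.get(value, self.unseen)


def fit_histograms(rows: Sequence[Sequence[Hashable]], alpha: float = 1.0) -> list[FeatureHistogram]:
    """Fit one Dirichlet-multinomial histogram per column of discretized data."""
    n = len(rows)
    hists = []
    for j in range(len(rows[0])):
        counts = Counter(row[j] for row in rows)
        k = len(counts) + 1  # observed bins plus one catch-all "unseen" bin
        denom = n + k * alpha
        probs = {v: (c + alpha) / denom for v, c in counts.items()}
        hists.append(FeatureHistogram(probs=probs, unseen=alpha / denom))
    return hists


def anomaly_score(row: Sequence[Hashable], hists: Sequence[FeatureHistogram]) -> float:
    """Negative log posterior predictive, summed over independent features (higher = more anomalous)."""
    return -sum(math.log(h.prob(v)) for v, h in zip(row, hists))
```

Each feature's bin probabilities (including the unseen bin) sum to one, so a rare bin in any single feature raises the total score. This additive structure is also what makes per-feature explanations cheap to compute.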
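Fit the pipeline on a dataset, then obtain binary model decisions and anomaly scores:
```python
# dataset: assumed to be a pandas DataFrame containing the columns in num_cols and cat_cols
y_pred = pipe.fit_predict(X=dataset)              # binary anomaly decisions
anomaly_scores = pipe.decision_function(dataset)  # continuous anomaly scores
```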
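
The `contamination` parameter sets the expected share of anomalies. One common way to turn continuous scores into binary decisions with such a parameter is to flag the highest-scoring fraction of observations, as sketched below. The helper `flag_anomalies`, the rule of flagging at least one observation, and the scikit-learn outlier label convention (`-1` = anomaly, `1` = normal) are assumptions; check `bhad`'s own label and score sign conventions before relying on them.

```python
from collections.abc import Sequence


def flag_anomalies(scores: Sequence[float], contamination: float) -> list[int]:
    """Label the highest-scoring `contamination` share of observations as anomalies.

    Assumed convention (as in scikit-learn outlier detectors): -1 = anomaly, 1 = normal.
    Higher scores are treated as more anomalous.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must lie in (0, 0.5]")
    # Flag at least one observation, even for very small samples.
    n_flag = max(1, round(contamination * len(scores)))
    # Indices ordered from most to least anomalous.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [-1 if i in flagged else 1 for i in range(len(scores))]
```

With `contamination=0.01`, as in the pipeline above, roughly one observation in a hundred would be flagged.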
Obtain a *global* model explanation as well as explanations for *individual* observations:
```python
from bhad.explainer import Explainer

local_expl = Explainer(
    bhad_obj=pipe.named_steps['model'],
    discretize_obj=pipe.named_steps['discrete'],
).fit()

local_expl.get_explanation(nof_feat_expl=5, append=False)  # individual explanations
print(local_expl.global_feat_imp)  # global explanation
```
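
Because a histogram-based score decomposes into per-feature terms, explanations can be read off those terms. The sketch below shows one plausible scheme under that assumption: a local explanation ranks one observation's features by their contribution `-log p` to its score, and a global importance sums the contributions over observations and normalizes them to one. It is not how `bhad`'s `Explainer` necessarily computes `get_explanation` or `global_feat_imp`; `local_explanation`, `global_importance` and the input format (feature name mapped to posterior predictive probability) are hypothetical.

```python
import math
from collections.abc import Sequence


def local_explanation(feature_probs: dict[str, float], top_k: int = 5) -> list[tuple[str, float]]:
    """Rank features by their contribution -log p to one observation's anomaly score."""
    contrib = {f: -math.log(p) for f, p in feature_probs.items()}
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def global_importance(all_probs: Sequence[dict[str, float]]) -> dict[str, float]:
    """Sum each feature's contribution over observations and normalize to sum to one."""
    totals: dict[str, float] = {}
    for probs in all_probs:
        for f, p in probs.items():
            totals[f] = totals.get(f, 0.0) - math.log(p)
    grand = sum(totals.values())
    if grand == 0.0:
        return {f: 0.0 for f in totals}  # every bin had probability 1: nothing to attribute
    return {f: t / grand for f, t in totals.items()}
```

For example, an observation whose `age` falls in a bin with probability 0.01 would list `age` first, ahead of features in more common bins.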
A detailed *toy example* using synthetic data can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Toy_Example.ipynb). An example using the Titanic dataset that illustrates *model explainability* with BHAD can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Titanic_Example.ipynb).