bhad

Name: bhad
Version: 0.2.6
Summary: Bayesian Histogram-based Anomaly Detection
Upload time: 2025-01-26 00:33:00
Author and maintainer: Alexander Vosseler
Requires Python: >=3.12
Keywords: bayesian-inference, anomaly-detection, unsupervised-learning, explainability
Repository: https://github.com/AVoss84/bhad
Issue tracker: https://github.com/AVoss84/bhad/issues

# 🔥 *Bayesian Histogram Anomaly Detection (BHAD)* 🔥

Python implementation of the *Bayesian Histogram-based Anomaly Detection (BHAD)* algorithm; see [Vosseler, A. (2022): Unsupervised Insurance Fraud Prediction Based on Anomaly Detector Ensembles](https://www.researchgate.net/publication/361463552_Unsupervised_Insurance_Fraud_Prediction_Based_on_Anomaly_Detector_Ensembles) and [Vosseler, A. (2023): BHAD: Explainable anomaly detection using Bayesian histograms](https://www.researchgate.net/publication/364265660_BHAD_Explainable_anomaly_detection_using_Bayesian_histograms). The package was presented at *PyCon DE & PyData Berlin 2023* ([watch the talk here](https://www.youtube.com/watch?v=_8zfgPTD-d8&list=PLGVZCDnMOq0peDguAzds7kVmBr8avp46K&index=8)) and at the *42nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering* ([MaxEnt 2023](https://www.mdpi.com/2673-9984/9/1/1)), held at the Max Planck Institute for Plasma Physics in Garching, Germany.

## Package installation

We use [*uv*](https://github.com/astral-sh/uv) as the package manager here because of its speed and stability, but the same installation also works with *pip* and *venv* for Python 3.12:
```bash
# curl -LsSf https://astral.sh/uv/install.sh | sh       # Optional: install uv for the first time
uv venv .env_bhad --python 3.12                         # create the usual virtual environment
source .env_bhad/bin/activate
```

For local development (only):
```bash
uv pip install -r pyproject.toml  
uv pip install -e .
```

Install directly from PyPI:
```bash
uv pip install bhad                                       
```


## Model usage

1.) Preprocess the input data: discretize continuous features and conduct Bayesian model selection (*optional*).

2.) Train the model using discrete data.

For convenience these two steps can be wrapped up via a scikit-learn pipeline (*optional*). 

```python
from sklearn.pipeline import Pipeline
from bhad.model import BHAD
from bhad.utils import Discretize

num_cols = [...]   # placeholder: names of the numeric features
cat_cols = [...]   # placeholder: names of the categorical features

# Setting nbins = None infers the Bayes-optimal number of bins (=only parameter)
# using the MAP estimate
pipe = Pipeline(steps=[
   ('discrete', Discretize(nbins = None)),   
   ('model', BHAD(contamination = 0.01, num_features = num_cols, cat_features = cat_cols))
])
```
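
To give some intuition for the two steps above, the following is a minimal, standard-library-only sketch of histogram-based scoring. It is not the package's implementation: the helper names `equal_width_bins` and `histogram_scores`, the fixed number of bins, the symmetric pseudo-count `alpha`, and the toy data are all assumptions made for illustration. Numeric values are first discretized, then each row is scored by summing the log of smoothed relative bin frequencies across features.

```python
import math
from collections import Counter


def equal_width_bins(values, nbins):
    """Map each numeric value to a bin index in 0..nbins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0   # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), nbins - 1) for v in values]


def histogram_scores(columns, alpha=1.0):
    """Sum of log smoothed relative frequencies over all discrete columns.

    `columns` is a list of equally long lists of discrete values.
    Lower scores correspond to rarer values, i.e. more unusual rows.
    """
    n_rows = len(columns[0])
    scores = [0.0] * n_rows
    for col in columns:
        counts = Counter(col)
        n_levels = len(counts)
        for i, level in enumerate(col):
            # Smoothed frequency estimate with a symmetric pseudo-count alpha
            prob = (counts[level] + alpha) / (n_rows + alpha * n_levels)
            scores[i] += math.log(prob)
    return scores


# Hypothetical toy data: the last row is unusual in both features
amounts = [10.0, 12.0, 11.0, 9.0, 13.0, 10.5, 95.0]
channel = ["web", "web", "app", "web", "app", "web", "fax"]

binned = equal_width_bins(amounts, nbins=4)
scores = histogram_scores([binned, channel])
most_anomalous = min(range(len(scores)), key=scores.__getitem__)
```

In the package itself, `Discretize(nbins = None)` selects the number of bins from the data rather than fixing it by hand as this sketch does; see the referenced papers for the actual model.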

For a given dataset, get binary model decisions and anomaly scores:

```python
y_pred = pipe.fit_predict(X = dataset)        

anomaly_scores = pipe.decision_function(dataset)
```
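
How a `contamination` value can turn continuous scores into binary decisions is sketched below. This assumes the scikit-learn outlier-detector convention (lower score means more anomalous, `-1` marks an anomaly and `1` an inlier); whether BHAD follows exactly this convention should be checked against the package. The function `label_by_contamination` and the toy scores are hypothetical.

```python
import math


def label_by_contamination(scores, contamination):
    """Flag the lowest-scoring share of observations as anomalies (-1), the rest as 1."""
    # Number of observations to flag, at least one
    n_flag = max(1, math.ceil(contamination * len(scores)))
    # Score of the n_flag-th lowest observation serves as the cut-off
    threshold = sorted(scores)[n_flag - 1]
    labels = [-1 if s <= threshold else 1 for s in scores]
    return labels, threshold


# Hypothetical scores; two observations lie far below the rest
toy_scores = [-1.2, -0.9, -1.1, -6.5, -1.0, -0.8, -1.3, -0.95, -1.05, -7.1]
labels, threshold = label_by_contamination(toy_scores, contamination=0.2)
```

With `contamination = 0.01` as in the pipeline above, roughly 1% of the observations would be flagged under this scheme.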

Get a *global* model explanation as well as explanations for *individual* observations:

```python
from bhad.explainer import Explainer

local_expl = Explainer(bhad_obj = pipe.named_steps['model'], discretize_obj = pipe.named_steps['discrete']).fit()

local_expl.get_explanation(nof_feat_expl = 5, append = False)          # individual explanations

print(local_expl.global_feat_imp)                                      # global explanation
```
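
As a loose illustration of what an explanation limited to a few features might convey for a single observation, the sketch below ranks features by how rare the observed value is. This is not how `Explainer` works internally; the function `rarest_features`, the smoothing constant `alpha`, and the toy data are assumptions for illustration only.

```python
from collections import Counter


def rarest_features(columns, row, k, alpha=1.0):
    """Return the k feature names whose value in `row` is least frequent.

    `columns` maps a feature name to its list of discrete values.
    """
    probs = {}
    for name, col in columns.items():
        counts = Counter(col)
        # Smoothed relative frequency of the value observed in this row
        probs[name] = (counts[col[row]] + alpha) / (len(col) + alpha * len(counts))
    return sorted(probs, key=probs.get)[:k]


# Hypothetical discretized data; row 5 has an unusual amount bin
data = {
    "amount_bin": [0, 0, 0, 0, 0, 3],
    "channel": ["web", "web", "app", "web", "app", "app"],
    "country": ["DE", "DE", "DE", "FR", "DE", "DE"],
}
top = rarest_features(data, row=5, k=2)
```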

A detailed *toy example* using synthetic data can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Toy_Example.ipynb). An example using the Titanic dataset that illustrates *model explainability* with BHAD can be found [here](https://github.com/AVoss84/bhad/blob/main/src/notebooks/Titanic_Example.ipynb).

            
