<div align="center">
<img src="https://raw.githubusercontent.com/msamsami/wnb/main/docs/logo.png" alt="wnb logo" width="275" />
</div>
<div align="center"> <b>General and weighted naive Bayes classifiers</b> </div>
<div align="center">Scikit-learn-compatible</div> <br>
<div align="center">

[PyPI](https://pypi.org/project/wnb/) · [Downloads](https://pepy.tech/project/wnb)

</div>
## Introduction
Naive Bayes is often recognized as one of the most popular classification algorithms in the machine learning community. This package takes naive Bayes to a higher level by providing its implementations in more general and weighted settings.
### General naive Bayes
The issue with well-known implementations of the naive Bayes algorithm (such as those in the `sklearn.naive_bayes` module) is that they assume a single distribution for the likelihoods of all features. This limits users who need naive Bayes models with a different likelihood distribution for each feature. Enter the **WNB** library! It lets you customize your naive Bayes model by specifying the likelihood distribution of each feature separately, choosing from a range of continuous and discrete probability distributions to design your classifier.
### Weighted naive Bayes
Although naive Bayes has many advantages such as simplicity and interpretability, its conditional independence assumption rarely holds in real-world applications. To relax this assumption, many attribute-weighted naive Bayes (WNB) approaches have been proposed. Most of them involve computationally demanding optimization problems and offer no way to control the model's bias under class imbalance. Minimum Log-likelihood Difference WNB (MLD-WNB) is a novel weighting approach that optimizes the weights according to the Bayes optimal decision rule and exposes hyperparameters for controlling the model's bias. The **WNB** library provides an efficient implementation of Gaussian MLD-WNB.
## Installation
This library ships as an all-in-one module implementation with minimal dependencies and requirements. Furthermore, it fully **adheres to the Scikit-learn API** ❤️.
### Prerequisites
Ensure that Python 3.8 or higher is installed on your machine before installing **WNB**.
### PyPI
```bash
pip install wnb
```
### uv
```bash
uv add wnb
```
## Getting started ⚡️
Here, we show how you can use the library to train general and weighted naive Bayes classifiers.
### General naive Bayes
A general naive Bayes model can be set up and used in four simple steps:
1. Import the `GeneralNB` class and the `Distribution` enum
```python
from wnb import GeneralNB, Distribution as D
```
2. Initialize a classifier and specify the likelihood distributions
```python
gnb = GeneralNB(distributions=[D.NORMAL, D.CATEGORICAL, D.EXPONENTIAL])
```
3. Fit the classifier to a training set (with three features)
```python
gnb.fit(X, y)
```
4. Predict on test data
```python
gnb.predict(X_test)
```
### Weighted naive Bayes
An MLD-WNB model can be set up and used in four simple steps:
1. Import the `GaussianWNB` class
```python
from wnb import GaussianWNB
```
2. Initialize a classifier
```python
wnb = GaussianWNB(max_iter=25, step_size=1e-2, penalty="l2")
```
3. Fit the classifier to a training set
```python
wnb.fit(X, y)
```
4. Predict on test data
```python
wnb.predict(X_test)
```
## Compatibility with Scikit-learn 🤝
The **wnb** library fully adheres to the Scikit-learn API, ensuring seamless integration with other Scikit-learn components and workflows. This means that users familiar with Scikit-learn will find the WNB classifiers intuitive to use.
Both Scikit-learn classifiers and WNB classifiers share these well-known methods:
- `fit(X, y)`
- `predict(X)`
- `predict_proba(X)`
- `predict_log_proba(X)`
- `predict_joint_log_proba(X)`
- `score(X, y)`
- `get_params()`
- `set_params(**params)`
- etc.
By maintaining this consistency, WNB classifiers can be easily incorporated into existing machine learning pipelines and processes.
## Benchmarks 📊
We conducted benchmarks on four datasets, [Wine](https://scikit-learn.org/stable/datasets/toy_dataset.html#wine-recognition-dataset), [Iris](https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset), [Digits](https://scikit-learn.org/stable/datasets/toy_dataset.html#optical-recognition-of-handwritten-digits-dataset), and [Breast Cancer](https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-wisconsin-diagnostic-dataset), to evaluate the performance of WNB classifiers and compare them with their Scikit-learn counterpart, `GaussianNB`. The results show that the WNB classifiers achieve higher accuracy on all four datasets.
| Dataset | Scikit-learn Classifier | Accuracy | WNB Classifier | Accuracy |
|------------------|-------------------------|----------|----------------|-----------|
| Wine | GaussianNB | 0.9749 | GeneralNB | **0.9812** |
| Iris | GaussianNB | 0.9556 | GeneralNB | **0.9602** |
| Digits | GaussianNB | 0.8372 | GeneralNB | **0.8905** |
| Breast Cancer | GaussianNB | 0.9389 | GaussianWNB | **0.9512** |
These benchmarks highlight the potential of WNB classifiers to provide better performance in certain scenarios by allowing more flexibility in the choice of distributions and incorporating weighting strategies.
The scripts used to generate these benchmark results are available in the _tests/benchmarks/_ directory.
## Support us 💡
You can support the project in the following ways:
⭐ Star WNB on GitHub (click the star button in the top right corner)
💡 Provide your feedback or propose ideas in the [Issues section](https://github.com/msamsami/wnb/issues)
📰 Post about WNB on LinkedIn or other platforms
## Citation 📚
If you use this repository, please consider citing it with:
```
@misc{wnb,
  author = {Mohammad Mehdi Samsami},
  title = {WNB: General and weighted naive Bayes classifiers},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/msamsami/wnb}},
}
```