# WoeBoost
**Author:** [xRiskLab](https://github.com/xRiskLab)<br>
**Version:** v1.0.1<br>
**License:** [MIT License](https://opensource.org/licenses/MIT) (2024)
![Title](https://raw.githubusercontent.com/xRiskLab/woeboost/main/docs/ims/woeboost.png)
<div align="center">
<img src="https://img.shields.io/pypi/v/woeboost" alt="PyPI Version"/>
<img src="https://img.shields.io/github/license/xRiskLab/woeboost" alt="License"/>
<img src="https://img.shields.io/github/contributors/xRiskLab/woeboost" alt="Contributors"/>
<img src="https://img.shields.io/github/issues/xRiskLab/woeboost" alt="Issues"/>
<img src="https://img.shields.io/github/forks/xRiskLab/woeboost" alt="Forks"/>
<img src="https://img.shields.io/github/stars/xRiskLab/woeboost" alt="Stars"/>
</div><br>
**WoeBoost** is a Python 🐍 package designed to bridge the gap between the predictive power of gradient boosting and the interpretability required in high-stakes domains such as finance, healthcare, and law. It introduces an interpretable, evidence-driven framework for scoring tasks, inspired by the principles of **Weight of Evidence (WOE)** and the ideas of **Alan M. Turing**.
## 🔑 Key Features
- **🌟 Gradient Boosting with Explainability**: Combines the strength of gradient boosting with the interpretability of WOE-based scoring systems.
- **📊 Calibrated Scores**: Produces well-calibrated scores essential for decision-making in regulated environments.
- **🤖 AutoML-like Enhancements**:
  - Infers monotonic relationships automatically (`infer_monotonicity`).
  - Supports early stopping for efficient training (`enable_early_stopping`).
- **🔧 Support for Missing Values & Categorical Inputs**: Handles various data types seamlessly while maintaining interpretability.
- **🛠️ Diagnostic Toolkit**:
  - Partial dependence plots.
  - Feature importance analysis.
  - Decision boundary visualization.
- **📈 WOE Inference Maker**: Provides classical WOE calculations and bin-level insights.
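For readers unfamiliar with the classical WOE calculation mentioned in the last bullet, the following is a minimal standard-library sketch and does not use WoeBoost's own API. The function name `classical_woe` and the `smoothing` parameter are hypothetical, and the sign convention is an assumption: here a positive WOE means the bin is more common among events (`y = 1`), while some scorecard literature uses the opposite sign.

```python
import math
from collections import defaultdict


def classical_woe(bins, labels, smoothing=0.5):
    """Return {bin: WOE}, with WOE = ln(P(bin | y=1) / P(bin | y=0)).

    `smoothing` is added to every count so that empty cells do not lead to
    log(0). It is an illustrative choice, not a WoeBoost default.
    """
    events = defaultdict(float)
    non_events = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1

    all_bins = sorted(set(events) | set(non_events))
    total_events = sum(events.values()) + smoothing * len(all_bins)
    total_non_events = sum(non_events.values()) + smoothing * len(all_bins)

    woe = {}
    for b in all_bins:
        p_event = (events[b] + smoothing) / total_events
        p_non_event = (non_events[b] + smoothing) / total_non_events
        woe[b] = math.log(p_event / p_non_event)
    return woe


# Toy data: one already-binned feature and a binary target.
bins = ["low", "low", "low", "mid", "mid", "high", "high", "high"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
woe_by_bin = classical_woe(bins, labels)
```

In this toy data the `low` bin holds most of the events, so its WOE is positive, while `high` contains no events and gets a negative WOE.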
## ⚙️ How It Works
1. **🔍 Initialization**: Starts with prior log odds, representing baseline probabilities.
2. **📈 Iterative Updates**: Each boosting iteration computes residuals for each binned feature and sums them into total evidence (WOE), which updates the predictions.
3. **🔗 Evidence Accumulation**: Combines evidence from all iterations, producing a cumulative and interpretable scoring model.
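The three steps can be illustrated with a toy, standard-library sketch. This is one simplified reading of the description above, not WoeBoost's actual implementation: the names `fit_toy_evidence_booster` and `predict_log_odds`, the use of mean residuals per bin, and the `learning_rate` parameter are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_toy_evidence_booster(X_binned, y, n_iterations=20, learning_rate=0.5):
    """X_binned is a list of rows; each row holds one bin label per feature."""
    n_features = len(X_binned[0])

    # Step 1: start every prediction at the prior log odds.
    base_rate = sum(y) / len(y)
    prior_log_odds = math.log(base_rate / (1.0 - base_rate))
    log_odds = [prior_log_odds] * len(y)

    # evidence[j][bin] accumulates feature j's contribution over all iterations.
    evidence = [{} for _ in range(n_features)]

    for _ in range(n_iterations):
        # Step 2: residuals, then the mean residual per bin of each feature.
        residuals = [target - sigmoid(s) for target, s in zip(y, log_odds)]
        updates = []
        for j in range(n_features):
            sums, counts = {}, {}
            for row, r in zip(X_binned, residuals):
                sums[row[j]] = sums.get(row[j], 0.0) + r
                counts[row[j]] = counts.get(row[j], 0) + 1
            updates.append({b: learning_rate * sums[b] / counts[b] for b in sums})

        # Step 3: add this iteration's evidence to the running totals.
        for j, update in enumerate(updates):
            for b, delta in update.items():
                evidence[j][b] = evidence[j].get(b, 0.0) + delta
        log_odds = [
            s + sum(updates[j][row[j]] for j in range(n_features))
            for row, s in zip(X_binned, log_odds)
        ]
    return prior_log_odds, evidence


def predict_log_odds(prior_log_odds, evidence, row):
    """Prior plus the accumulated evidence of each feature's bin (0.0 if unseen)."""
    return prior_log_odds + sum(evidence[j].get(b, 0.0) for j, b in enumerate(row))


X_binned = [["low", "a"], ["low", "b"], ["high", "a"],
            ["high", "b"], ["low", "a"], ["high", "b"]]
y = [1, 1, 0, 0, 1, 0]
prior, evidence = fit_toy_evidence_booster(X_binned, y)
score_low = predict_log_odds(prior, evidence, ["low", "a"])
score_high = predict_log_odds(prior, evidence, ["high", "b"])
```

Because the final score is a plain sum of the prior and one lookup value per feature, each feature's contribution can be read directly from `evidence`, which is the interpretability property the steps describe.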
## 🧐 Why WoeBoost?
- **💡 Interpretability**: Every model step adheres to principles familiar to risk managers and data scientists, ensuring transparency and trust.
- **✅ Alignment with Regulatory Requirements**: Calibrated and interpretable results meet the demands of high-stakes applications.
- **⚡ Flexibility**: Works seamlessly with diverse data types and supports customizations for complex datasets, including CPU multi-threading.
## Installation ⤵
Install the package using pip:
```bash
pip install woeboost
```
## 💻 Example Usage
Below we provide two examples of using WoeBoost.
### Training and Inference with WoeBoost classifier
```python
from woeboost import WoeBoostClassifier
# Initialize the classifier
woe_model = WoeBoostClassifier(infer_monotonicity=True)
# Fit the model
woe_model.fit(X_train, y_train)
# Predict probabilities and scores
probas = woe_model.predict_proba(X_test)[:, 1]
preds = woe_model.predict(X_test)
scores = woe_model.predict_score(X_test)
```
### Preparation of WOE inputs for logistic regression
```python
from woeboost import WoeBoostClassifier
# Initialize the classifier
woe_model = WoeBoostClassifier(infer_monotonicity=True)
# Fit the model
woe_model.fit(X_train, y_train)
# Transform the raw features into WOE inputs for a downstream logistic regression
X_woe_train = woe_model.transform(X_train)
X_woe_test = woe_model.transform(X_test)
```
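To show what the downstream step might look like, here is a minimal sketch that fits a logistic regression on WOE-style inputs using only the standard library. The small `X_woe_train` list is a hand-made stand-in for the output of `woe_model.transform(X_train)`, and `fit_logistic_regression` is a hypothetical helper written for this example. In practice a library estimator such as scikit-learn's `LogisticRegression` would be the more usual choice.

```python
import math


def fit_logistic_regression(X, y, learning_rate=0.1, n_epochs=2000):
    """Batch gradient descent on the mean log loss; returns (intercept, weights)."""
    n_samples, n_features = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * n_features
    for _ in range(n_epochs):
        grad_intercept, grad_weights = 0.0, [0.0] * n_features
        for row, target in zip(X, y):
            z = intercept + sum(w * v for w, v in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - target
            grad_intercept += error
            for j, v in enumerate(row):
                grad_weights[j] += error * v
        intercept -= learning_rate * grad_intercept / n_samples
        weights = [
            w - learning_rate * g / n_samples
            for w, g in zip(weights, grad_weights)
        ]
    return intercept, weights


# Stand-in for woe_model.transform(X_train): one WOE value per feature and row.
X_woe_train = [
    [0.8, 0.3], [0.8, -0.3], [-0.8, 0.3], [-0.8, -0.3],
    [0.8, 0.3], [-0.8, -0.3], [0.8, -0.3], [-0.8, 0.3],
]
y_train = [1, 1, 0, 0, 1, 0, 0, 1]

intercept, weights = fit_logistic_regression(X_woe_train, y_train)
```

Since WOE inputs are already on a log-odds-like scale, the fitted weights act as per-feature scaling factors on the evidence, which keeps the resulting model easy to read.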
## 📚 Documentation
- **[`Technical Note`](https://github.com/xRiskLab/woeboost/blob/main/docs/technical_note.md)**: Overview of the WoeBoost modules.
- **[`learner.py`](https://github.com/xRiskLab/woeboost/blob/main/docs/learner.md)**: Core module implementing a base learner.
- **[`classifier.py`](https://github.com/xRiskLab/woeboost/blob/main/docs/classifier.md)**: Module for building a boosted classification model.
- **[`explainer.py`](https://github.com/xRiskLab/woeboost/blob/main/docs/explainer.md)**: Module for explaining the model predictions.
## 📄 License
This project is licensed under the MIT License - see the **[LICENSE](https://github.com/xRiskLab/woeboost/blob/main/LICENSE.md)** file for details.
## 📃 Change Log
- **v1.0.1**
  - Adjusted the default plot size for feature importance and made minor documentation updates.
- **v1.0.0**
  - Initial release of WoeBoost.
Raw data
{
"_id": null,
"home_page": "https://github.com/xRiskLab/woeboost",
"name": "woeboost",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "weight of evidence, gradient boosting, scoring, classification",
"author": "xRiskLab",
"author_email": "contact@xrisklab.ai",
"download_url": "https://files.pythonhosted.org/packages/2b/56/c8a84e5faa6e872e8c4a2f06821e220a9c42b780dba14c1cf32e59b8885b/woeboost-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "WoeBoost: Weight of Evidence (WOE) Gradient Boosting",
"version": "1.0.1",
"project_urls": {
"Homepage": "https://github.com/xRiskLab/woeboost",
"Repository": "https://github.com/xRiskLab/woeboost"
},
"split_keywords": [
"weight of evidence",
" gradient boosting",
" scoring",
" classification"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "eeacc6fc16986060d270737991ee1ff677e2543e1afe6869992a15d76de92384",
"md5": "95af12f2d3c914e7e0b0b78c126c8ff3",
"sha256": "84666dc837fd45c67bdd4a0c4965e3a860e7060ae5b5466c451aa68a3f1a8ef7"
},
"downloads": -1,
"filename": "woeboost-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "95af12f2d3c914e7e0b0b78c126c8ff3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 24885,
"upload_time": "2024-12-09T18:05:11",
"upload_time_iso_8601": "2024-12-09T18:05:11.427875Z",
"url": "https://files.pythonhosted.org/packages/ee/ac/c6fc16986060d270737991ee1ff677e2543e1afe6869992a15d76de92384/woeboost-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2b56c8a84e5faa6e872e8c4a2f06821e220a9c42b780dba14c1cf32e59b8885b",
"md5": "d2761556caf032b2d7609e835a21a295",
"sha256": "020dcaa897595e1b3cf6a1b6e5769e39af3b5a67b68fe6399d57cade9981494b"
},
"downloads": -1,
"filename": "woeboost-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "d2761556caf032b2d7609e835a21a295",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 24153,
"upload_time": "2024-12-09T18:05:13",
"upload_time_iso_8601": "2024-12-09T18:05:13.443956Z",
"url": "https://files.pythonhosted.org/packages/2b/56/c8a84e5faa6e872e8c4a2f06821e220a9c42b780dba14c1cf32e59b8885b/woeboost-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-09 18:05:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "xRiskLab",
"github_project": "woeboost",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "cmap",
"specs": [
[
"==",
"0.4.0"
]
]
},
{
"name": "contourpy",
"specs": [
[
"==",
"1.3.0"
]
]
},
{
"name": "cycler",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "fonttools",
"specs": [
[
"==",
"4.55.2"
]
]
},
{
"name": "importlib-resources",
"specs": [
[
"==",
"6.4.5"
]
]
},
{
"name": "joblib",
"specs": [
[
"==",
"1.4.2"
]
]
},
{
"name": "kiwisolver",
"specs": [
[
"==",
"1.4.7"
]
]
},
{
"name": "markdown-it-py",
"specs": [
[
"==",
"3.0.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
"==",
"3.9.3"
]
]
},
{
"name": "mdurl",
"specs": [
[
"==",
"0.1.2"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"24.2"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.3"
]
]
},
{
"name": "pillow",
"specs": [
[
"==",
"11.0.0"
]
]
},
{
"name": "pydocstyle",
"specs": [
[
"==",
"6.3.0"
]
]
},
{
"name": "pygments",
"specs": [
[
"==",
"2.18.0"
]
]
},
{
"name": "pyparsing",
"specs": [
[
"==",
"3.2.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2024.2"
]
]
},
{
"name": "rich",
"specs": [
[
"==",
"13.9.4"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.13.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.17.0"
]
]
},
{
"name": "snowballstemmer",
"specs": [
[
"==",
"2.2.0"
]
]
},
{
"name": "threadpoolctl",
"specs": [
[
"==",
"3.5.0"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
"==",
"4.12.2"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2024.2"
]
]
},
{
"name": "zipp",
"specs": [
[
"==",
"3.21.0"
]
]
}
],
"lcname": "woeboost"
}