<h1><p align = 'center'><strong> Monotonic-Optimal-Binning </strong> </p></h1>
<h3><p align = 'center'><strong> Python implementation (MOBPY) </strong> </p></h3>
[**MOB**](https://pypi.org/project/MOBPY/) is a statistical approach to transforming continuous variables into optimal, monotonic categorical variables. In this project, we have extended the approach so that users can merge bins based on either `statistics` or `bin size`. This Python-based project enables users to achieve monotone optimal binning results aligned with their expectations.<br>
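To make "monotonic" concrete, here is a minimal sketch, not MOBPY's code, of one common way to enforce it: a pool-adjacent-violators style pass that merges neighboring bins whenever their bad rates break the required order. The `(n_total, n_bad)` bin representation and the `merge_until_monotonic` helper are hypothetical, and the direction of the trend is assumed to be known in advance.

```python
from typing import List, Tuple

# Hypothetical representation: each bin is (n_total, n_bad), ordered by the variable's value.
Bin = Tuple[int, int]


def bad_rate(b: Bin) -> float:
    return b[1] / b[0]


def merge_until_monotonic(bins: List[Bin], increasing: bool = True) -> List[Bin]:
    """Merge adjacent bins until bad rates are monotonic (hypothetical helper)."""
    bins = list(bins)
    i = 0
    while i < len(bins) - 1:
        left, right = bad_rate(bins[i]), bad_rate(bins[i + 1])
        violates = right < left if increasing else right > left
        if violates:
            # Pool the two bins into one, then step back to re-check the previous pair.
            merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
            bins[i:i + 2] = [merged]
            i = max(i - 1, 0)
        else:
            i += 1
    return bins
```

Each pass either advances or removes one bin, so the loop always terminates with bad rates in the requested order.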
<h2><strong> Installation </strong></h2>
```bash
python3 -m pip install MOBPY
```
<h2><strong> Usage </strong></h2>
**_Example_**:
```python
import pandas as pd
from MOBPY.MOB import MOB

if __name__ == '__main__':
    # Load the testing dataset (German Credit Risk).
    df = pd.read_csv('/data/german_data_credit_cat.csv')
    # The 'default' column is coded as [1, 2]; subtracting 1 makes 2 -> 1 (the positive class) and 1 -> 0.
    df['default'] = df['default'] - 1
    # Run the MOB algorithm to discretize the variable 'Durationinmonth'.
    MOB_ALGO = MOB(data = df, var = 'Durationinmonth', response = 'default', exclude_value = None)
    # A must-do step is to set the binning constraints.
    MOB_ALGO.setBinningConstraints(max_bins = 6, min_bins = 3,
                                   max_samples = 0.4, min_samples = 0.05,
                                   min_bads = 0.05,
                                   init_pvalue = 0.4,
                                   maximize_bins = True)
    # Execute the MOB algorithm.
    SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')    # Run under the bin size base.
    StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')  # Run under the statistical base.
```
The `runMOB` method returns a `pandas.DataFrame` containing the bins of the variable along with the WoE summary for each bin.
<p align = 'center'><img src = 'https://github.com/ChenTaHung/Monotonic-Optimal-Binning/blob/main/doc/images/Durationinmonth%20bins%20summary.png' alt = 'Image' style = 'width: 800px'/></p>
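As a rough illustration of the WoE column, the sketch below computes Weight of Evidence per bin from `(n_total, n_bad)` counts. It assumes the convention WoE = ln(share of goods / share of bads); some tools, and possibly MOBPY's summary table, use the opposite sign, so compare against the actual output before relying on it. The `woe_per_bin` helper is hypothetical.

```python
import math
from typing import List, Tuple


def woe_per_bin(bins: List[Tuple[int, int]]) -> List[float]:
    """WoE for each (n_total, n_bad) bin, assuming WoE = ln(%good / %bad) (hypothetical helper)."""
    total_bad = sum(bad for _, bad in bins)
    total_good = sum(n - bad for n, bad in bins)
    woes = []
    for n, bad in bins:
        dist_good = (n - bad) / total_good  # this bin's share of all goods
        dist_bad = bad / total_bad          # this bin's share of all bads
        woes.append(math.log(dist_good / dist_bad))
    return woes
```

A bin with zero bads raises `ZeroDivisionError` and a bin with zero goods raises `ValueError` from `math.log`; this sketch does not smooth such bins.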
Once we have the binning result DataFrame, we can visualize the binning summary with `MOBPY.plot.MOB_PLOT.plotBinsSummary`.
```python
from MOBPY.plot.MOB_PLOT import MOB_PLOT
# plot the bin summary data.
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = SizeBinning, var_name = 'Durationinmonth')
print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = StatsBinning, var_name = 'Durationinmonth')
```
<p align = 'center'><img src = 'https://github.com/ChenTaHung/Monotonic-Optimal-Binning/blob/main/doc/charts/Durationinmonth-Size.png' alt = 'Image' style = 'width: 1200px'/></p>
<h2> <strong> Highlighted Features </strong></h2>
**_User Preferences_**:
The MOB algorithm offers two user preference settings (**`mergeMethod`** argument):
1. `Size`: This setting allows you to optimize the sample size of each bin within specified maximum and minimum limits while ensuring that the minimum number of bins constraint is maintained.
2. `Stats`: With this setting, the algorithm applies a stricter approach based on hypothesis testing results.<br>
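The README does not spell out which test the `Stats` option uses. The following is a minimal sketch, under our own assumptions, of hypothesis-test-driven merging in the spirit of Mironchyk and Tchistiakov: repeatedly merge the adjacent pair of bins whose bad rates are least distinguishable, as long as that pair's p-value exceeds a threshold (playing a role similar to `init_pvalue`) and the bin count stays above `min_bins`. The pooled two-proportion z-test and the helper names are our choices, not MOBPY's API.

```python
import math
from typing import List, Tuple

Bin = Tuple[int, int]  # hypothetical: (n_total, n_bad), ordered by the variable's value


def two_proportion_pvalue(a: Bin, b: Bin) -> float:
    """Two-sided p-value of a pooled two-proportion z-test on the bins' bad rates."""
    (n1, x1), (n2, x2) = a, b
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both bins all-good or all-bad: rates are identical
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def merge_by_pvalue(bins: List[Bin], threshold: float, min_bins: int) -> List[Bin]:
    """Merge the most similar adjacent pair while its p-value exceeds `threshold`."""
    bins = list(bins)
    while len(bins) > min_bins:
        pvals = [two_proportion_pvalue(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        best = max(range(len(pvals)), key=pvals.__getitem__)
        if pvals[best] <= threshold:
            break  # every adjacent pair differs significantly at this threshold
        a, b = bins[best], bins[best + 1]
        bins[best:best + 2] = [(a[0] + b[0], a[1] + b[1])]
    return bins
```

Merging the highest-p-value pair first means the bins that look statistically identical are pooled before any borderline pair is touched.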
Typically, the `'Stats'` (statistical-based) and `'Size'` (bin size-based) methods yield identical results. In certain scenarios, however, they diverge: the `'Size'` method prioritizes keeping the population of each bin within the maximum and minimum limits, whereas the `'Stats'` method adheres more strictly to the results of hypothesis testing.
For example,
```python
# Run the MOB algorithm to discretize the variable 'Creditamount'.
MOB_ALGO = MOB(data = df, var = 'Creditamount', response = 'default', exclude_value = None)
# Set the binning constraints (must-do!).
MOB_ALGO.setBinningConstraints(max_bins = 6, min_bins = 3,
                               max_samples = 0.4, min_samples = 0.05,
                               min_bads = 0.05,
                               init_pvalue = 0.4,
                               maximize_bins = True)
# mergeMethod='Size' runs MOB under the bin size base; 'Stats' runs it under the statistical base.
SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')
StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')
# Plot the bin summary data.
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = SizeBinning, var_name = 'Creditamount')
print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = StatsBinning, var_name = 'Creditamount')
```
<div>
<table >
<thead>
<tr>
<th style="text-align: center;">SizeBinning</th>
<th style="text-align: center;">StatsBinning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">runMOB(mergeMethod='Size') (bins size base)</td>
<td style="text-align: center;">runMOB(mergeMethod='Stats') (statistical base)</td>
</tr>
<tr>
<td>
<img src="https://github.com/ChenTaHung/Monotonic-Optimal-Binning/blob/main/doc/charts/Creditamount-Size.png" width="400" >
</td>
<td>
<img src="https://github.com/ChenTaHung/Monotonic-Optimal-Binning/blob/main/doc/charts/Creditamount-Stats.png" width="400">
</td>
</tr>
</tbody>
</table>
</div>
The left image shows the result generated by **`mergeMethod = 'Size'`** (bin size-based), and the right image shows the result generated by **`mergeMethod = 'Stats'`** (statistical-based). We can see that the `'Size'` method merges bins that fail to meet the minimum sample population requirement, while keeping the number of bins within the specified limits so that it does not fall below the minimum number of bins. By merging bins that fall short of the population threshold, the `'Size'` method maintains a more balanced distribution of data across the bins.<br><br>
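The size-based behavior described above can be sketched as follows. This is a hypothetical helper, not MOBPY's implementation: it repeatedly folds the smallest bin whose share of the data is below `min_samples` into its smaller neighbor, stopping once every bin meets that floor or the bin count reaches `min_bins`. It does not enforce `max_samples`.

```python
from typing import List, Tuple

Bin = Tuple[int, int]  # hypothetical: (n_total, n_bad), ordered by the variable's value


def merge_small_bins(bins: List[Bin], min_samples: float, min_bins: int) -> List[Bin]:
    """Fold bins holding less than `min_samples` of all rows into a neighbor (hypothetical helper)."""
    bins = list(bins)
    total = sum(n for n, _ in bins)
    while len(bins) > max(min_bins, 1):
        small = [i for i, (n, _) in enumerate(bins) if n / total < min_samples]
        if not small:
            break  # every bin already meets the size floor
        i = min(small, key=lambda k: bins[k][0])  # handle the smallest undersized bin first
        if i == 0:
            j = 1
        elif i == len(bins) - 1:
            j = i - 1
        else:
            # Prefer the smaller neighbor so the merged bin stays as balanced as possible.
            j = i - 1 if bins[i - 1][0] <= bins[i + 1][0] else i + 1
        lo, hi = min(i, j), max(i, j)
        bins[lo:hi + 1] = [(bins[lo][0] + bins[hi][0], bins[lo][1] + bins[hi][1])]
    return bins
```

Because merges only ever combine adjacent bins, the ordering of the variable is preserved and the total row count is unchanged.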
<h2><strong> Full Documentation </strong></h2>
↪ [Full API Reference](https://github.com/ChenTaHung/Monotonic-Optimal-Binning/blob/main/doc/MOBPY-API-Ref.md)<br><br>
<h2> <strong> Environment </strong></h2>
```bash
OS : macOS Ventura
IDE: Visual Studio Code 1.79.2 (Universal)
Language : Python 3.9.7
- pandas 1.3.4
- numpy 1.20.3
- scipy 1.7.1
- matplotlib 3.7.1
```
<h2><strong> Reference </strong></h2>
- Testing Dataset : [German Credit Risk](https://www.kaggle.com/datasets/uciml/german-credit) from [Kaggle](https://www.kaggle.com/)
- [Mironchyk, Pavel, and Viktor Tchistiakov. "Monotone optimal binning algorithm for credit risk modeling." Utr. Work. Pap (2017).](https://www.researchgate.net/profile/Viktor-Tchistiakov/publication/322520135_Monotone_optimal_binning_algorithm_for_credit_risk_modeling/links/5a5dd1a8458515c03edf9a97/Monotone-optimal-binning-algorithm-for-credit-risk-modeling.pdf)
- GitHub Project : [Monotone Optimal Binning (SAS 9.4 version)](https://github.com/cdfq384903/MonotonicOptimalBinning)
<h2><strong> Authors </strong></h2>
1. Chen, Ta-Hung (Denny) <br>
- LinkedIn Profile : https://www.linkedin.com/in/dennychen-tahung/
- E-Mail : denny20700@gmail.com
2. Tsai, Yu-Cheng (Darren)
- LinkedIn Profile : https://www.linkedin.com/in/darren-yucheng-tsai/
- E-Mail :
Raw data
{
"_id": null,
"home_page": "",
"name": "MOBPY",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "",
"author": "",
"author_email": "\"Chen, Ta-Hung\" <denny20700@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/3a/ee/710bbf5253c7f01eac867a38e3e5f17db84c4278659c353f476c21ce1a62/MOBPY-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "MOB is a statistical approach to transform continuous variables into optimal and monotonic categorical variables.",
"version": "1.0.1",
"project_urls": {
"Bug Tracker": "https://github.com/ChenTaHung/Monotonic-Optimal-Binning/issues",
"Homepage": "https://github.com/ChenTaHung/Monotonic-Optimal-Binning"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b3f1b1b6c09eddcf79b9b4a226d21d5b1f2f173f9cb2fe1985ce6481a552ddd1",
"md5": "c18d19dfb9fd1a445c8cee90da2925aa",
"sha256": "5971908011fa7110415e247298303334e9d8b2a664522d12dcae69873e4c1b67"
},
"downloads": -1,
"filename": "MOBPY-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c18d19dfb9fd1a445c8cee90da2925aa",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 15008,
"upload_time": "2023-07-09T08:30:16",
"upload_time_iso_8601": "2023-07-09T08:30:16.746459Z",
"url": "https://files.pythonhosted.org/packages/b3/f1/b1b6c09eddcf79b9b4a226d21d5b1f2f173f9cb2fe1985ce6481a552ddd1/MOBPY-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3aee710bbf5253c7f01eac867a38e3e5f17db84c4278659c353f476c21ce1a62",
"md5": "1e3bbf7fcf89f3269aad07bff34de2a9",
"sha256": "0091f2684976dcb44fb278ce0caca1cad9f4a1ba6500761f2c4a77df01096ab4"
},
"downloads": -1,
"filename": "MOBPY-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "1e3bbf7fcf89f3269aad07bff34de2a9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 15277,
"upload_time": "2023-07-09T08:30:18",
"upload_time_iso_8601": "2023-07-09T08:30:18.038376Z",
"url": "https://files.pythonhosted.org/packages/3a/ee/710bbf5253c7f01eac867a38e3e5f17db84c4278659c353f476c21ce1a62/MOBPY-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-09 08:30:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ChenTaHung",
"github_project": "Monotonic-Optimal-Binning",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": [
[
"==",
"1.3.4"
]
]
},
{
"name": "numpy",
"specs": []
},
{
"name": "matplotlib",
"specs": [
[
"==",
"3.7.1"
]
]
},
{
"name": "matplotlib-inline",
"specs": []
},
{
"name": "scipy",
"specs": []
},
{
"name": "seaborn",
"specs": []
},
{
"name": "bokeh",
"specs": []
},
{
"name": "graphviz",
"specs": [
[
"==",
"0.19.1"
]
]
},
{
"name": "typing",
"specs": [
[
"==",
"3.7.4.3"
]
]
},
{
"name": "typing-extensions",
"specs": []
}
],
"lcname": "mobpy"
}