<h1><p align="center"><strong>Monotonic-Optimal-Binning</strong></p></h1>
<h2><p align="center">MOBPY - Monotonic Optimal Binning for Python</p></h2>
A fast, deterministic Python library for creating **monotonic optimal bins** with respect to a target variable. MOBPY implements a stack-based Pool-Adjacent-Violators Algorithm (PAVA) followed by constrained adjacent merging, ensuring strict monotonicity and statistical robustness.
## 🎯 Key Features
- **⚡ Fast & Deterministic**: Stack-based PAVA with O(n) complexity, followed by O(k) adjacent merges
- **📊 Monotonic Guarantee**: Ensures strict monotonicity (increasing/decreasing) between bins and target
- **🔧 Flexible Constraints**: Min/max samples, min positives, min/max bins with automatic resolution
- **📈 WoE & IV Calculation**: Automatic Weight of Evidence and Information Value for binary targets
- **🎨 Rich Visualizations**: Comprehensive plotting functions for PAVA process and binning results
- **♾️ Safe Edges**: First bin starts at -∞, last bin ends at +∞ for complete coverage
## 📦 Installation
```bash
pip install MOBPY
```
For development installation:
```bash
git clone https://github.com/ChenTaHung/Monotonic-Optimal-Binning.git
cd Monotonic-Optimal-Binning
pip install -e .
```
## 🚀 Quick Start
```python
import pandas as pd
import numpy as np
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_bin_statistics, plot_pava_comparison
import matplotlib.pyplot as plt
# Load the German credit dataset (adjust the path to your local copy)
df = pd.read_csv('/Users/chentahung/Desktop/git/mob-py/data/german_data_credit_cat.csv')
# Convert default to 0/1 (original is 1/2)
df['default'] = df['default'] - 1
# Configure constraints
constraints = BinningConstraints(
    min_bins=4,          # Minimum number of bins
    max_bins=6,          # Maximum number of bins
    min_samples=0.05,    # Each bin needs at least 5% of total samples
    min_positives=0.01   # Each bin needs at least 1% of total positive samples
)
# Create and fit the binner
binner = MonotonicBinner(
    df=df,
    x='Durationinmonth',
    y='default',
    constraints=constraints
)
binner.fit()
# Get binning results
bins = binner.bins_() # Bin boundaries
summary = binner.summary_() # Detailed statistics with WoE/IV
display(summary)  # In a notebook; use print(summary) in a plain script
```
Output:
```
       bucket  count  count_pct    sum      mean       std  min  max       woe        iv
0   (-inf, 9)     94        9.4   10.0  0.106383  0.309980  0.0  1.0  1.241870  0.106307
1     [9, 16)    337       33.7   79.0  0.234421  0.424267  0.0  1.0  0.335632  0.035238
2    [16, 45)    499       49.9  171.0  0.342685  0.475084  0.0  1.0 -0.193553  0.019342
3  [45, +inf)     70        7.0   40.0  0.571429  0.498445  0.0  1.0 -1.127082  0.102180
```
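To make the `woe` and `iv` columns easier to read, here is a minimal sketch of the textbook, unsmoothed Weight of Evidence and Information Value formulas, applied to the per-bin counts and event totals (`count` and `sum`) from the output above. The helper name `woe_iv` is hypothetical and not part of MOBPY. The sign convention (positive WoE for bins with a low event rate) is chosen to match the output above. The values it produces are close to, but not identical to, the library's figures, so MOBPY presumably applies some smoothing or a slightly different estimator; that is an inference, not something this README states.

```python
import math


def woe_iv(events, counts):
    """Per-bin WoE and IV from the unsmoothed textbook definition.

    Hypothetical helper for illustration only; not MOBPY's implementation.
    """
    non_events = [c - e for c, e in zip(counts, events)]
    total_events = sum(events)
    total_non_events = sum(non_events)

    woe, iv = [], []
    for e, ne in zip(events, non_events):
        dist_event = e / total_events            # share of all events in this bin
        dist_non_event = ne / total_non_events   # share of all non-events in this bin
        w = math.log(dist_non_event / dist_event)
        woe.append(w)
        iv.append((dist_non_event - dist_event) * w)
    return woe, iv


# Counts and event totals per bin, taken from the summary output
counts = [94, 337, 499, 70]
events = [10, 79, 171, 40]

woe, iv = woe_iv(events, counts)
for w, v in zip(woe, iv):
    print(f"woe={w:.4f}  iv={v:.4f}")
print(f"total IV = {sum(iv):.4f}")
```

Because the event rate rises from bin to bin, the WoE values fall monotonically, which is the property the binning is designed to guarantee.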
## 📊 Visualization
MOBPY provides comprehensive visualization of binning results:
```python
# Generate comprehensive binning analysis plot
fig = plot_bin_statistics(binner)
plt.show()
```

*The `plot_bin_statistics` function creates a multi-panel visualization showing:*
- **Top Left**: Weight of Evidence (WoE) bars for each bin
- **Top Right**: Event rate trend with sample distribution
- **Bottom Left**: Sample distribution histogram
- **Bottom Right**: Target distribution boxplots per bin
## 🔬 Understanding the Algorithm
MOBPY uses a two-stage approach:
### Stage 1: PAVA (Pool-Adjacent-Violators Algorithm)
Creates initial monotonic blocks by pooling adjacent violators:
```python
from MOBPY.plot import plot_pava_comparison
# Visualize PAVA process
fig = plot_pava_comparison(binner)
plt.show()
```
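For readers who want to see the pooling step itself, the following is a minimal sketch of a stack-based PAVA for a non-decreasing fit. It is a generic illustration of the technique, not MOBPY's source code: the function name `pava_increasing` and the `(mean, weight)` block representation are assumptions made for this example.

```python
def pava_increasing(values, weights=None):
    """Pool adjacent violators so that block means increase left to right.

    Generic sketch of the technique; names and block layout are illustrative.
    Returns a list of (mean, weight) blocks.
    """
    if weights is None:
        weights = [1.0] * len(values)

    stack = []  # each entry is [weighted_sum, total_weight]
    for v, w in zip(values, weights):
        stack.append([v * w, w])
        # While the previous block's mean is not below the newest block's mean,
        # pool the two blocks. Ties are pooled too, so means end up strictly increasing.
        while (
            len(stack) > 1
            and stack[-2][0] / stack[-2][1] >= stack[-1][0] / stack[-1][1]
        ):
            s, n = stack.pop()
            stack[-1][0] += s
            stack[-1][1] += n
    return [(s / n, n) for s, n in stack]


# Event rates ordered by the feature value, with two local violations
blocks = pava_increasing([0.1, 0.3, 0.2, 0.4, 0.35, 0.5])
for mean, weight in blocks:
    print(f"mean={mean:.3f}  weight={weight:g}")
```

Each input is pushed once and popped at most once, which is where the linear running time of the stack-based formulation comes from.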

### Stage 2: Constrained Merging
Merges adjacent blocks to satisfy constraints while preserving monotonicity:
```python
# Check initial PAVA blocks vs final bins
print(f"PAVA blocks: {len(binner.pava_blocks_())}")
print(f"Final bins: {len(binner.bins_())}")
# Output:
# PAVA blocks: 10
# Final bins: 4
```
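The idea of merging adjacent blocks can be sketched as below. This is a simplified greedy stand-in, not MOBPY's actual merge criterion: it first repairs blocks below a minimum count and then reduces the number of blocks to `max_bins`, each time merging the adjacent pair with the closest means. The function name `merge_blocks`, the `(event_sum, count)` layout, and the closest-means rule are all assumptions for illustration.

```python
def merge_blocks(blocks, min_count, max_bins):
    """Greedily merge adjacent (event_sum, count) blocks.

    Illustrative rule only: undersized blocks are fixed first, then the block
    count is reduced to max_bins, always merging the closest adjacent means.
    """
    blocks = [list(b) for b in blocks]

    def mean(b):
        return b[0] / b[1]

    def gap(j):
        # Distance between the means of blocks j and j + 1
        return abs(mean(blocks[j + 1]) - mean(blocks[j]))

    def merge(j):
        # Merge block j with its right neighbour j + 1
        blocks[j] = [blocks[j][0] + blocks[j + 1][0], blocks[j][1] + blocks[j + 1][1]]
        del blocks[j + 1]

    while len(blocks) > 1:
        small = [i for i, b in enumerate(blocks) if b[1] < min_count]
        if small:
            i = small[0]
            # Candidate pairs are (i - 1, i) and (i, i + 1), where they exist
            candidates = [j for j in (i - 1, i) if 0 <= j < len(blocks) - 1]
            merge(min(candidates, key=gap))
        elif len(blocks) > max_bins:
            merge(min(range(len(blocks) - 1), key=gap))
        else:
            break
    return blocks


# Four monotone blocks: the first is too small, and only two bins are allowed
merged = merge_blocks([(1, 20), (10, 60), (12, 50), (30, 70)], min_count=30, max_bins=2)
print(merged)
```

Merging two adjacent blocks yields a mean that lies between their original means, so a sequence that was monotone before the merge stays monotone afterwards.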
## 🎛️ Advanced Configuration
### Custom Constraints
```python
# Fractional constraints (adaptive to data size)
constraints = BinningConstraints(
    max_bins=8,
    min_samples=0.05,    # 5% of total samples
    max_samples=0.30,    # 30% of total samples
    min_positives=0.01   # 1% of positive samples
)
# Absolute constraints (fixed values)
constraints = BinningConstraints(
    max_bins=5,
    min_samples=100,     # At least 100 samples per bin
    max_samples=500      # At most 500 samples per bin
)
```
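One way to picture how a fractional constraint adapts to the data size is the sketch below, which converts either form into an absolute count. The convention used here (a float in the range (0, 1] is a fraction of the total, anything else is an absolute count) is an assumption for illustration; check the API reference for the rule MOBPY actually applies. The helper name `resolve_count` is hypothetical.

```python
import math


def resolve_count(value, n_total):
    """Convert a fractional or absolute constraint into an absolute count.

    Assumed convention (not confirmed by the README): floats in (0, 1] are
    fractions of n_total; any other value is already an absolute count.
    """
    if isinstance(value, float) and 0 < value <= 1:
        return math.ceil(value * n_total)
    return int(value)


n_total = 1000
print(resolve_count(0.05, n_total))  # fractional: scales with the dataset
print(resolve_count(100, n_total))   # absolute: independent of the dataset
```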
### Handling Special Values
```python
# Exclude special codes from binning
age_binner = MonotonicBinner(
    df=df,
    x='Age',
    y='default',
    constraints=constraints,
    exclude_values=[-999, -1, 0]  # Treat as separate bins
).fit()
```
### Transform New Data
```python
new_data = pd.DataFrame({'age': [25, 45, 65]})
# Get bin assignments
bins = age_binner.transform(new_data['age'], assign='interval')
print(bins)
# Output:
# 0 (-inf, 26)
# 1 [35, 75)
# 2 [35, 75)
# Name: age, dtype: object
# Get WoE values for scoring
print(age_binner.transform(new_data['age'], assign='woe'))
# Output:
# 0 -0.526748
# 1 0.306015
# 2 0.306015
```
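The interval lookup behind this kind of transform can be sketched with a binary search over the inner bin edges. This is a minimal illustration using the standard library, assuming left-closed bins with open-ended first and last bins as in the outputs shown in this README; `assign_intervals` is a hypothetical helper, not MOBPY's `transform`.

```python
from bisect import bisect_right


def assign_intervals(values, inner_edges):
    """Map each value to a left-closed interval label.

    Illustrative helper: inner_edges are the sorted cut points between bins;
    the first bin extends to -inf and the last to +inf.
    """
    n_bins = len(inner_edges) + 1

    def label(i):
        left = "(-inf" if i == 0 else f"[{inner_edges[i - 1]}"
        right = "+inf)" if i == n_bins - 1 else f"{inner_edges[i]})"
        return f"{left}, {right}"

    # bisect_right sends a value equal to an edge into the bin on its right,
    # which matches the [low, high) convention.
    return [label(bisect_right(inner_edges, v)) for v in values]


# Cut points taken from the Quick Start summary table
labels = assign_intervals([5, 9, 20, 45, 100], [9, 16, 45])
print(labels)
```

Because the outer bins are unbounded, any new value receives a bin, including values outside the range seen during fitting.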
## 📈 Use Cases
MOBPY is ideal for:
- **Credit Risk Modeling**: Create monotonic risk score bins for regulatory compliance
- **Insurance Pricing**: Develop age/risk factor bands with clear premium progression
- **Customer Segmentation**: Build ordered customer value tiers
- **Feature Engineering**: Generate interpretable binned features for ML models
- **Regulatory Reporting**: Ensure transparent, monotonic relationships in models
## 🔧 Performance
MOBPY is optimized for production use:
- **Time Complexity**: O(n log n) for sorting + O(n) for PAVA + O(k²) for merging
- **Space Complexity**: O(n) for data storage
- **Scalability**: Handles datasets from 10² to 10⁶ samples efficiently
Benchmark on 1M samples, 20 final bins:
- Data preparation: 0.3s
- PAVA fitting: 0.8s
- Constraint merging: 0.2s
- **Total time: ~1.3s**
## 📚 Documentation
- [API Reference](docs/api_reference.md) - Complete API documentation
- [Algorithm Details](docs/core) - Mathematical foundations
- [Examples](examples/) - Jupyter notebooks with real-world examples
## 🧪 Testing
```bash
# Run unit tests
pytest -vv -W ignore::UserWarning
```
## 📖 Reference
* [Mironchyk, Pavel, and Viktor Tchistiakov. *Monotone optimal binning algorithm for credit risk modeling.* (2017)](https://www.researchgate.net/profile/Viktor-Tchistiakov/publication/322520135_Monotone_optimal_binning_algorithm_for_credit_risk_modeling/links/5a5dd1a8458515c03edf9a97/Monotone-optimal-binning-algorithm-for-credit-risk-modeling.pdf)
* [Smalbil, P. J. *The choices of weights in the iterative convex minorant algorithm.* (2015)](https://repository.tudelft.nl/islandora/object/uuid:5a111157-1a92-4176-9c8e-0b848feb7c30)
* Testing Dataset 1: [German Credit Risk](https://www.kaggle.com/datasets/uciml/german-credit) from Kaggle
* Testing Dataset 2: [US Health Insurance Dataset](https://www.kaggle.com/datasets/teertha/ushealthinsurancedataset) from Kaggle
* GitHub Project: [Monotone Optimal Binning (SAS 9.4 version)](https://github.com/cdfq384903/MonotonicOptimalBinning)
## 👥 Authors
1. Ta-Hung (Denny) Chen
* LinkedIn: [https://www.linkedin.com/in/dennychen-tahung/](https://www.linkedin.com/in/dennychen-tahung/)
* E-mail: [denny20700@gmail.com](mailto:denny20700@gmail.com)
2. Yu-Cheng (Darren) Tsai
* LinkedIn: [https://www.linkedin.com/in/darren-yucheng-tsai/](https://www.linkedin.com/in/darren-yucheng-tsai/)
* E-mail:
3. Peter Chen
* LinkedIn: [https://www.linkedin.com/in/peterchentsungwei/](https://www.linkedin.com/in/peterchentsungwei/)
* E-mail: [peterwei20700@gmail.com](mailto:peterwei20700@gmail.com)
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.