# WoE-IV-Bin Toolkit
## Overview
The WoE-IV-Bin Toolkit is a comprehensive Python library designed to streamline the analysis and optimization of categorical variables through the calculation of Weight of Evidence (WoE) and Information Value (IV), along with enhanced binning strategies for continuous features. This toolkit empowers data scientists and analysts to uncover valuable insights, optimize feature engineering, and improve predictive modeling accuracy.
## Concept
**Information Value (IV)**
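`IV` is a measure that quantifies the predictive power of an individual feature (independent variable) with respect to the target variable. To calculate it, take each category or bin of the feature, find the difference between the proportion of goods and the proportion of bads (or non-events and events) that fall into it, and multiply that difference by the bin's `Weight of Evidence (WoE)`. IV is the sum of these products across all bins. The WoE of a bin is the natural logarithm of the ratio of those same two proportions.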
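The formulas below use these symbols: $g_i$ and $b_i$ are the numbers of goods and bads in bin $i$, $G$ and $B$ are their totals, and $k$ is the number of bins. They follow the common goods-over-bads convention. Some references invert the ratio, which flips the sign of every WoE but leaves IV unchanged.

``` math
\mathrm{WoE}_i = \ln\left(\frac{g_i / G}{b_i / B}\right),
\qquad
\mathrm{IV} = \sum_{i=1}^{k} \left(\frac{g_i}{G} - \frac{b_i}{B}\right) \mathrm{WoE}_i
```

In each term, the difference and the logarithm always have the same sign, so no term is negative and IV is at least 0. If a bin contains no goods or no bads, its WoE is undefined. Implementations therefore usually merge such bins or add a small smoothing constant.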
`IV` can be used to rank features in terms of their importance or strength of association with the outcome variable. The value of IV helps in determining how well a feature is able to distinguish between the target variable's classes (for example, default vs. non-default in credit scoring).
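Below is a minimal plain-Python sketch of computing IV and ranking features by it. It makes these assumptions:

- The target is binary, with 1 for a "bad" (event) and 0 for a "good".
- WoE is ln(share of goods / share of bads).
- An additive smoothing constant `eps` is applied so that a category with no goods or no bads does not produce log(0).

`information_value` and `rank_features` are hypothetical helpers, not part of woe_iv_bin.

``` python
import math
from collections import Counter


def information_value(categories, targets, eps=0.5):
    """Hypothetical helper: return (woe_by_category, iv) for one feature.

    targets holds 0/1 labels (1 = "bad"/event). eps is additive smoothing so
    a category with no goods or no bads does not produce log(0).
    """
    pairs = list(zip(categories, targets))
    levels = sorted({c for c, _ in pairs}, key=str)
    goods = Counter(c for c, y in pairs if y == 0)
    bads = Counter(c for c, y in pairs if y == 1)
    # Smoothed totals, so the per-level shares below still sum to 1
    total_good = sum(goods.values()) + eps * len(levels)
    total_bad = sum(bads.values()) + eps * len(levels)
    woe, iv = {}, 0.0
    for level in levels:
        pct_good = (goods[level] + eps) / total_good
        pct_bad = (bads[level] + eps) / total_bad
        woe[level] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[level]
    return woe, iv


def rank_features(features, targets):
    """Hypothetical helper: sort {name: values} by IV, strongest first."""
    scores = {name: information_value(values, targets)[1]
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = default, 0 = non-default
y = [1, 1, 0, 0, 0, 0, 1, 0]
features = {
    "region": ["N", "N", "S", "S", "S", "N", "N", "S"],
    "channel": ["web", "app", "web", "app", "web", "app", "web", "app"],
}
for name, iv in rank_features(features, y):
    print(f"{name}: IV = {iv:.3f}")
```

With this toy data, `region` separates defaults from non-defaults far better than `channel` does, so it ranks first.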
The `IV` of a feature is a single scalar value. Its interpretation is often guided by rules of thumb that indicate the predictive strength of the feature:
- IV < 0.02: **Not useful for prediction**
- 0.02 ≤ IV < 0.1: **Weak predictive power**
- 0.1 ≤ IV < 0.3: **Medium predictive power**
- IV ≥ 0.3: **Strong predictive power**
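The rule-of-thumb bands above can be expressed as a small lookup. `iv_strength` is a hypothetical helper for illustration; each lower bound is inclusive, matching the ranges in the list.

``` python
def iv_strength(iv: float) -> str:
    """Hypothetical helper: map an IV value to its rule-of-thumb band."""
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    return "Strong predictive power"


for value in (0.01, 0.05, 0.2, 0.45):
    print(f"IV={value:.2f}: {iv_strength(value)}")
```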
## Features
- **WoE and IV Calculation**: Compute WoE and IV for categorical variables to see how strongly each category, and the variable as a whole, separates the target classes.
- **Continuous Binning Optimization**: Optimize the binning of continuous features by maximizing IV, to improve feature engineering and model performance.
- **Flexible and Intuitive**: Integrate the toolkit into existing workflows through simple functions that can be customized to specific analysis requirements.
- **Scalable Algorithms**: Designed to handle large datasets efficiently.
- **Empirical Insights**: Gain insight into dataset characteristics and predictive patterns through WoE, IV, and binning analysis.
## Workflow
<img src="https://github.com/knowusuboaky/woe-iv-bin/blob/main/README_files/figure-markdown/mermaid-figure-1.png?raw=true" width="730" height="1615" alt="WoE-IV-Bin workflow diagram">
## Installation
You can install the WoE-IV-Bin Toolkit via pip:
``` bash
pip install woe_iv_bin==0.1.2
```
## Load Package
### Categorical WoE and IV Calculation
``` python
from woe_iv_bin import categorical_woe
```
### Continuous WoE, IV Calculation and Binning Optimization
``` python
from woe_iv_bin import continuous_woe
```
## Usage
### Categorical WoE and IV Calculation
To calculate WoE and IV for categorical variables, use the `categorical_woe` function:
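``` python
from woe_iv_bin import categorical_woe

# df: a pandas DataFrame with a categorical column 'cat_feature'
# and a binary target column 'target'
woe_results = categorical_woe(df,
                              cat_variable_name='cat_feature',
                              y_df=df['target'])
print(woe_results)
```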
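A common next step is WoE encoding, where each category is replaced by its WoE so that a model such as logistic regression receives a single numeric column. The README does not document the structure of `woe_results`, so the standard-library sketch below computes the mapping itself rather than reading it from the library's output. It assumes a 0/1 target (1 = "bad") and WoE = ln(share of goods / share of bads). `woe_encode`, the smoothing constant `eps`, and mapping unseen categories to 0.0 are all hypothetical choices made for illustration.

``` python
import math
from collections import Counter


def woe_encode(train_cats, train_targets, new_cats, eps=0.5, unseen=0.0):
    """Hypothetical helper: fit category -> WoE on training rows, apply to new_cats.

    train_targets holds 0/1 labels (1 = "bad"/event); eps is additive smoothing.
    Categories never seen in training are mapped to `unseen` (0.0 = no evidence).
    """
    pairs = list(zip(train_cats, train_targets))
    levels = sorted({c for c, _ in pairs}, key=str)
    goods = Counter(c for c, y in pairs if y == 0)
    bads = Counter(c for c, y in pairs if y == 1)
    total_good = sum(goods.values()) + eps * len(levels)
    total_bad = sum(bads.values()) + eps * len(levels)
    mapping = {
        level: math.log(((goods[level] + eps) / total_good)
                        / ((bads[level] + eps) / total_bad))
        for level in levels
    }
    return [mapping.get(c, unseen) for c in new_cats], mapping


encoded, mapping = woe_encode(
    train_cats=["A", "A", "B", "B", "B"],
    train_targets=[1, 0, 0, 0, 1],
    new_cats=["A", "B", "C"],  # "C" was not seen during fitting
)
print(mapping)
print(encoded)
```

Fitting the mapping on training data only, then applying it to new rows, keeps target information from the evaluation data out of the encoding.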
### Continuous WoE, IV Calculation and Binning Optimization
For optimizing binning strategies for continuous features, use the `continuous_woe` function:
``` python
from woe_iv_bin import continuous_woe

# df: a pandas DataFrame with a numeric column 'cont_feature'
# and a binary target column 'target'
df_binned, optimized_bins, binning_woe_results = continuous_woe(df,
                                                                feature='cont_feature',
                                                                target='target',
                                                                max_bins=20,
                                                                min_samples_bin=0.05)
print(df_binned)
print(optimized_bins)
print(binning_woe_results)
```
## Ideal Uses
- **Credit Risk Assessment**: Evaluate the predictive power of categorical variables such as income brackets or loan types using WoE and IV analysis to inform credit risk assessment models.
- **Customer Segmentation**: Optimize binning strategies for continuous features like age or purchase amount to uncover meaningful customer segments with distinct behaviors and characteristics.
- **Marketing Effectiveness**: Assess the effectiveness of marketing campaigns by analyzing WoE and IV of categorical variables such as campaign channels or customer segments.
- **Insurance Underwriting**: Enhance risk assessment models by optimizing binning for continuous variables like property value or policy coverage amount.
## Contributing
Contributions to the WoE-IV-Bin Toolkit are highly appreciated! Whether it's bug fixes, feature enhancements, or documentation improvements, your contributions can help make the toolkit even more powerful and user-friendly for the community. Feel free to open issues, submit pull requests, or suggest new features on the project's GitHub repository.
## Documentation & Examples
For documentation and usage examples, visit the GitHub repository: https://github.com/knowusuboaky/woe-iv-bin
**Author**: Kwadwo Daddy Nyame Owusu - Boakye\
**Email**: kwadwo.owusuboakye@outlook.com\
**License**: MIT