## Statistical IV
The J-Divergence test evaluates the following null hypothesis:
H<sub>0</sub>: The predictive power of the variable is not significant.
The null hypothesis is tested using a two-tailed distribution, and this should be taken into consideration when interpreting the p-value.
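For context, the Information Value of a binned predictor is the J-divergence (the symmetrised Kullback-Leibler divergence) between the bin distributions of the two target classes. The sketch below computes that quantity from hypothetical per-bin counts. The function name `information_value` and the counts are illustrative assumptions and are not part of the `statistical_iv` API; the test statistic and p-value used by the package are described in the paper and are not reproduced here.

```python
import math

def information_value(events, non_events):
    """Information Value (J-divergence) between two binned class distributions.

    `events` and `non_events` hold per-bin counts for the two target classes.
    Every bin needs a positive count in both classes, otherwise the
    logarithm is undefined.
    """
    if len(events) != len(non_events):
        raise ValueError("both classes need the same number of bins")
    if min(events) <= 0 or min(non_events) <= 0:
        raise ValueError("every bin needs a positive count in both classes")
    total_events = sum(events)
    total_non_events = sum(non_events)
    iv = 0.0
    for e, n in zip(events, non_events):
        p = e / total_events              # share of events falling in this bin
        q = n / total_non_events          # share of non-events falling in this bin
        iv += (p - q) * math.log(p / q)   # per-bin contribution, never negative
    return iv

# Hypothetical counts for a predictor split into three bins.
iv_informative = information_value([10, 30, 60], [50, 30, 20])
# Same bin proportions in both classes, so the divergence is zero.
iv_uninformative = information_value([20, 30, 50], [40, 60, 100])
```

A predictor whose bins are distributed identically in both classes has an IV of zero, which is the situation the null hypothesis describes; the test asks whether an observed IV is far enough from that to be considered significant.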
### Explanation
`statistical-iv` supports feature selection for machine learning models: it tests whether the Information Value of each predictor is statistically significant, so you can select features automatically while controlling the error rate.
0. **Import package**
```python
from statistical_iv import api
```
1. **Provide a DataFrame as Input:**
- Supply a DataFrame `df` containing your data for IV calculation.
2. **Specify Predictor Variables:**
- Provide a list of predictor variable names (`variables_names`) to analyze.
3. **Define the Target Variable:**
- Specify the name of the target variable (`var_y`) in your DataFrame.
4. **Indicate Variable Types:**
- Define the type of your predictor variables as 'categorical' or 'numerical' using the `type_vars` parameter.
5. **Optional: Set Maximum Bins:**
- Adjust the maximum number of bins for discretization (optional) using the `max_bins` parameter.
6. **Call the `statistical_iv` Function:**
- Calculate the Statistical IV information by calling the `statistical_iv` function from `api` with the parameters described above (the parameters are used by the OptimalBinning package).
```python
result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)
```
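The steps above can be put together as in the following sketch. The data is synthetic and the column names (`income`, `age`, `default`) are made up for illustration. It assumes that `type_vars` is a single string applied to every listed predictor and that `max_bins` is an integer, as the step descriptions suggest; check the package source if your case differs. The import is guarded so that the input preparation still runs where the package is not installed.

```python
import numpy as np
import pandas as pd

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n_rows = 1000
income = rng.normal(50_000, 15_000, n_rows)
age = rng.integers(18, 80, n_rows)
# Binary target loosely driven by income, so that one predictor carries signal.
default = (income + rng.normal(0, 15_000, n_rows) < 45_000).astype(int)

df = pd.DataFrame({"income": income, "age": age, "default": default})  # step 1

variables_names = ["income", "age"]   # step 2: predictors to analyze
var_y = "default"                     # step 3: target column
type_vars = "numerical"               # step 4: assumed to be one string for all predictors
max_bins = 10                         # step 5: upper bound on bins per predictor

try:
    from statistical_iv import api    # step 0
except ImportError:
    result_df = None                  # package not installed; only the inputs are built
else:
    # step 6: same call signature as shown in the README
    result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)
```

If your predictors mix categorical and numerical columns, one option under the same assumption is to call the function once per type, with a separate `variables_names` list for each.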
### Full Paper:
For a detailed treatment of the method, see the article available at [this link](https://arxiv.org/abs/2309.13183).
Raw data
{
"_id": null,
"home_page": "https://github.com/Nicerova7/statistical_iv",
"name": "statistical-iv",
"maintainer": "Nilton Rojas Vales",
"docs_url": null,
"requires_python": "",
"maintainer_email": "nrojasv@uni.pe",
"keywords": "information_value,woe,data science,hypothesis test",
"author": "Nilton Rojas, Helder Rojas",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/de/d1/244638ac52e775acc91b1bbbb7eeca9716d5469d525ec594728c610e726f/statistical_iv-0.3.1.tar.gz",
"platform": null,
"description": "## Statistical IV\r\n\r\n\r\nOur J-Divergence test is under the next null hypothesis\r\n\r\nH<sub>0</sub>: The predictive power of the variable is not significant.\r\n\r\nThe null hypothesis is tested using a two-tailed distribution, and this should be taken into consideration when interpreting the p-value.\r\n\r\n### Explanation\r\n\r\nOptimize your machine learning models with 'Statistical-IV'. Perform automated feature selection based on statistics and customize error control.\r\n\r\n\r\n0. **Import package**\r\n ```python\r\n from statistical_iv import api\r\n ```\r\n \r\n1. **Provide a DataFrame as Input:**\r\n - Supply a DataFrame `df` containing your data for IV calculation.\r\n\r\n2. **Specify Predictor Variables:**\r\n - Prived a list of predictor variable names (`variables_names`) to analyze.\r\n\r\n3. **Define the Target Variable:**\r\n - Specify the name of the target variable (`var_y`) in your DataFrame.\r\n\r\n4. **Indicate Variable Types:**\r\n - Define the type of your predictor variables as 'categorical' or 'numerical' using the `type_vars` parameter.\r\n\r\n5. **Optional: Set Maximum Bins:**\r\n - Adjust the maximum number of bins for discretization (optional) using the `max_bins` parameter.\r\n\r\n6. **Call the `statistical_iv` Function:**\r\n - Calculate Statistical IV information by calling the `statistical_iv` function from api with the specified parameters (That is used for OptimalBinning package).\r\n\r\n ```python\r\n result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)\r\n\r\n#### Example Result:\r\n\r\n\r\n\r\n\r\n### Full Paper:\r\n\r\nFor a comprehensive exploration of the topic, we recommend perusing the contents of the article available at [this link](https://arxiv.org/abs/2309.13183).\r\n\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Statistical IV: Statistical Hypothesis Testing for the Information Value (IV). Evaluation of the predictive power of features using the IV with specific thresholds for each dataset.",
"version": "0.3.1",
"project_urls": {
"Homepage": "https://github.com/Nicerova7/statistical_iv"
},
"split_keywords": [
"information_value",
"woe",
"data science",
"hypothesis test"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bf931674c8eb97ca9666b68e1699023cc02b695a84904fedcf739561eb24655a",
"md5": "8ecc9e9e3f6dd9520ab5330cf97c2203",
"sha256": "9fc5c1e2f27c86efb1ae21c33a51ee88395b70d8aaf4d5b5aa5698b558560376"
},
"downloads": -1,
"filename": "statistical_iv-0.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8ecc9e9e3f6dd9520ab5330cf97c2203",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 5766,
"upload_time": "2023-10-10T00:33:12",
"upload_time_iso_8601": "2023-10-10T00:33:12.351807Z",
"url": "https://files.pythonhosted.org/packages/bf/93/1674c8eb97ca9666b68e1699023cc02b695a84904fedcf739561eb24655a/statistical_iv-0.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ded1244638ac52e775acc91b1bbbb7eeca9716d5469d525ec594728c610e726f",
"md5": "57dbb64b72453d92c745a9bcf88c15ff",
"sha256": "1e02834880a8b1841147553b92998c4ff2d21d91b0d95316f4b863f07a66ca45"
},
"downloads": -1,
"filename": "statistical_iv-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "57dbb64b72453d92c745a9bcf88c15ff",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 5355,
"upload_time": "2023-10-10T00:33:14",
"upload_time_iso_8601": "2023-10-10T00:33:14.099495Z",
"url": "https://files.pythonhosted.org/packages/de/d1/244638ac52e775acc91b1bbbb7eeca9716d5469d525ec594728c610e726f/statistical_iv-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-10 00:33:14",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Nicerova7",
"github_project": "statistical_iv",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "statistical-iv"
}