statistical-iv


Name: statistical-iv
Version: 0.3.1
Home page: https://github.com/Nicerova7/statistical_iv
Summary: Statistical IV: Statistical Hypothesis Testing for the Information Value (IV). Evaluation of the predictive power of features using the IV with specific thresholds for each dataset.
Upload time: 2023-10-10 00:33:14
Maintainer: Nilton Rojas Vales
Author: Nilton Rojas, Helder Rojas
License: MIT
Keywords: information_value, woe, data science, hypothesis test
Requirements: No requirements were recorded.

## Statistical IV


Our J-Divergence test is performed under the following null hypothesis:

H<sub>0</sub>: The predictive power of the variable is not significant.

The null hypothesis is tested using a two-tailed distribution, and this should be taken into consideration when interpreting the p-value.
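
For context, the Information Value of a binned predictor is the J-divergence (symmetrized Kullback-Leibler divergence) between the distribution of events and the distribution of non-events across bins. The following is a minimal sketch of that quantity only, not the package's implementation and not the hypothesis test itself; the function name `information_value` and the bin counts are hypothetical, and every bin is assumed to have nonzero counts in both classes.

```python
import math


def information_value(events, non_events):
    """Return the IV (J-divergence) between event and non-event bin shares.

    `events[i]` and `non_events[i]` are the counts in bin i.
    Assumes every count is strictly positive (no smoothing is applied).
    """
    total_events = sum(events)
    total_non_events = sum(non_events)
    iv = 0.0
    for e, ne in zip(events, non_events):
        p = e / total_events        # share of all events that fall in this bin
        q = ne / total_non_events   # share of all non-events in this bin
        woe = math.log(p / q)       # weight of evidence (sign convention varies)
        iv += (p - q) * woe         # each term is non-negative
    return iv


# Hypothetical counts for a predictor discretized into three bins.
events = [10, 30, 60]
non_events = [60, 30, 10]
iv = information_value(events, non_events)
print(round(iv, 4))  # 1.7918, which equals ln(6) for these counts
```

Because each term `(p - q) * log(p / q)` is non-negative, the IV is zero only when both classes are distributed identically across the bins, which is why a test is needed to judge whether an observed IV is meaningfully different from zero.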

### Explanation

Optimize your machine learning models with 'Statistical-IV'. Perform automated feature selection based on statistics and customize error control.


0. **Import package**
   ```python
   from statistical_iv import api
   ```
   
1. **Provide a DataFrame as Input:**
   - Supply a DataFrame `df` containing your data for IV calculation.

2. **Specify Predictor Variables:**
   - Provide a list of predictor variable names (`variables_names`) to analyze.

3. **Define the Target Variable:**
   - Specify the name of the target variable (`var_y`) in your DataFrame.

4. **Indicate Variable Types:**
   - Define the type of your predictor variables as 'categorical' or 'numerical' using the `type_vars` parameter.

5. **Optional: Set Maximum Bins:**
   - Adjust the maximum number of bins for discretization (optional) using the `max_bins` parameter.

6. **Call the `statistical_iv` Function:**
   - Calculate the Statistical IV by calling the `statistical_iv` function from `api` with the parameters specified above (these are used by the OptimalBinning package).

   ```python
   result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)
   ```

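The steps above can be put together as in the sketch below. The synthetic data, the column names (`income`, `noise`, `default`), and the value chosen for `max_bins` are assumptions made for illustration; only the import and the positional call to `api.statistical_iv` come from this README. The imports are guarded so that the input preparation still runs when `pandas` or `statistical_iv` is not installed.

```python
import random

random.seed(0)
n_rows = 1000

# Hypothetical synthetic data: two numerical predictors and a binary target.
income = [random.gauss(50, 15) for _ in range(n_rows)]
noise = [random.random() for _ in range(n_rows)]
# The target depends on `income` only, so `noise` should carry little signal.
default = [1 if x + random.gauss(0, 10) < 45 else 0 for x in income]
data = {"income": income, "noise": noise, "default": default}

variables_names = ["income", "noise"]  # step 2: predictors to analyze
var_y = "default"                      # step 3: name of the target column
type_vars = "numerical"                # step 4: 'categorical' or 'numerical'
max_bins = 10                          # step 5: illustrative value

try:
    import pandas as pd
    from statistical_iv import api     # step 0
except ImportError:
    result_df = None                   # packages unavailable; inputs only
else:
    df = pd.DataFrame(data)            # step 1
    # step 6: same positional call as shown in this README
    result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)
    print(result_df)
```

Since `type_vars` is described as a single setting, this sketch passes only numerical predictors in one call; categorical predictors would presumably go in a separate call with `type_vars` set to `'categorical'`.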
#### Example Result:

![Output Example](https://github.com/Nicerova7/statistical_iv/blob/main/images/output_example.png?raw=true)


### Full Paper:

For a full treatment of the method, see the article available at [this link](https://arxiv.org/abs/2309.13183).


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Nicerova7/statistical_iv",
    "name": "statistical-iv",
    "maintainer": "Nilton Rojas Vales",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "nrojasv@uni.pe",
    "keywords": "information_value,woe,data science,hypothesis test",
    "author": "Nilton Rojas, Helder Rojas",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/de/d1/244638ac52e775acc91b1bbbb7eeca9716d5469d525ec594728c610e726f/statistical_iv-0.3.1.tar.gz",
    "platform": null,
    "description": "## Statistical IV\r\n\r\n\r\nOur J-Divergence test is under the next null hypothesis\r\n\r\nH<sub>0</sub>: The predictive power of the variable is not significant.\r\n\r\nThe null hypothesis is tested using a two-tailed distribution, and this should be taken into consideration when interpreting the p-value.\r\n\r\n### Explanation\r\n\r\nOptimize your machine learning models with 'Statistical-IV'. Perform automated feature selection based on statistics and customize error control.\r\n\r\n\r\n0. **Import package**\r\n   ```python\r\n   from statistical_iv import api\r\n   ```\r\n   \r\n1. **Provide a DataFrame as Input:**\r\n   - Supply a DataFrame `df` containing your data for IV calculation.\r\n\r\n2. **Specify Predictor Variables:**\r\n   - Prived a list of predictor variable names (`variables_names`) to analyze.\r\n\r\n3. **Define the Target Variable:**\r\n   - Specify the name of the target variable (`var_y`) in your DataFrame.\r\n\r\n4. **Indicate Variable Types:**\r\n   - Define the type of your predictor variables as 'categorical' or 'numerical' using the `type_vars` parameter.\r\n\r\n5. **Optional: Set Maximum Bins:**\r\n   - Adjust the maximum number of bins for discretization (optional) using the `max_bins` parameter.\r\n\r\n6. **Call the `statistical_iv` Function:**\r\n   - Calculate Statistical IV information by calling the `statistical_iv` function from api with the specified parameters (That is used for OptimalBinning package).\r\n\r\n   ```python\r\n   result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)\r\n\r\n#### Example Result:\r\n\r\n![Output Example](https://github.com/Nicerova7/statistical_iv/blob/main/images/output_example.png?raw=true)\r\n\r\n\r\n### Full Paper:\r\n\r\nFor a comprehensive exploration of the topic, we recommend perusing the contents of the article available at [this link](https://arxiv.org/abs/2309.13183).\r\n\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Statistical IV: Statistical Hypothesis Testing for the Information Value (IV). Evaluation of the predictive power of features using the IV with specific thresholds for each dataset.",
    "version": "0.3.1",
    "project_urls": {
        "Homepage": "https://github.com/Nicerova7/statistical_iv"
    },
    "split_keywords": [
        "information_value",
        "woe",
        "data science",
        "hypothesis test"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bf931674c8eb97ca9666b68e1699023cc02b695a84904fedcf739561eb24655a",
                "md5": "8ecc9e9e3f6dd9520ab5330cf97c2203",
                "sha256": "9fc5c1e2f27c86efb1ae21c33a51ee88395b70d8aaf4d5b5aa5698b558560376"
            },
            "downloads": -1,
            "filename": "statistical_iv-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8ecc9e9e3f6dd9520ab5330cf97c2203",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 5766,
            "upload_time": "2023-10-10T00:33:12",
            "upload_time_iso_8601": "2023-10-10T00:33:12.351807Z",
            "url": "https://files.pythonhosted.org/packages/bf/93/1674c8eb97ca9666b68e1699023cc02b695a84904fedcf739561eb24655a/statistical_iv-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ded1244638ac52e775acc91b1bbbb7eeca9716d5469d525ec594728c610e726f",
                "md5": "57dbb64b72453d92c745a9bcf88c15ff",
                "sha256": "1e02834880a8b1841147553b92998c4ff2d21d91b0d95316f4b863f07a66ca45"
            },
            "downloads": -1,
            "filename": "statistical_iv-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "57dbb64b72453d92c745a9bcf88c15ff",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5355,
            "upload_time": "2023-10-10T00:33:14",
            "upload_time_iso_8601": "2023-10-10T00:33:14.099495Z",
            "url": "https://files.pythonhosted.org/packages/de/d1/244638ac52e775acc91b1bbbb7eeca9716d5469d525ec594728c610e726f/statistical_iv-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-10 00:33:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Nicerova7",
    "github_project": "statistical_iv",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "statistical-iv"
}
        