Name | corr-shap |
Version | 0.0.2 |
download | |
home_page | |
Summary | This package is an extension of the KernelExplainer of shap package that explains the output of any machine learning model, taking into account dependencies between features. |
upload_time | 2023-11-14 11:04:28 |
maintainer | |
docs_url | None |
author | |
requires_python | |
license | |
keywords | shap shapley dependent correlated corr_shap xai explainable artificial intelligence |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Shapley values for correlated features

This package contains an extension of the [shap package](https://github.com/shap/shap) based on the paper ['Explaining individual predictions when features are dependent: More accurate approximations to Shapley values'](https://arxiv.org/abs/1903.10464), which describes methods to approximate Shapley values more accurately when the features in a dataset are correlated.

## Installation

To install the package with pip, simply run

`pip install corr-shap`

Alternatively, you can download the [corr_shap repository](https://github.com/Fraunhofer-SCAI/corr_shap) and create a conda environment with

`conda env create -f environment.yml`

## Background

### SHAP

SHAP (SHapley Additive exPlanations) is a method to explain the output of a machine learning model. It uses Shapley values from game theory to compute the contribution of each input feature to the model's output, helping users understand the factors that influence a model's decision-making process. Since the computational effort to calculate exact Shapley values grows exponentially with the number of features, approximation methods such as Kernel SHAP are needed. See the paper ['A Unified Approach to Interpreting Model Predictions'](http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions) by Scott M. Lundberg and Su-In Lee for more details on Kernel SHAP, or their [SHAP git repo](https://github.com/shap/shap) for the implementation.

### Correlated Explainer

One disadvantage of Kernel SHAP is that it assumes all features are independent. If there is high correlation between the features, the results of Kernel SHAP can be inaccurate. Therefore, Kjersti Aas, Martin Jullum and Anders Løland propose an extension of Kernel SHAP in their paper ['Explaining individual predictions when features are dependent: More accurate approximations to Shapley values'](https://arxiv.org/abs/1903.10464).
Instead of assuming feature independence, they use either a Gaussian distribution, a Gaussian copula distribution, an empirical conditional distribution, or a combination of the empirical distribution with one of the other two. This can produce more accurate results when features are dependent.

Their proposed method is implemented in the 'CorrExplainer' class. Depending on the chosen sampling strategy, the CorrExplainer uses one of the distributions mentioned above, or returns the same result as the Kernel Explainer (with a faster runtime) when the 'default' sampling strategy is chosen. In our comparisons (with the [data sets 'adult', 'linear independent 60' and 'diabetes'](https://shap.readthedocs.io/en/latest/api.html#datasets)), the CorrExplainer was between 6 and 19 times faster than the Kernel Explainer. However, in its current implementation it is only suitable for explaining tabular data.

## Examples

### Explaining a single instance

Below is a code example that shows how to use the CorrExplainer to explain a single instance of the 'adult' dataset and display the result in a bar plot.
````python
from sklearn import linear_model
from sklearn.model_selection import train_test_split
import shap

from corr_shap.CorrExplainer import CorrExplainer

# load data
x, y = shap.datasets.adult()

# train model
x_training_data, x_test_data, y_training_data, y_test_data \
    = train_test_split(x, y, test_size=0.2, random_state=0)
model = linear_model.LinearRegression()
model.fit(x_training_data, y_training_data)

# create explanation object with CorrExplainer
explainer = CorrExplainer(model.predict, x_training_data, sampling="default")
explanation = explainer(x_test_data[:1])

shap.plots.bar(explanation)
````

![plot](https://github.com/Fraunhofer-SCAI/corr_shap/images/dependent_bar_plot.png)

### Explaining the full 'adult' dataset

To get a sense of which features are most important across the whole dataset, and not just for a single instance, the SHAP values for each feature and each sample can be visualized in the same plot. See the example code [here](https://github.com/Fraunhofer-SCAI/corr_shap/examples/adult_beeswarmplot.py).

![plot](https://github.com/Fraunhofer-SCAI/corr_shap/images/adult_beeswarmplot_all_sample.png)

### Credit default data

Another example, with a credit default dataset from the [rivapy package](https://github.com/RIVACON/RiVaPy) that has high correlation between the features 'income' and 'savings' and a model that ignores the 'savings' feature, can be found [here](https://github.com/Fraunhofer-SCAI/corr_shap/examples/credit_default.py).

Bar plot explaining a single instance:

![plot](https://github.com/Fraunhofer-SCAI/corr_shap/images/comparison_bar_plot.png)

Summary plot explaining multiple samples:

![plot](https://github.com/Fraunhofer-SCAI/corr_shap/images/creditdefault_beeswarmplot_5000_sample.png)

Further examples can be found in the [examples](https://github.com/Fraunhofer-SCAI/corr_shap/examples) folder.
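As the Background section notes, computing exact Shapley values requires enumerating all feature coalitions, which scales exponentially with the number of features. The following minimal, self-contained sketch (pure Python, not part of the corr_shap API; the function name `exact_shapley` and the "replace missing features by a baseline" convention are illustrative assumptions) makes that cost concrete for a toy linear model:

````python
from itertools import combinations
from math import factorial

def exact_shapley(predict, x, baseline):
    """Exact Shapley values by enumerating all coalitions of features.

    Features outside a coalition are set to their baseline value
    (an illustrative convention, not corr_shap's actual sampling).
    Runtime grows like 2^n in the number of features n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            # classic Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                x_with = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                x_without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(x_with) - predict(x_without))
    return phi

# Toy linear model f(x) = 2*x0 + 3*x1 + x2
f = lambda v: 2 * v[0] + 3 * v[1] + v[2]
phi = exact_shapley(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# For a linear model, phi_i = w_i * (x_i - baseline_i), so phi == [2.0, 3.0, 1.0]
````

Avoiding this exponential enumeration is exactly what Kernel SHAP approximates, and what the CorrExplainer extends to correlated features.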
## References

* ['A Unified Approach to Interpreting Model Predictions'](http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions), Scott M. Lundberg and Su-In Lee
* ['Explaining individual predictions when features are dependent: More accurate approximations to Shapley values'](https://arxiv.org/abs/1903.10464), Kjersti Aas, Martin Jullum and Anders Løland
* [shap package](https://github.com/shap/shap)
* [rivapy package](https://github.com/RIVACON/RiVaPy)