Name | triglav |
Version | 1.0.7 |
home_page | None |
Summary | Triglav: Iterative Refinement and Selection of Stable Features Using Shapley Values |
upload_time | 2024-05-14 14:48:22 |
maintainer | None |
docs_url | None |
author | Peter Kruczkiewicz, G. Brian Golding, Oliver Lung |
requires_python | >=3.10 |
license | MIT License Copyright (c) 2023 Josip Rudar, Peter Kruczkiewicz, Oliver Lung, G.Brian Golding, Mehrdad Hajibabaei Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords | ecology, feature selection, multivariate statistics, stability selection |
requirements | No requirements were recorded. |
# Triglav - Feature Selection Using Iterative Refinement
[![CI](https://github.com/jrudar/Triglav/actions/workflows/ci.yml/badge.svg)](https://github.com/jrudar/Triglav/actions/workflows/ci.yml)
## Overview
Triglav (named after the Slavic god of divination) attempts to discover
all relevant features using an iterative refinement approach. The
approach is modeled on the method introduced in Boruta, with several
modifications:
1) Features are clustered and the impact of each cluster is assessed as
the average of the Shapley scores of the features associated with
each cluster.
2) Like Boruta, a set of shadow features is created. However, an ensemble
of classifiers is used to measure the Shapley scores of each real feature
and its shadow counterpart, producing a distribution of scores. A Wilcoxon
signed-rank test is used to determine the significance of each cluster
and p-values are adjusted to correct for multiple comparisons across each
round. Clusters with adjusted p-values below 'alpha' are counted as hits.
3) At each iteration at or beyond 'n_iter_fwer', two beta-binomial distributions
are used to determine whether a cluster should be retained. The first
distribution models the hit rate, while the second models
the rejection rate. For a cluster to be selected, the probability
of a hit must be significant after correcting for multiple comparisons and
applying a Bonferroni correction for each iteration greater than or equal
to the 'n_iter_fwer' parameter. Rejection follows analogous
reasoning. Clusters that are neither selected nor rejected remain tentative.
4) After the iterative refinement stage, SAGE scores can optionally be used to
select the best feature from each cluster.
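
Steps 1 and 2 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Triglav's implementation: absolute Pearson correlation stands in for Shapley scores, disjoint subsamples stand in for the ensemble of classifiers, and a plain Bonferroni adjustment stands in for whichever multiple-comparison correction Triglav applies. The function names (`make_shadow`, `stand_in_scores`, `cluster_hits`) and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def make_shadow(X, rng):
    """Shuffle every column independently: each shadow feature keeps its
    distribution but loses any relationship with the labels."""
    return np.column_stack([rng.permutation(col) for col in X.T])


def stand_in_scores(X, y):
    """Hypothetical stand-in for Shapley scores: |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom


def cluster_hits(X, y, clusters, n_models=30, alpha=0.05, seed=0):
    """Return (cluster labels, adjusted p-values, hit flags) for one round."""
    rng = np.random.default_rng(seed)
    labels = np.unique(clusters)
    real = np.empty((n_models, labels.size))
    shadow = np.empty_like(real)
    # Assumption: each disjoint subsample plays the role of one ensemble member.
    folds = np.array_split(rng.permutation(X.shape[0]), n_models)
    for m, idx in enumerate(folds):
        r = stand_in_scores(X[idx], y[idx])
        s = stand_in_scores(make_shadow(X[idx], rng), y[idx])
        for k, c in enumerate(labels):
            # Step 1: a cluster's impact is the mean score of its features.
            real[m, k] = r[clusters == c].mean()
            shadow[m, k] = s[clusters == c].mean()
    # Step 2: paired, one-sided Wilcoxon signed-rank test per cluster.
    pvals = np.array([
        wilcoxon(real[:, k], shadow[:, k], alternative="greater").pvalue
        for k in range(labels.size)
    ])
    adjusted = np.minimum(pvals * labels.size, 1.0)  # Bonferroni (assumed)
    return labels, adjusted, adjusted < alpha


# Synthetic data: features 0 and 1 (cluster 0) carry signal, the rest are noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=600).astype(float)
X = rng.normal(size=(600, 6))
X[:, :2] += 2.0 * y[:, None]
clusters = np.array([0, 0, 1, 1, 2, 2])

labels, adjusted, hits = cluster_hits(X, y, clusters)
print(dict(zip(labels.tolist(), hits.tolist())))
```

A single round like this only yields a hit or a miss per cluster; the decision to select or reject is made from the hit counts accumulated over many rounds, as described in step 3.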
While this method may not recover every feature important for classification,
it has several useful properties. First, because an Extremely
Randomized Trees model is the default, dependencies between features can be
accounted for. Further, decision tree models are better able to partition
the sample space, which can result in the selection of both globally optimal
and locally optimal features. Finally, the approach identifies stable clusters of
features, since only those that consistently pass the Wilcoxon signed-rank test
are selected. This makes it more robust to differences in training
data.
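
To make "consistently pass" concrete, the sketch below applies a beta-binomial tail test to a cluster's hit count, in the spirit of step 3. It is an illustration only: the prior parameters `a` and `b`, the function name `decide`, and the way the two Bonferroni factors are combined are assumptions, and Triglav's own parameterization may differ.

```python
from scipy.stats import betabinom


def decide(hits, n_rounds, n_clusters, n_checks, alpha=0.05, a=20.0, b=20.0):
    """Classify a cluster as 'selected', 'rejected', or 'tentative'.

    hits       -- rounds in which the cluster beat its shadow counterpart
    n_rounds   -- rounds run so far
    n_clusters -- clusters tested (Bonferroni factor across clusters)
    n_checks   -- decisions made so far (Bonferroni factor across iterations)
    a, b       -- hypothetical beta prior; a == b centers the null hit rate at 0.5
    """
    threshold = alpha / (n_clusters * n_checks)
    misses = n_rounds - hits
    # sf(k - 1) is P(X >= k) for X ~ BetaBinomial(n_rounds, a, b).
    p_select = betabinom.sf(hits - 1, n_rounds, a, b)
    p_reject = betabinom.sf(misses - 1, n_rounds, a, b)
    if p_select < threshold:
        return "selected"
    if p_reject < threshold:
        return "rejected"
    return "tentative"


for h in (0, 10, 20):
    print(h, decide(hits=h, n_rounds=20, n_clusters=5, n_checks=3))
```

Under these assumed settings, a cluster that hits in every round is selected, one that never hits is rejected, and one that hits about half the time stays tentative until more rounds accumulate.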
## Install
With Conda from BioConda:
```bash
conda install -c bioconda triglav
```
From PyPI:
```bash
pip install triglav
```
From source:
```bash
git clone https://github.com/jrudar/Triglav.git
cd Triglav
pip install .
# or create a virtual environment
python -m venv venv
source venv/bin/activate
pip install .
```
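
After any of the install routes above, a quick way to confirm which release is present is to query the installed distribution metadata. This sketch uses only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `triglav` is taken from the package metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="triglav"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```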
## Interface
An overview of the API can be found [here](docs/API.md).
## Usage and Examples
Examples of how to use `Triglav` can be found [here](notebooks/README.md).
## Contributing
To contribute to the development of `Triglav`, please read our [contributing guide](docs/CONTRIBUTING.md).
## References
Coming Soon
Raw data
{
"_id": null,
"home_page": null,
"name": "triglav",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "ecology, feature selection, multivariate statistics, stability selection",
"author": "Peter Kruczkiewicz, G. Brian Golding, Oliver Lung",
"author_email": "Josip Rudar <joe.rudar@inspection.gc.ca>, Mehrdad Hajibabaei <mhajibab@uoguelph.ca>",
"download_url": "https://files.pythonhosted.org/packages/8d/04/ab8f17d720f60c13121113769b4fe3361b308596af78d9bdc420c88cfaef/triglav-1.0.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"summary": "Triglav: Iterative Refinement and Selection of Stable Features Using Shapley Values",
"version": "1.0.7",
"project_urls": {
"Bug Tracker": "https://github.com/jrudar/Triglav/issues",
"Homepage": "https://github.com/jrudar/Triglav",
"Repository": "https://github.com/jrudar/Triglav.git"
},
"split_keywords": [
"ecology",
" feature selection",
" multivariate statistics",
" stability selection"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "700b9ef376f8a39f95a48971ee20ee8cc8805c6c82f74115b37134871736b3fe",
"md5": "71ec7da40c169d3cf1d4c56bfd5734b4",
"sha256": "cc95a1d81b677b8c27a8ebb7cc92022c453ce8e354c3a091c7f9ceb3696b4677"
},
"downloads": -1,
"filename": "triglav-1.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "71ec7da40c169d3cf1d4c56bfd5734b4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 13576,
"upload_time": "2024-05-14T14:48:20",
"upload_time_iso_8601": "2024-05-14T14:48:20.802890Z",
"url": "https://files.pythonhosted.org/packages/70/0b/9ef376f8a39f95a48971ee20ee8cc8805c6c82f74115b37134871736b3fe/triglav-1.0.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8d04ab8f17d720f60c13121113769b4fe3361b308596af78d9bdc420c88cfaef",
"md5": "d0085f5f1790de178b843ce356e02242",
"sha256": "4d0b12a5eae2a80c7c816aabe11409c6d1cc7f0f6051e33af87b345c8a7fe340"
},
"downloads": -1,
"filename": "triglav-1.0.7.tar.gz",
"has_sig": false,
"md5_digest": "d0085f5f1790de178b843ce356e02242",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 1296855,
"upload_time": "2024-05-14T14:48:22",
"upload_time_iso_8601": "2024-05-14T14:48:22.732260Z",
"url": "https://files.pythonhosted.org/packages/8d/04/ab8f17d720f60c13121113769b4fe3361b308596af78d9bdc420c88cfaef/triglav-1.0.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-14 14:48:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jrudar",
"github_project": "Triglav",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "triglav"
}
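
The `digests` entries in the raw data can be used to check a downloaded artifact before installing it. The helper below is a minimal sketch using only the standard library; the function name `sha256_matches` and the local file path in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Stream a file through SHA-256 and compare against the expected hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Usage, assuming the sdist was saved to the current directory:
# sha256_matches(
#     "triglav-1.0.7.tar.gz",
#     "4d0b12a5eae2a80c7c816aabe11409c6d1cc7f0f6051e33af87b345c8a7fe340",
# )
```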