triglav


Name: triglav
Version: 1.0.5
Summary: Triglav: Iterative Refinement and Selection of Stable Features Using Shapley Values
Upload time: 2023-09-20 17:11:27
Author: Peter Kruczkiewicz, G. Brian Golding, Oliver Lung
Requires Python: >=3.8
License: MIT License. Copyright (c) 2023 Josip Rudar, Peter Kruczkiewicz, Oliver Lung, G.Brian Golding, Mehrdad Hajibabaei. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: ecology, feature selection, multivariate statistics, stability selection
# Triglav - Feature Selection Using Iterative Refinement

[![CI](https://github.com/jrudar/Triglav/actions/workflows/ci.yml/badge.svg)](https://github.com/jrudar/Triglav/actions/workflows/ci.yml)
[![Draft PDF](https://github.com/jrudar/Triglav/actions/workflows/draft-pdf.yml/badge.svg)](https://github.com/jrudar/Triglav/actions/workflows/draft-pdf.yml)

## Overview

Triglav (named after the Slavic god of divination) attempts to discover
all relevant features using an iterative refinement approach. The
approach is based on the method introduced in Boruta, with several
modifications:

1) Features are clustered and the impact of each cluster is assessed as
   the average of the Shapley scores of the features associated with
   each cluster.

2) Like Boruta, a set of shadow features is created. However, an ensemble
   of classifiers is used to measure the Shapley scores of each real feature
   and its shadow counterpart, producing a distribution of scores. A Wilcoxon
   signed-rank test is used to determine the significance of each cluster,
   and p-values are adjusted to correct for multiple comparisons within each
   round. Each cluster with an adjusted p-value below 'alpha' counts as a hit.

3) At each iteration at or beyond 'n_iter_fwer', two beta-binomial distributions
   are used to determine whether a cluster should be retained. The first
   distribution models the hit rate, while the second models
   the rejection rate. For a cluster to be selected, the probability
   of a hit must be significant after correcting for multiple comparisons and
   applying a Bonferroni correction for each iteration greater than or equal
   to the 'n_iter_fwer' parameter. Similar reasoning applies for a cluster to
   be rejected. Clusters that are neither selected nor rejected remain tentative.

4) After the iterative refinement stage, SAGE scores can be used to select
   the best feature from each cluster.
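
Steps 1 and 2 above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not Triglav's implementation: the per-model scores are simulated stand-ins rather than real Shapley values, the names (`clusters`, `signal`, `cluster_score`, `wilcoxon_greater`) are hypothetical, the exact signed-rank test is hand-written and assumes no ties or zero differences (a library routine such as `scipy.stats.wilcoxon` would normally be used instead), and a plain Bonferroni adjustment stands in for whichever per-round correction Triglav applies.

```python
import random
from statistics import mean


def wilcoxon_greater(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value for H1: median(diffs) > 0.

    Assumes no zero differences and no tied magnitudes, which holds for the
    continuous simulated scores below.
    """
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # W+ is the sum of the ranks (1 = smallest magnitude) of positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    # counts[s] = number of subsets of the ranks {1..n} that sum to s.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    # Under H0 every sign pattern is equally likely, so p = P(W >= W+).
    return sum(counts[w_plus:]) / 2 ** n


def bonferroni(p_values):
    """Stand-in multiple-comparison correction (an assumption, not Triglav's)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def cluster_score(row, members):
    """Step 1: a cluster's impact is the mean score of its member features."""
    return mean(row[f] for f in members)


rng = random.Random(0)
n_models = 12                                    # hypothetical ensemble size
clusters = {"A": [0, 1], "B": [2], "C": [3, 4]}  # cluster -> feature indices
signal = [0.8, 0.6, 0.0, 0.0, 0.0]               # simulated mean score per real feature

# One row of scores per ensemble member; shadows carry no signal by construction.
real = [[s + rng.gauss(0, 0.1) for s in signal] for _ in range(n_models)]
shadow = [[rng.gauss(0, 0.1) for _ in signal] for _ in range(n_models)]

# Step 2: paired real-vs-shadow test per cluster, then adjust within the round.
names = list(clusters)
p_raw = [
    wilcoxon_greater([
        cluster_score(r, clusters[c]) - cluster_score(s, clusters[c])
        for r, s in zip(real, shadow)
    ])
    for c in names
]
alpha = 0.05
hits = {c: p < alpha for c, p in zip(names, bonferroni(p_raw))}
print(hits)
```

Only cluster "A" is given any signal in this simulation, so it is the one expected to register a hit; whether a noise cluster occasionally passes depends on the random draw, which is why hits are accumulated over many rounds rather than trusted individually.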

While this method may not recover every feature important for classification,
it does have some useful properties. First, because an Extremely
Randomized Trees model is used by default, dependencies between features can be
accounted for. Further, decision tree models are well suited to partitioning
the sample space, which can result in the selection of both globally optimal
and locally optimal features. Finally, the approach identifies stable clusters of
features, since only those that consistently pass the Wilcoxon signed-rank test
are selected. This makes it more robust to differences in training
data.
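
The idea of selecting only clusters that pass consistently (step 3 of the list above) can be illustrated with a beta-binomial tail probability over the accumulated hits. The sketch below is an assumption-laden illustration rather than Triglav's code: the function names, the shape parameters `a` and `b`, the default `n_iter_fwer=11`, and the way the two corrections are combined (multiplying by the number of clusters and by the number of iterations checked so far) are all hypothetical choices, and the same parameters are reused for the hit and rejection distributions for simplicity.

```python
from math import exp, lgamma


def _log_beta(x, y):
    """Natural log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def betabinom_sf(k, n, a, b):
    """P(X >= k) for X ~ BetaBinomial(n, a, b), summed directly from the pmf."""
    total = 0.0
    for j in range(max(k, 0), n + 1):
        log_choose = lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
        total += exp(log_choose + _log_beta(j + a, n - j + b) - _log_beta(a, b))
    return min(1.0, total)


def decide(hits, n_iter, n_clusters, n_iter_fwer=11, alpha=0.05, a=20.0, b=20.0):
    """Classify one cluster as 'selected', 'rejected', or 'tentative'.

    hits is the number of rounds (out of n_iter) in which the cluster was a hit.
    All default values here are illustrative, not Triglav's defaults.
    """
    if n_iter < n_iter_fwer:
        return "tentative"          # too early to make a decision
    # Correct for the number of clusters and for each iteration checked so far.
    n_checks = n_iter - n_iter_fwer + 1
    factor = n_clusters * n_checks
    p_select = betabinom_sf(hits, n_iter, a, b)            # unusually many hits?
    p_reject = betabinom_sf(n_iter - hits, n_iter, a, b)   # unusually many misses?
    if min(1.0, p_select * factor) < alpha:
        return "selected"
    if min(1.0, p_reject * factor) < alpha:
        return "rejected"
    return "tentative"


for hits in (40, 20, 0):
    print(hits, decide(hits, n_iter=40, n_clusters=3))
```

Under these assumptions a cluster that is a hit in every one of 40 rounds is selected, one that is never a hit is rejected, and one that is a hit about half the time stays tentative.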

## Install

With Conda from BioConda:

```bash
conda install -c bioconda triglav
```

From PyPI:

```bash
pip install triglav
```

From source:

```bash
git clone https://github.com/jrudar/Triglav.git
cd Triglav
pip install .
# or create a virtual environment
python -m venv venv
source venv/bin/activate
pip install .
```
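
After any of these installs, one way to confirm which version ended up in the active environment is to query the package metadata from Python. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if triglav is installed in this environment, else None.
print(installed_version("triglav"))
```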

## Interface

An overview of the API can be found [here](docs/API.md).

## Usage and Examples

Examples of how to use `Triglav` can be found [here](notebooks/README.md).

## Contributing

To contribute to the development of `Triglav`, please read our [contributing guide](docs/CONTRIBUTING.md).

## References

Coming Soon


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "triglav",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "ecology,feature selection,multivariate statistics,stability selection",
    "author": "Peter Kruczkiewicz, G. Brian Golding, Oliver Lung",
    "author_email": "Josip Rudar <rudarj@uoguelph.ca>, Mehrdad Hajibabaei <mhajibab@uoguelph.ca>",
    "download_url": "https://files.pythonhosted.org/packages/46/f0/9f887eb37ce8dfa609b123acf613dba1baea082faea6657694798436d12d/triglav-1.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2023 Josip Rudar, Peter Kruczkiewicz, Oliver Lung, G.Brian Golding, Mehrdad Hajibabaei  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "Triglav: Iterative Refinement and Selection of Stable Features Using Shapley Values",
    "version": "1.0.5",
    "project_urls": {
        "Bug Tracker": "https://github.com/jrudar/Triglav/issues",
        "Homepage": "https://github.com/jrudar/Triglav",
        "Repository": "https://github.com/jrudar/Triglav.git"
    },
    "split_keywords": [
        "ecology",
        "feature selection",
        "multivariate statistics",
        "stability selection"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bfbc80e84cbafb9041faeedf7f7e1df3f30dbb993057fe6843d1dd807eb0ba8d",
                "md5": "00ba67cd96db7f6968a77e424a9d8136",
                "sha256": "5fe0b78ce3fc4a2f518a4c8903c1732d610472624bc51e1d639171955efe7156"
            },
            "downloads": -1,
            "filename": "triglav-1.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "00ba67cd96db7f6968a77e424a9d8136",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 13528,
            "upload_time": "2023-09-20T17:11:25",
            "upload_time_iso_8601": "2023-09-20T17:11:25.170678Z",
            "url": "https://files.pythonhosted.org/packages/bf/bc/80e84cbafb9041faeedf7f7e1df3f30dbb993057fe6843d1dd807eb0ba8d/triglav-1.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "46f09f887eb37ce8dfa609b123acf613dba1baea082faea6657694798436d12d",
                "md5": "f5f619c1f9671fd6d83766dd25648bed",
                "sha256": "04ffbe362b910fc1636ae9179c5407d72cc2ae18109eb61ba5289906a1d9067d"
            },
            "downloads": -1,
            "filename": "triglav-1.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "f5f619c1f9671fd6d83766dd25648bed",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 1297282,
            "upload_time": "2023-09-20T17:11:27",
            "upload_time_iso_8601": "2023-09-20T17:11:27.076462Z",
            "url": "https://files.pythonhosted.org/packages/46/f0/9f887eb37ce8dfa609b123acf613dba1baea082faea6657694798436d12d/triglav-1.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-20 17:11:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jrudar",
    "github_project": "Triglav",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "triglav"
}
        