featureranker


* Name: featureranker
* Version: 1.1.2
* Home page: https://github.com/lhallee/feature-ranker
* Summary: Feature ranking ensemble
* Upload time: 2024-01-23 04:13:55
* Author: Logan Hallee
* Requires Python: >=3.7,<4.0
* License: CC-BY-NC-SA-4.0
* Keywords: feature, ranker, sklearn
* Requirements: No requirements were recorded.

# FEATURE RANKER
featureranker is a lightweight Python package for the feature ranking ensemble developed by Logan Hallee, featured in the following works:

[Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life](https://www.nature.com/articles/s41598-023-28965-7)

[cdsBERT - Extending Protein Language Models with Codon Awareness](https://www.biorxiv.org/content/10.1101/2023.09.15.558027v1.abstract)

[Exploring Phylogenetic Classification and Further Applications of Codon Usage Frequencies](https://www.biorxiv.org/content/10.1101/2022.07.20.500846v1.abstract)

The ensemble uses l1 penalization, random forests, extreme gradient boosting, ANOVA F-values, and mutual information to rank the importance of features for regression and classification tasks. The per-method scoring lists are combined with a weighted voting scheme.
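
To make the weighted voting idea concrete, here is a minimal pure-Python sketch. It is not the package's implementation: the function name `weighted_vote`, the Borda-style point scheme, the example weights, and the example rankings are all assumptions chosen for illustration.

```python
from collections import defaultdict


def weighted_vote(rankings, weights=None):
    """Combine per-method feature rankings into one ensemble ordering.

    rankings: dict mapping method name -> list of features, best first.
    weights:  optional dict mapping method name -> vote weight (default 1.0).
    NOTE: hypothetical helper, not part of featureranker.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for method, ranked in rankings.items():
        w = weights.get(method, 1.0)
        n = len(ranked)
        for position, feature in enumerate(ranked):
            # Borda-style points: n for first place down to 1 for last place.
            scores[feature] += w * (n - position)
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up rankings for three of the methods named above (illustration only).
rankings = {
    "l1": ["bmi", "s5", "bp", "age"],
    "random_forest": ["s5", "bmi", "bp", "age"],
    "mutual_info": ["bmi", "bp", "s5", "age"],
}
ensemble = weighted_vote(rankings, weights={"random_forest": 2.0})
print(ensemble)
# → [('bmi', 14.0), ('s5', 13.0), ('bp', 9.0), ('age', 4.0)]
```

A rank-based vote like this sidesteps the fact that the individual methods produce scores on different scales; the package itself may weight or normalize differently.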

## Usage

Install (the leading `!` runs the command from a notebook cell; omit it in a terminal)
```
!pip install featureranker
```
Imports

```
from featureranker.utils import *
from featureranker.plots import *
from featureranker.rankers import *

import pandas as pd
from sklearn.datasets import load_diabetes, load_breast_cancer
import warnings
# Silence library warnings to keep the example output readable
warnings.filterwarnings('ignore')
```
Regression example (diabetes dataset)
```
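# Load the diabetes data as a DataFrame and attach the target column
diabetes = load_diabetes(as_frame=True)
df = diabetes.data.merge(diabetes.target, left_index=True, right_index=True)
view_data(df)
# Split the frame into features X and labels y
X, y = get_data(df, labels='target')
# Rank features with each method, then combine the rankings by voting
rankings = regression_ranking(X, y, predict=False)
scoring = voting(rankings)
# Plot the per-method rankings and the ensemble result
plot_rankings(rankings, title='Regression example all methods')
plot_after_vote(scoring, title='Regression example full ensemble')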
```
![image](https://github.com/lhallee/featureranker/assets/72926928/a95c8ac9-11b5-45df-827f-0be1255c82ea)
![image](https://github.com/lhallee/featureranker/assets/72926928/710ed10e-eed5-4f0e-b9f8-997c7fb0de8b)

Classification example (breast cancer dataset)
```
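# Load the breast cancer data as a DataFrame and attach the target column
cancer = load_breast_cancer(as_frame=True)
df = cancer.data.merge(cancer.target, left_index=True, right_index=True)
view_data(df)
# Split the frame into features X and labels y
X, y = get_data(df, labels='target')
# Rank features with each method, then combine the rankings by voting
rankings = classification_ranking(X, y, predict=False)
scoring = voting(rankings)
# Plot the per-method rankings and the ensemble result
plot_rankings(rankings, title='Classification example all methods')
plot_after_vote(scoring, title='Classification example full ensemble')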
```
![image](https://github.com/lhallee/featureranker/assets/72926928/fbb1308f-118f-4db2-a5a4-9c65d510fbc3)
![image](https://github.com/lhallee/featureranker/assets/72926928/88373375-18a3-4c82-99b2-1aec7b79aaa4)

### [More examples](https://github.com/lhallee/featureranker/tree/main/examples)

## [Documentation](https://github.com/lhallee/featureranker/tree/main/documentation)
See the documentation linked above for more details.

## ISSUES WITH GOOGLE COLAB
Installing featureranker on Google Colab does not always work because of the numpy / Linux build that Colab ships.
**Upgrade numpy (for example with `!pip install --upgrade numpy`) and restart the session to fix featureranker.**

## Citation
Please cite 
_Hallee, L., Khomtchouk, B.B. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci Rep 13, 2088 (2023).
https://doi.org/10.1038/s41598-023-28965-7_

and

_Logan Hallee, Nikolaos Rafailidis, Jason P. Gleghorn. cdsBERT - Extending Protein Language Models with Codon Awareness.
bioRxiv 2023.09.15.558027; doi: https://doi.org/10.1101/2023.09.15.558027_

## News
* 1/22/2024: Version 1.1.0 is released with faster solvers, many more settings, and more plots. 1.1.1 fixes some bugs.
* 1/3/2024: Version 1.0.2 is released with added clustering capabilities and better automatic plots.
* 11/10/2023: Version 1.0.1 is published on PyPI under featureranker.
* 11/9/2023: Version 1.0.0 of the package is published for testing on TestPyPI.
* 11/8/2023: Various utility helpers and plot functions are added for ease of use. The proper l1 penalty constant is now found automatically. The automatic hyperparameter search also returns the best metrics found by each method.
* 11/7/2023: Recursive feature elimination is replaced with ANOVA F-scores due to their ability to rank based on modeled variance.
* 10/15/2023: Separate classification and regression versions are developed for more reliable results. Logistic regression (OvR) with an l1 penalty takes the place of lasso for classification.
* 9/17/2023: The feature ranker is now a proper ensemble, with a custom soft voting scheme. XGBoost, recursive feature elimination, and mutual information are also leveraged. The ensemble is used in the cdsBERT paper to unify the results of the previous papers.
* 2/6/2023: The preliminary work makes its way into Nature Scientific Reports!
* 7/21/2022: A preliminary version of this feature ranker leveraging lasso and random forests is published on bioRxiv for phylogenetic and organelle prediction.
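
The ANOVA F-score mentioned in the 11/7/2023 entry ranks a feature by how much of its variance is explained by the class labels. The sketch below computes the standard one-way ANOVA F statistic in plain Python for the classification case. It only illustrates the statistic, and the function name `anova_f` and the toy data are assumptions; how featureranker computes its scores internally is not shown here (scikit-learn exposes the same statistic as `sklearn.feature_selection.f_classif`).

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature grouped by class label.

    NOTE: hypothetical helper for illustration, not part of featureranker.
    Assumes at least two classes and nonzero within-group variance.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group sum of squares: variance explained by class membership.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group sum of squares: variance left unexplained.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


labels = [0, 0, 0, 1, 1, 1]
separating = [1, 2, 3, 7, 8, 9]     # class means differ strongly
uninformative = [1, 5, 9, 2, 5, 8]  # class means are identical

print(anova_f(separating, labels))     # → 54.0
print(anova_f(uninformative, labels))  # → 0.0
```

A larger F means the class labels account for more of the feature's variance, so features can be ranked by sorting on this value.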



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/lhallee/feature-ranker",
    "name": "featureranker",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7,<4.0",
    "maintainer_email": "",
    "keywords": "feature, ranker, sklearn",
    "author": "Logan Hallee",
    "author_email": "lhallee@udel.edu",
    "download_url": "https://files.pythonhosted.org/packages/b4/fa/0ecc792b3a2b818992ddcfd21bf0817bed43c76bb021f2f30a9a6755e1be/featureranker-1.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "CC-BY-NC-SA-4.0",
    "summary": "Feature ranking ensemble",
    "version": "1.1.2",
    "project_urls": {
        "Homepage": "https://github.com/lhallee/feature-ranker",
        "Repository": "https://github.com/lhallee/feature-ranker"
    },
    "split_keywords": [
        "feature",
        " ranker",
        " sklearn"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "48fcdb1634fc2d346be9f248ccbfae6310a8ebf04a93c642e00f343fc22a5b3c",
                "md5": "f1c51a6de12636f474c530df1ce8b629",
                "sha256": "9101d87d0bfc7f72c71fedbbbd72403d90ce94b52c49fa8dc8059982c72b2ab8"
            },
            "downloads": -1,
            "filename": "featureranker-1.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f1c51a6de12636f474c530df1ce8b629",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7,<4.0",
            "size": 23436,
            "upload_time": "2024-01-23T04:13:53",
            "upload_time_iso_8601": "2024-01-23T04:13:53.463464Z",
            "url": "https://files.pythonhosted.org/packages/48/fc/db1634fc2d346be9f248ccbfae6310a8ebf04a93c642e00f343fc22a5b3c/featureranker-1.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b4fa0ecc792b3a2b818992ddcfd21bf0817bed43c76bb021f2f30a9a6755e1be",
                "md5": "d947863d51b44f25fdb6de0ac8dd60ed",
                "sha256": "9c842beaaf6d9cf3a3dcfbbd5ccc5c1a14db9dba88570989a3c02bdf732ae973"
            },
            "downloads": -1,
            "filename": "featureranker-1.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "d947863d51b44f25fdb6de0ac8dd60ed",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7,<4.0",
            "size": 15310,
            "upload_time": "2024-01-23T04:13:55",
            "upload_time_iso_8601": "2024-01-23T04:13:55.321706Z",
            "url": "https://files.pythonhosted.org/packages/b4/fa/0ecc792b3a2b818992ddcfd21bf0817bed43c76bb021f2f30a9a6755e1be/featureranker-1.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-23 04:13:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lhallee",
    "github_project": "feature-ranker",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "featureranker"
}
        