dRFEtools


-   Name: dRFEtools
-   Version: 0.3.5
-   Home page: https://github.com/LieberInstitute/dRFEtools.git
-   Summary: A package for performing dynamic recursive feature elimination with sklearn.
-   Upload time: 2024-07-13 03:29:32
-   Maintainer: Kynon JM Benjamin
-   Author: Kynon JM Benjamin
-   Requires Python: <4.0,>=3.10
-   License: GPL-3.0-only
-   Keywords: recursive feature elimination, sklearn, feature ranking
# dRFEtools - dynamic Recursive Feature Elimination

`dRFEtools` is a package for dynamic recursive feature elimination with
sklearn.

Authors: Apuã Paquola, Kynon Jade Benjamin, and Tarun Katipalli

Package developed in Python 3.8+; the 0.3.5 release metadata requires Python >=3.10,<4.0.

In addition to scikit-learn, `dRFEtools` is also built with NumPy, SciPy,
Pandas, matplotlib, plotnine, and statsmodels. Currently, dynamic RFE supports
models with `coef_` or `feature_importances_` attribute.

This package has several functions to run dynamic recursive feature elimination
(dRFE) for random forest and linear model classifiers and regressors. For
random forest, it assumes out-of-bag (OOB) scoring is enabled (`oob_score=True`).
For linear models, it generates a developmental set. For both classification and
regression, three measurements are calculated for feature selection:

Classification:

1.  Normalized mutual information
2.  Accuracy
3.  Area under the ROC curve (AUC)

Regression:

1.  R2 (this can be negative, because a model can be arbitrarily worse than predicting the mean)
2.  Explained variance
3.  Mean squared error
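
The three regression measurements can be illustrated with a small standard-library sketch. It uses the usual textbook definitions (the same ones scikit-learn documents for `r2_score`, `explained_variance_score`, and `mean_squared_error`); the function name `regression_metrics` and the toy data are assumptions made for illustration and are not part of `dRFEtools`.

```python
def regression_metrics(y_true, y_pred):
    """Return (r2, explained_variance, mse) for two equal-length sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_resid = sum(residuals) / n

    ss_res = sum(r ** 2 for r in residuals)             # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    var_resid = sum((r - mean_resid) ** 2 for r in residuals) / n

    r2 = 1.0 - ss_res / ss_tot
    explained_variance = 1.0 - var_resid / (ss_tot / n)
    mse = ss_res / n
    return r2, explained_variance, mse


y_true = [1.0, 2.0, 3.0, 4.0]
good = regression_metrics(y_true, [1.1, 1.9, 3.2, 3.8])
bad = regression_metrics(y_true, [4.0, 3.0, 2.0, 1.0])  # reversed predictions

print("good:", good)
print("bad: ", bad)  # R2 is negative: worse than always predicting the mean
```

With the reversed predictions, the residual sum of squares exceeds the total sum of squares, which is what drives R2 below zero.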

The package has been split into four additional scripts for:

1.  Out-of-bag dynamic RFE metrics (AP/KJB)
2.  Validation set dynamic RFE metrics (KJB)
3.  Rank features function (TK)
4.  Lowess core + peripheral selection (KJB)

# Table of Contents

1.  [Citation](#org7b64d47)
2.  [Installation](#org04443e4)
3.  [Tutorials](#org07777f88)
4.  [Reference Manual](#org5afd041)
    1.  [dRFEtools main functions](#org6171433)
    2.  [Peripheral features functions](#org3cfdf65)
    3.  [Plotting functions](#org8ecca01)
    4.  [Metric functions](#org377b1aa)
    5.  [Random forest helper functions](#orga29d49b)
    6.  [Linear model helper functions](#orgbda21bf)

<a id="org7b64d47"></a>

## Citation

If you use dRFEtools, please cite the following:

Kynon J M Benjamin, Tarun Katipalli, Apuã C M Paquola, 
dRFEtools: dynamic recursive feature elimination for omics, 
Bioinformatics, Volume 39, Issue 8, August 2023, btad513, 
https://doi.org/10.1093/bioinformatics/btad513

PMID: 37632789

DOI: [10.1093/bioinformatics/btad513](https://doi.org/10.1093/bioinformatics/btad513).


<a id="org04443e4"></a>

## Installation

`pip install --user dRFEtools`

<a id="org07777f88"></a>

## Tutorials

We have two tutorials for [optimization](./examples/optimization.md)
(version 0.2) and [classification](./examples/classification.md) (version 0.3+).

In addition to this, we have example code used in the manuscript for
scikit-learn simulation, biological simulation, and BrainSEQ Phase 1
at the link below.

[https://github.com/LieberInstitute/dRFEtools_manuscript](https://github.com/LieberInstitute/dRFEtools_manuscript/tree/main)

<a id="org5afd041"></a>

## Reference Manual

<a id="org6171433"></a>

### dRFEtools main functions

1.  dRFE - Random Forest

    `rf_rfe`

    Runs the random forest feature elimination step over the iterator process.

    **Args:**

    -   estimator: Random forest classifier object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   elimination_rate: fraction of features to remove at each step. default .2
    -   RANK: Output feature ranking. default=True (Boolean)

    **Yields:**

    -   dict: a dictionary with number of features, normalized mutual information score, accuracy score, and array of the indexes for features to keep

2.  dRFE - Linear Models

    `dev_rfe`

    Runs the recursive feature elimination step for linear models over the
    iterator process, assuming a developmental set is needed.

    **Args:**

    -   estimator: regressor or classifier linear model object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   elimination_rate: fraction of features to remove at each step. default .2
    -   dev_size: developmental set size. default '0.20'
    -   RANK: run feature ranking, default 'True'
    -   SEED: random state. default 'True'

    **Yields:**

    -   dict: a dictionary with number of features, r2 score, mean square error,
        explained variance, and array of the indices for features to keep

3.  Feature Rank Function

    `feature_rank_fnc`

    This function ranks features within the feature elimination loop.

    **Args:**

    -   features: A vector of feature names
    -   rank: A vector with feature ranks based on absolute value of feature importance
    -   n_features_to_keep: Number of features to keep. (Int)
    -   fold: Fold being analyzed. (Int)
    -   out_dir: Output directory for text file. Default '.'
    -   RANK: Boolean (True or False)

    **Yields:**

    -   Text file: Ranked features by fold tab-delimited text file, only if RANK=True

4.  N Feature Iterator

    `n_features_iter`

    Determines the number of features to keep at each iteration.

    **Args:**

    -   nf: current number of features
    -   keep_rate: percentage of features to keep

    **Yields:**

    -   int: number of features to keep


5.  Extract feature importances

    `_get_feature_importances`

    Generates feature importance from absolute value of feature weights.

    **Args:**

    -   estimator: the estimator to generate feature importance for

    **Yields:**

    -   numpy array: returns feature importances as a NumPy array
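
How the main functions fit together can be sketched in a few lines of plain Python. This is a minimal illustration of the elimination loop only, under stated assumptions: `n_features_schedule`, `dynamic_rfe`, and `importance_fn` are hypothetical names, the toy weights stand in for refitting an estimator at each step, and no OOB or developmental-set scores are computed. It is not the package's implementation.

```python
def n_features_schedule(nf, keep_rate):
    """Yield shrinking feature counts until a single feature remains."""
    while nf > 1:
        nf = max(1, int(nf * keep_rate))
        yield nf


def dynamic_rfe(features, importance_fn, elimination_rate=0.2):
    """Repeatedly drop the lowest-ranked fraction of the remaining features.

    importance_fn(names) returns one weight per name; it is a stand-in for
    refitting the estimator on the surviving features (an assumption).
    """
    keep = list(features)
    history = []
    for n_keep in n_features_schedule(len(keep), 1 - elimination_rate):
        weights = importance_fn(keep)
        # Rank by absolute weight, largest first, then keep the top n_keep.
        ranked = sorted(zip(keep, weights), key=lambda fw: abs(fw[1]), reverse=True)
        keep = [name for name, _ in ranked[:n_keep]]
        history.append((n_keep, list(keep)))
    return history


toy_weights = {"g1": 0.9, "g2": -0.7, "g3": 0.1, "g4": -0.05,
               "g5": 0.4, "g6": 0.3, "g7": -0.2, "g8": 0.01}
history = dynamic_rfe(list(toy_weights),
                      lambda names: [toy_weights[n] for n in names],
                      elimination_rate=0.5)
for n_keep, kept in history:
    print(n_keep, kept)
```

Because the number of features removed is a fraction of what remains, early steps discard many features and later steps discard few, which is what makes the elimination "dynamic" rather than fixed-step.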


<a id="org3cfdf65"></a>

### Peripheral features functions

1.  Run lowess

    `run_lowess`

    This function runs the lowess function and caches it to memory.

    **Args:**

    -   x: the x-values of the observed points
    -   y: the y-values of the observed points
    -   frac: the fraction of the data used when estimating each y-value. default 3/10

    **Yields:**

    -   z: 2D array of results

2.  Convert array to tuple

    `array_to_tuple`

    This function attempts to convert a numpy array to a tuple.

    **Args:**

    -   np_array: numpy array

    **Yields:**

    -   tuple

3.  Extract dRFE as a dataframe

    `get_elim_df_ordered`

    This function converts the dRFE dictionary to a pandas dataframe.

    **Args:**

    -   d: dRFE dictionary
    -   multi: is this for multiple classes. (True or False)

    **Yields:**

    -   df_elim: dRFE as a dataframe with log10 transformed features

4.  Calculate lowess curve

    `cal_lowess`

    This function calculates the lowess curve.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value
    -   multi: is this for multiple classes. (True or False)

    **Yields:**

    -   x: dRFE log10 transformed features
    -   y: dRFE metrics
    -   z: 2D numpy array with lowess curve
    -   xnew: increased intervals
    -   ynew: interpolated metrics for xnew

5.  Calculate lowess curve for log10

    `cal_lowess`

    This function calculates the rate of change on the lowess fitted curve with
    log10 transformed input.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value
    -   multi: is this for multiple classes. default False

    **Yields:**

    -   data frame: dataframe with n_features, lowess value, and rate of change (DxDy)

6.  Extract max lowess

    `extract_max_lowess`

    This function extracts the max features based on rate of change of log10
    transformed lowess fit curve.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   multi: is this for multiple classes. default False

    **Yields:**

    -   int: number of max features (smallest subset)

7.  Extract peripheral lowess

    `extract_peripheral_lowess`

    This function extracts the peripheral features based on rate of change of log10
    transformed lowess fit curve.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   step_size: rate of change step size to analyze for extraction. default 0.05
    -   multi: is this for multiple classes. default False

    **Yields:**

    -   int: number of peripheral features

8.  Optimize lowess plot

    `optimize_lowess_plot`

    Peripheral set selection optimization plot. This will be ROC AUC for multiple
    classification (3+), NMI for binary classification, or R2 for regression. The
    plot returned has fraction and step size as well as lowess smoothed curve and
    indication of predicted peripheral set.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   step_size: rate of change step size to analyze for extraction. default 0.05
    -   classify: is this a classification algorithm. default True
    -   multi: does this have multiple (3+) classes. default True

    **Yields:**

    -   graph: plot of dRFE with estimated peripheral set indicated as well as fraction and set size used. It automatically saves files as pdf, png, and svg

9.  Plot lowess vline

    `plot_with_lowess_vline`

    Plot feature elimination results with the peripheral set indicated. This will be
    ROC AUC for multiple classification (3+), NMI for binary classification, or R2
    for regression.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   step_size: rate of change step size to analyze for extraction. default 0.05
    -   classify: is this a classification algorithm. default True
    -   multi: does this have multiple (3+) classes. default True

    **Yields:**

    -   graph: plot of dRFE with estimated peripheral set indicated, automatically saves files as pdf, png, and svg
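
The pairing of "convert array to tuple" with a cached lowess call makes sense because memoization needs hashable arguments, and arrays or lists are not hashable. The sketch below shows that pattern with `functools.lru_cache`. It is an illustration under assumptions: `to_nested_tuple` and `smooth_cached` are hypothetical names, and the smoother is a simple moving average used as a stand-in, not lowess.

```python
from functools import lru_cache


def to_nested_tuple(values):
    """Recursively convert nested sequences into tuples; scalars pass through."""
    try:
        return tuple(to_nested_tuple(v) for v in values)
    except TypeError:  # not iterable, so treat it as a scalar
        return values


@lru_cache(maxsize=None)
def smooth_cached(x, y, window=3):
    """Stand-in smoother (centered moving average), NOT lowess."""
    half = window // 2
    fitted = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        fitted.append(sum(y[lo:hi]) / (hi - lo))
    # Return (x, fitted) pairs, loosely mirroring a 2D result array.
    return tuple(zip(x, fitted))


x = to_nested_tuple([0.0, 1.0, 2.0, 3.0])
y = to_nested_tuple([1.0, 3.0, 2.0, 4.0])
first = smooth_cached(x, y)
second = smooth_cached(x, y)  # identical arguments, so this is a cache hit
print(first)
print(smooth_cached.cache_info())
```

Caching pays off when the same curve is requested repeatedly, for example when several fraction and step-size settings are compared on one set of elimination results.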


<a id="org8ecca01"></a>

### Plotting functions

1.  Save plots

    `save_plots`

    This function saves the plot as svg, png, and pdf with a specific label and dimensions.

    **Args:**

    -   p: plotnine object
    -   fn: file name without extensions
    -   w: width, default 7
    -   h: height, default 7

    **Yields:** SVG, PNG, and PDF of plotnine object

2.  Plot dRFE Accuracy

    `plot_acc`

    Plot feature elimination results for accuracy.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by accuracy, automatically saves files as pdf, png, and svg

3.  Plot dRFE NMI

    `plot_nmi`

    Plot feature elimination results for normalized mutual information.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by NMI, automatically saves files as pdf, png, and svg

4.  Plot dRFE ROC AUC

    `plot_roc`

    Plot feature elimination results for AUC ROC curve.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by AUC, automatically saves files as pdf, png, and svg

5.  Plot dRFE R2

    `plot_r2`

    Plot feature elimination results for R2 score. Note that this can be negative
    if the model is arbitrarily worse.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by R2, automatically saves files as pdf, png, and svg

6.  Plot dRFE MSE

    `plot_mse`

    Plot feature elimination results for mean squared error score.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by mean squared error, automatically saves files as pdf, png, and svg

7.  Plot dRFE Explained Variance

    `plot_evar`

    Plot feature elimination results for explained variance score.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by explained variance, automatically saves files as pdf, png, and svg
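
Saving each figure in three formats amounts to one `save` call per extension. The sketch below assumes only that the plot object exposes a `save(filename, width=..., height=...)` method, as plotnine's `ggplot` does; `save_in_three_formats` and `RecordingPlot` are hypothetical names, and the recorder replaces a real plot so the example runs without plotnine installed.

```python
def save_in_three_formats(plot, fn, w=7, h=7):
    """Call plot.save(...) once per extension and return the file names used."""
    written = []
    for ext in (".svg", ".png", ".pdf"):
        filename = fn + ext
        plot.save(filename, width=w, height=h)
        written.append(filename)
    return written


class RecordingPlot:
    """Hypothetical stand-in that records save() calls instead of drawing."""

    def __init__(self):
        self.calls = []

    def save(self, filename, width=None, height=None):
        self.calls.append((filename, width, height))


demo = RecordingPlot()
files = save_in_three_formats(demo, "nmi_fold_1")
print(files)
```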


<a id="org377b1aa"></a>

### Metric functions

1.  OOB Prediction

    `oob_predictions`

    Extracts out-of-bag (OOB) predictions from a random forest classifier.

    **Args:**

    -   estimator: Random forest classifier object

    **Yields:**

    -   vector: OOB predicted labels

2.  OOB Accuracy Score

    `oob_score_accuracy`

    Calculates the accuracy score from the OOB predictions.

    **Args:**

    -   estimator: Random forest classifier object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: accuracy score

3.  OOB Normalized Mutual Information Score

    `oob_score_nmi`

    Calculates the normalized mutual information score from the OOB predictions.

    **Args:**

    -   estimator: Random forest classifier object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: normalized mutual information score

4.  OOB Area Under ROC Curve Score

    `oob_score_roc`

    Calculates the area under the ROC curve score for the OOB predictions.

    **Args:**

    -   estimator: Random forest classifier object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: AUC ROC score

5.  OOB R2 Score

    `oob_score_r2`

    Calculates the r2 score from the OOB predictions.

    **Args:**

    -   estimator: Random forest regressor object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: r2 score

6.  OOB Mean Squared Error Score

    `oob_score_mse`

    Calculates the mean squared error score from the OOB predictions.

    **Args:**

    -   estimator: Random forest regressor object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: mean squared error score

7.  OOB Explained Variance Score

    `oob_score_evar`

    Calculates the explained variance score for the OOB predictions.

    **Args:**

    -   estimator: Random forest regressor object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: explained variance score

8.  Developmental Test Set Predictions

    `dev_predictions`

    Extracts predictions for a development fold from a linear
    regressor.

    **Args:**

    -   estimator: Linear model regressor or classifier object
    -   X: a data frame of normalized values from developmental dataset

    **Yields:**

    -   vector: Development set predicted labels

9.  Developmental Test Set R2 Score

    `dev_score_r2`

    Calculates the r2 score from the developmental dataset
    predictions.

    **Args:**

    -   estimator: Linear model regressor object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from developmental dataset

    **Yields:**

    -   float: r2 score

10. Developmental Test Set Mean Squared Error Score

    `dev_score_mse`

    Calculates the mean squared error score from the developmental dataset
    predictions.

    **Args:**

    -   estimator: Linear model regressor object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from developmental dataset

    **Yields:**

    -   float: mean squared error score

11. Developmental Test Set Explained Variance Score

    `dev_score_evar`

    Calculates the explained variance score for the developmental dataset predictions.

    **Args:**

    -   estimator: Linear model regressor object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from developmental data set

    **Yields:**

    -   float: explained variance score

12. DEV Accuracy Score

    `dev_score_accuracy`

    Calculates the accuracy score from the DEV predictions.

    **Args:**

    -   estimator: Linear model classifier object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: accuracy score

13. DEV Normalized Mutual Information Score

    `dev_score_nmi`

    Calculates the normalized mutual information score from the DEV predictions.

    **Args:**

    -   estimator: Linear model classifier object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: normalized mutual information score

14. DEV Area Under ROC Curve Score

    `dev_score_roc`

    Calculates the area under the ROC curve score for the DEV predictions.

    **Args:**

    -   estimator: Linear model classifier object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: AUC ROC score
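
To make the difference between the accuracy and normalized mutual information (NMI) scores concrete, here is a standard-library sketch of both. It assumes NMI is normalized by the arithmetic mean of the two label entropies, which is scikit-learn's default for `normalized_mutual_info_score`; the function names and toy labels are illustrative assumptions, not `dRFEtools` code.

```python
from collections import Counter
from math import log


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by the mean of the two entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    count_t, count_p = Counter(y_true), Counter(y_pred)
    mi = sum((c / n) * log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


y_true = [0, 0, 1, 1]
swapped = [1, 1, 0, 0]    # every label flipped
unrelated = [0, 1, 0, 1]  # carries no information about y_true

print(accuracy(y_true, swapped), normalized_mutual_info(y_true, swapped))
print(accuracy(y_true, unrelated), normalized_mutual_info(y_true, unrelated))
```

The flipped labels score zero accuracy but an NMI of one, because NMI measures how much the predictions tell you about the true labels regardless of how the classes are named.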


<a id="orgbda21bf"></a>

### Linear model helper functions

1.  dRFE Subfunction

    `regr_fe`

    Iterates over the features to be eliminated at each step.

    **Args:**

    -   estimator: regressor or classifier linear model object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   n_features_iter: iterator for number of features to keep loop
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   dev_size: developmental test set proportion of training
    -   SEED: random state
    -   RANK: Boolean (True or False)

    **Yields:**

    -   list: a list with number of features, r2 score, mean square error, explained variance, and array of the indices for features to keep

2.  dRFE Step function

    `regr_fe_step`

    Split training data into developmental dataset and apply estimator
    to developmental dataset, rank features, and conduct feature
    elimination, single steps.

    **Args:**

    -   estimator: regressor or classifier linear model object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   n_features_to_keep: number of features to keep
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   dev_size: developmental test set proportion of training
    -   SEED: random state
    -   RANK: Boolean (True or False)

    **Yields:**

    -   dict: a dictionary with number of features, r2 score, mean square error, explained variance, and selected features
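
The idea of carving a developmental set out of the training data with a fixed random state can be sketched with the standard library alone. This is an assumption-laden illustration: `split_dev_set` is a hypothetical helper, the rounding rule for the developmental set size is a choice made here, and the package itself may split the data differently (for example with scikit-learn utilities).

```python
import random


def split_dev_set(n_samples, dev_size=0.2, seed=13):
    """Return (train_idx, dev_idx) after a seeded shuffle of row indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_dev = max(1, round(n_samples * dev_size))
    return sorted(indices[n_dev:]), sorted(indices[:n_dev])


train_idx, dev_idx = split_dev_set(10, dev_size=0.2, seed=13)
print("train:", train_idx)
print("dev:  ", dev_idx)
```

The estimator would then be fitted on the training rows and scored on the developmental rows at each elimination step, which plays the role that out-of-bag samples play for random forests.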

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/LieberInstitute/dRFEtools.git",
    "name": "dRFEtools",
    "maintainer": "Kynon JM Benjamin",
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": "kj.benjamin90@gmail.com",
    "keywords": "recursive feature elimination, sklearn, feature ranking",
    "author": "Kynon JM Benjamin",
    "author_email": "kj.benjamin90@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ec/23/a3cb7270abc1c4e827eddc1276682f2151380c0eb3852748eb0636c8618a/drfetools-0.3.5.tar.gz",
    "platform": null,
    "description": "# dRFEtools - dynamic Recursive Feature Elimination\n\n`dRFEtools` is a package for dynamic recursive feature elimination with\nsklearn.\n\nAuthors: Apu\u00e3 Paquola, Kynon Jade Benjamin, and Tarun Katipalli\n\nPackage developed in Python 3.8+.\n\nIn addition to scikit-learn, `dRFEtools` is also built with NumPy, SciPy,\nPandas, matplotlib, plotnine, and statsmodels. Currently, dynamic RFE supports\nmodels with `coef_` or `feature_importances_` attribute.\n\nThis package has several function to run dynamic recursive feature elimination\n(dRFE) for random forest and linear model classifier and regression models. For\nrandom forest, it assumes Out-of-Bag (OOB) is set to True. For linear models,\nit generates a developmental set. For both classification and regression, three\nmeasurements are calculated for feature selection:\n\nClassification:\n\n1.  Normalized mutual information\n2.  Accuracy\n3.  Area under the curve (AUC) ROC curve\n\nRegression:\n\n1.  R2 (this can be negative if model is arbitrarily worse)\n2.  Explained variance\n3.  Mean squared error\n\nThe package has been split in to four additional scripts for:\n\n1.  Out-of-bag dynamic RFE metrics (AP/KJB)\n2.  Validation set dynamic RFE metrics (KJB)\n3.  Rank features function (TK)\n4.  Lowess core + peripheral selection (KJB)\n\n# Table of Contents\n\n1.  [Citation](#org7b64d47)\n2.  [Installation](#org04443e4)\n3.  [Tutorials](#org07777f88)\n4.  [Reference Manual](#org5afd041)\n    1.  [dRFEtools main functions](#org6171433)\n    2.  [Peripheral features functions](#org3cfdf65)\n    3.  [Plotting functions](#org8ecca01)\n    4.  [Metric functions](#org377b1aa)\n    5.  [Random forest helper functions](#orga29d49b)\n    6.  
[Linear model helper functions](#orgbda21bf)\n\n<a id=\"org7b64d47\"></a>\n\n## Citation\n\nIf using please cite the following:\n\nKynon J M Benjamin, Tarun Katipalli, Apu\u00e3 C M Paquola, \ndRFEtools: dynamic recursive feature elimination for omics, \nBioinformatics, Volume 39, Issue 8, August 2023, btad513, \nhttps://doi.org/10.1093/bioinformatics/btad513\n\nPMID: 37632789\n\nDOI: [10.1093/bioinformatics/btad513](10.1093/bioinformatics/btad513).\n\n\n<a id=\"org04443e4\"></a>\n\n## Installation\n\n`pip install --user dRFEtools`\n\n<a id=\"org07777f88\"></a>\n## Tutorials\n\nWe have two tutorials for [optimization](./examples/optimization.md)\n(version 0.2) and [classification](./examples/classification.md) (version 0.3+).\n\nIn addition to this, we have example code used in the manuscript for\nscikit-learn simulation, biological simulation, and BrainSEQ Phase 1\nat the link below.\n\n[https://github.com/LieberInstitute/dRFEtools_manuscript](https://github.com/LieberInstitute/dRFEtools_manuscript/tree/main)\n\n<a id=\"org5afd041\"></a>\n\n## Reference Manual\n\n<a id=\"org6171433\"></a>\n\n### dRFEtools main functions\n\n1.  dRFE - Random Forest\n\n    `rf_rfe`\n\n    Runs random forest feature elimination step over iterator process.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   elimination_rate: percent rate to reduce feature list. default .2\n    -   RANK: Output feature ranking. default=True (Boolean)\n\n    **Yields:**\n\n    -   dict: a dictionary with number of features, normalized mutual information score, accuracy score, and array of the indexes for features to keep\n\n2.  
dRFE - Linear Models\n\n    `dev_rfe`\n\n    Runs recursive feature elimination for linear model step over iterator\n    process assuming developmental set is needed.\n\n    **Args:**\n\n    -   estimator: regressor or classifier linear model object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   elimination_rate: percent rate to reduce feature list. default .2\n    -   dev_size: developmental set size. default '0.20'\n    -   RANK: run feature ranking, default 'True'\n    -   SEED: random state. default 'True'\n\n    **Yields:**\n\n    -   dict: a dictionary with number of features, r2 score, mean square error,\n        expalined variance, and array of the indices for features to keep\n\n3.  Feature Rank Function\n\n    `feature_rank_fnc`\n\n    This function ranks features within the feature elimination loop.\n\n    **Args:**\n\n    -   features: A vector of feature names\n    -   rank: A vector with feature ranks based on absolute value of feature importance\n    -   n_features_to_keep: Number of features to keep. (Int)\n    -   fold: Fold to analyzed. (Int)\n    -   out_dir: Output directory for text file. Default '.'\n    -   RANK: Boolean (True or False)\n\n    **Yields:**\n\n    -   Text file: Ranked features by fold tab-delimited text file, only if RANK=True\n\n4.  N Feature Iterator\n\n    `n_features_iter`\n\n    Determines the features to keep.\n\n    **Args:**\n\n    -   nf: current number of features\n    -   keep_rate: percentage of features to keep\n\n    **Yields:**\n\n    -   int: number of features to keep\n\n\n5.  
Extract feature importances\n\n    `_get_feature_importances`\n\n    Generates feature importance from absolute value of feature weights.\n\n\t**Args:**\n\n\t-  estimator: the estimator to generate feature importance for\n\n\t**Yields:**\n\n\t-  numpy array: returns feature importances as a NumPy array\n\n\n<a id=\"org3cfdf65\"></a>\n\n### Peripheral features functions\n\n1.  Run lowess\n\n    `run_lowess`\n\n    This function runs the lowess function and caches it to memory.\n\n    **Args:**\n\n    -   x: the x-values of the observed points\n    -   y: the y-values of the observed points\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n\n    **Yields:**\n\n    -   z: 2D array of results\n\n2.  Convert array to tuple\n\n    `array_to_tuple`\n\n    This function attempts to convert a numpy array to a tuple.\n\n    **Args:**\n\n    -   np_array: numpy array\n\n    **Yields:**\n\n    -   tuple\n\n3.  Extract dRFE as a dataframe\n\n    `get_elim_df_ordered`\n\n    This function converts the dRFE dictionary to a pandas dataframe.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   multi: is this for multiple classes. (True or False)\n\n    **Yields:**\n\n    -   df_elim: dRFE as a dataframe with log10 transformed features\n\n4.  Calculate lowess curve\n\n    `cal_lowess`\n\n    This function calculates the lowess curve.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value\n    -   multi: is this for multiple classes. (True or False)\n\n    **Yields:**\n\n    -   x: dRFE log10 transformed features\n    -   y: dRFE metrics\n    -   z: 2D numpy array with lowess curve\n    -   xnew: increased intervals\n    -   ynew: interpolated metrics for xnew\n\n5.  
Calculate lowess curve for log10\n\n    `cal_lowess`\n\n    This function calculates the rate of change on the lowess fitted curve with\n    log10 transformated input.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value\n    -   multi: is this for multiple classes. default False\n\n    **Yields:**\n\n    -   data frame: dataframe with n_features, lowess value, and rate of change (DxDy)\n\n6.  Extract max lowess\n\n    `extract_max_lowess`\n\n    This function extracts the max features based on rate of change of log10\n    transformed lowess fit curve.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   multi: is this for multiple classes. default False\n\n    **Yields:**\n\n    -   int: number of max features (smallest subset)\n\n7.  Extract peripheral lowess\n\n    `extract_peripheral_lowess`\n\n    This function extracts the peripheral features based on rate of change of log10\n    transformed lowess fit curve.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   step_size: rate of change step size to analyze for extraction. default 0.05\n    -   multi: is this for multiple classes. default False\n\n    **Yields:**\n\n    -   int: number of peripheral features\n\n8.  Optimize lowess plot\n\n    `plot_with_lowess_vline`\n\n    Peripheral set selection optimization plot. This will be ROC AUC for multiple\n    classification (3+), NMI for binary classification, or R2 for regression. The\n    plot returned has fraction and step size as well as lowess smoothed curve and\n    indication of predicted peripheral set.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. 
default '.'\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   step_size: rate of change step size to analyze for extraction. default 0.05\n    -   classify: is this a classification algorithm. default True\n    -   multi: does this have multiple (3+) classes. default True\n\n    **Yields:**\n\n    -   graph: plot of dRFE with estimated peripheral set indicated as well as fraction and set size used. It automatically saves files as pdf, png, and svg\n\n9.  Plot lowess vline\n\n    `plot_with_lowess_vline`\n\n    Plot feature elimination results with the peripheral set indicated. This will be\n    ROC AUC for multiple classification (3+), NMI for binary classification, or R2\n    for regression.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   step_size: rate of change step size to analyze for extraction. default 0.05\n    -   classify: is this a classification algorithm. default True\n    -   multi: does this have multiple (3+) classes. default True\n\n    **Yields:**\n\n    -   graph: plot of dRFE with estimated peripheral set indicated, automatically saves files as pdf, png, and svg\n\n\n<a id=\"org8ecca01\"></a>\n\n### Plotting functions\n\n1.  Save plots\n\n    `save_plots`\n\n    This function save plot as svg, png, and pdf with specific label and dimension.\n\n    **Args:**\n\n    -   p: plotnine object\n    -   fn: file name without extensions\n    -   w: width, default 7\n    -   h: height, default 7\n\n    **Yields:** SVG, PNG, and PDF of plotnine object\n\n2.  Plot dRFE Accuracy\n\n    `plot_acc`\n\n    Plot feature elimination results for accuracy.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. 
default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by accuracy, automatically saves files as pdf, png, and svg\n\n3.  Plot dRFE NMI\n\n    `plot_nmi`\n\n    Plot feature elimination results for normalized mutual information.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by NMI, automatically saves files as pdf, png, and svg\n\n4.  Plot dRFE ROC AUC\n\n    `plot_roc`\n\n    Plot feature elimination results for AUC ROC curve.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by AUC, automatically saves files as pdf, png, and svg\n\n5.  Plot dRFE R2\n\n    `plot_r2`\n\n    Plot feature elimination results for R2 score. Note that this can be negative\n    if model is arbitarily worse.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by R2, automatically saves files as pdf, png, and svg\n\n6.  Plot dRFE MSE\n\n    `plot_mse`\n\n    Plot feature elimination results for mean squared error score.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by mean squared error, automatically saves files as pdf, png, and svg\n\n7.  Plot dRFE Explained Variance\n\n    `plot_evar`\n\n    Plot feature elimination results for explained variance score.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. 
default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by explained variance, automatically saves files as pdf, png, and svg\n\n\n<a id=\"org377b1aa\"></a>\n\n### Metric functions\n\n1.  OOB Prediction\n\n    `oob_predictions`\n\n    Extracts out-of-bag (OOB) predictions from random forest classifier classes.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n\n    **Yields:**\n\n    -   vector: OOB predicted labels\n\n2.  OOB Accuracy Score\n\n    `oob_score_accuracy`\n\n    Calculates the accuracy score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: accuracy score\n\n3.  OOB Normalized Mutual Information Score\n\n    `oob_score_nmi`\n\n    Calculates the normalized mutual information score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: normalized mutual information score\n\n4.  OOB Area Under ROC Curve Score\n\n    `oob_score_roc`\n\n    Calculates the area under the ROC curve score for the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: AUC ROC score\n\n5.  OOB R2 Score\n\n    `oob_score_r2`\n\n    Calculates the r2 score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest regressor object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: r2 score\n\n6.  
OOB Mean Squared Error Score\n\n    `oob_score_mse`\n\n    Calculates the mean squared error score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest regressor object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: mean squared error score\n\n7.  OOB Explained Variance Score\n\n    `oob_score_evar`\n\n    Calculates the explained variance score for the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest regressor object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: explained variance score\n\n8.  Developmental Test Set Predictions\n\n    `dev_predictions`\n\n    Extracts predictions using a development fold for linear\n    regressor.\n\n    **Args:**\n\n    -   estimator: Linear model regression classifier object\n    -   X: a data frame of normalized values from developmental dataset\n\n    **Yields:**\n\n    -   vector: Development set predicted labels\n\n9.  Developmental Test Set R2 Score\n\n    `dev_score_r2`\n\n    Calculates the r2 score from the developmental dataset\n    predictions.\n\n    **Args:**\n\n    -   estimator: Linear model regressor object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from developmental dataset\n\n    **Yields:**\n\n    -   float: r2 score\n\n10. Developmental Test Set Mean Squared Error Score\n\n    `dev_score_mse`\n\n    Calculates the mean squared error score from the developmental dataset\n    predictions.\n\n    **Args:**\n\n    -   estimator: Linear model regressor object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from developmental dataset\n\n    **Yields:**\n\n    -   float: mean squared error score\n\n11. 
Developmental Test Set Explained Variance Score\n\n    `dev_score_evar`\n\n    Calculates the explained variance score for the developmental dataset predictions.\n\n    **Args:**\n\n    -   estimator: Linear model regressor object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from developmental data set\n\n    **Yields:**\n\n    -   float: explained variance score\n\n12. DEV Accuracy Score\n\n    `dev_score_accuracy`\n\n    Calculates the accuracy score from the DEV predictions.\n\n    **Args:**\n\n    -   estimator: Linear model classifier object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: accuracy score\n\n13. DEV Normalized Mutual Information Score\n\n    `dev_score_nmi`\n\n    Calculates the normalized mutual information score from the DEV predictions.\n\n    **Args:**\n\n    -   estimator: Linear model classifier object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: normalized mutual information score\n\n14. DEV Area Under ROC Curve Score\n\n    `dev_score_roc`\n\n    Calculates the area under the ROC curve score for the DEV predictions.\n\n    **Args:**\n\n    -   estimator: Linear model classifier object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: AUC ROC score\n\n\n<a id=\"orgbda21bf\"></a>\n\n### Linear model helper functions\n\n1.  
dRFE Subfunction\n\n    `regr_fe`\n\n    Iterate over the features to be eliminated at each step.\n\n    **Args:**\n\n    -   estimator: regressor or classifier linear model object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   n_features_iter: iterator over the number of features to keep at each step\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   dev_size: developmental test set proportion of the training data\n    -   SEED: random state\n    -   RANK: Boolean (True or False)\n\n    **Yields:**\n\n    -   list: a list with number of features, r2 score, mean squared error, explained variance, and array of the indices for features to keep\n\n2.  dRFE Step function\n\n    `regr_fe_step`\n\n    Split training data into a developmental dataset, apply the estimator\n    to the developmental dataset, rank features, and conduct a single step\n    of feature elimination.\n\n    **Args:**\n\n    -   estimator: regressor or classifier linear model object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   n_features_to_keep: number of features to keep\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   dev_size: developmental test set proportion of the training data\n    -   SEED: random state\n    -   RANK: Boolean (True or False)\n\n    **Yields:**\n\n    -   dict: a dictionary with number of features, r2 score, mean squared error, explained variance, and selected features\n",
    "bugtrack_url": null,
    "license": "GPL-3.0-only",
    "summary": "A package for performing dynamic recursive feature elimination with sklearn.",
    "version": "0.3.5",
    "project_urls": {
        "Homepage": "https://github.com/LieberInstitute/dRFEtools.git",
        "Repository": "https://github.com/LieberInstitute/dRFEtools.git"
    },
    "split_keywords": [
        "recursive feature elimination",
        " sklearn",
        " feature ranking"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c9eceb8276286e291d9b13bb680352ff64ae2aaa881d0e9617c9f74c71d0abfe",
                "md5": "60d52daef6348309176bab18b00c5bff",
                "sha256": "3a12c938fa9d5d671a1769a9271c1ea5dea9fdbee79044e0f28ad1ed5d16b658"
            },
            "downloads": -1,
            "filename": "drfetools-0.3.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "60d52daef6348309176bab18b00c5bff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 28586,
            "upload_time": "2024-07-13T03:29:30",
            "upload_time_iso_8601": "2024-07-13T03:29:30.916947Z",
            "url": "https://files.pythonhosted.org/packages/c9/ec/eb8276286e291d9b13bb680352ff64ae2aaa881d0e9617c9f74c71d0abfe/drfetools-0.3.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ec23a3cb7270abc1c4e827eddc1276682f2151380c0eb3852748eb0636c8618a",
                "md5": "d51b9ff3faa4658861e3ea5105148296",
                "sha256": "52ceccd271e27a83b9d2cabdf63c196851cb455a2ec7cc583e115a3e9d2e08a2"
            },
            "downloads": -1,
            "filename": "drfetools-0.3.5.tar.gz",
            "has_sig": false,
            "md5_digest": "d51b9ff3faa4658861e3ea5105148296",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 27676,
            "upload_time": "2024-07-13T03:29:32",
            "upload_time_iso_8601": "2024-07-13T03:29:32.871513Z",
            "url": "https://files.pythonhosted.org/packages/ec/23/a3cb7270abc1c4e827eddc1276682f2151380c0eb3852748eb0636c8618a/drfetools-0.3.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-13 03:29:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LieberInstitute",
    "github_project": "dRFEtools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "drfetools"
}
        