drfetools


-   Name: drfetools
-   Version: 0.3.4
-   Home page: https://github.com/LieberInstitute/dRFEtools.git
-   Summary: A package for performing dynamic recursive feature elimination with sklearn.
-   Upload time: 2023-06-28 12:29:12
-   Maintainer: Kynon JM Benjamin
-   Author: Kynon JM Benjamin
-   Requires Python: >=3.8,<4.0
-   License: GPL-3.0-only
-   Keywords: recursive feature elimination, sklearn, feature ranking
# dRFEtools - dynamic Recursive Feature Elimination

`dRFEtools` is a package for dynamic recursive feature elimination with
sklearn.

Authors: Apuã Paquola, Kynon Jade Benjamin, and Tarun Katipalli

Package developed in Python 3.8+.

In addition to scikit-learn, `dRFEtools` is also built with NumPy, SciPy,
Pandas, matplotlib, plotnine, and statsmodels. Currently, dynamic RFE supports
models with `coef_` or `feature_importances_` attribute.

This package has several functions to run dynamic recursive feature elimination
(dRFE) for random forest and linear model classifiers and regressors. For
random forest, it assumes out-of-bag (OOB) scoring is enabled (`oob_score=True`
in scikit-learn). For linear models, it holds out a developmental set from the
training data. For both classification and regression, three measurements are
calculated for feature selection:

Classification:

1.  Normalized mutual information
2.  Accuracy
3.  Area under the ROC curve (AUC)
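
As a rough illustration of the first two classification measurements, the sketch below computes accuracy and normalized mutual information (NMI) by hand in plain Python. It is not dRFEtools code: the function names (`accuracy`, `entropy`, `nmi`) are made up for this example, and dividing by the arithmetic mean of the two entropies is an assumption borrowed from scikit-learn's default behavior.

```python
from collections import Counter
from math import log

def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two entropies."""
    n = len(y_true)
    count_t, count_p = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum((c / n) * log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    # Both vectors constant: treat as perfect agreement (assumption).
    return mi / denom if denom > 0 else 1.0

y_true = [0, 0, 1, 1]
swapped = [1, 1, 0, 0]    # perfect grouping, but every label name is wrong
unrelated = [0, 1, 0, 1]  # carries no information about y_true

print(accuracy(y_true, swapped), nmi(y_true, swapped))
print(accuracy(y_true, unrelated), nmi(y_true, unrelated))
```

The `swapped` case shows why the two measurements can disagree: accuracy depends on the label names, while NMI only asks whether the grouping of samples matches.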

Regression:

1.  R2 (this can be negative if the model is arbitrarily worse)
2.  Explained variance
3.  Mean squared error
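
To make the regression measurements concrete, here is a minimal plain-Python sketch using the usual textbook definitions. The function names (`mse`, `r2`, `explained_variance`) are hypothetical and are not the package's own helpers; the toy data is invented for the example.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); a constant bias is not penalized."""
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mean_r = sum(resid) / len(resid)
    mean_t = sum(y_true) / len(y_true)
    var_r = sum((r - mean_r) ** 2 for r in resid) / len(resid)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / len(y_true)
    return 1.0 - var_r / var_t

y_true = [1.0, 2.0, 3.0, 4.0]
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # anti-correlated predictions
shifted_pred = [3.0, 4.0, 5.0, 6.0]   # right shape, constant offset of +2

print(r2(y_true, reversed_pred))                # negative R2
print(explained_variance(y_true, shifted_pred))
print(mse(y_true, shifted_pred))
```

The reversed predictions give a negative R2, which is the case the note above warns about. The shifted predictions show how explained variance and R2 differ: the offset leaves explained variance at its maximum while R2 drops below zero.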

The package has been split into four additional scripts for:

1.  Out-of-bag dynamic RFE metrics (AP/KJB)
2.  Validation set dynamic RFE metrics (KJB)
3.  Rank features function (TK)
4.  Lowess core + peripheral selection (KJB)

# Table of Contents

1.  [Citation](#org7b64d47)
2.  [Installation](#org04443e4)
3.  [Tutorials](#org07777f88)
4.  [Reference Manual](#org5afd041)
    1.  [dRFEtools main functions](#org6171433)
    2.  [Peripheral features functions](#org3cfdf65)
    3.  [Plotting functions](#org8ecca01)
    4.  [Metric functions](#org377b1aa)
    5.  [Random forest helper functions](#orga29d49b)
    6.  [Linear model helper functions](#orgbda21bf)

<a id="org7b64d47"></a>

## Citation

If you use dRFEtools, please cite the following:
Pre-print DOI: https://doi.org/10.1101/2022.07.27.501227
[![DOI](https://zenodo.org/badge/402494754.svg)](https://zenodo.org/badge/latestdoi/402494754).


<a id="org04443e4"></a>

## Installation

`pip install --user dRFEtools`

<a id="org07777f88"></a>
## Tutorials

Follow [this](https://github.com/LieberInstitute/dRFEtools_manuscript/blob/main/optimization/_m/optimization.ipynb) Jupyter notebook for an example of optimization.

The GitHub repository below has example code for the sklearn simulation, the biological simulation, and BrainSEQ Phase 1.

[https://github.com/LieberInstitute/dRFEtools_manuscript](https://github.com/LieberInstitute/dRFEtools_manuscript/tree/main)

<a id="org5afd041"></a>

## Reference Manual

<a id="org6171433"></a>

### dRFEtools main functions

1.  dRFE - Random Forest

    `rf_rfe`

    Runs the random forest feature elimination step over the iterator process.

    **Args:**

    -   estimator: Random forest classifier object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   elimination_rate: percent rate to reduce feature list. default .2
    -   RANK: Output feature ranking. default=True (Boolean)

    **Yields:**

    -   dict: a dictionary with number of features, normalized mutual information score, accuracy score, and array of the indexes for features to keep

2.  dRFE - Linear Models

    `dev_rfe`

    Runs recursive feature elimination for linear models, stepping over the
    iterator process and assuming a developmental set is needed.

    **Args:**

    -   estimator: regressor or classifier linear model object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   elimination_rate: percent rate to reduce feature list. default .2
    -   dev_size: developmental set size. default '0.20'
    -   RANK: run feature ranking, default 'True'
    -   SEED: random state. default 'True'

    **Yields:**

    -   dict: a dictionary with number of features, r2 score, mean squared error,
        explained variance, and array of the indices for features to keep

3.  Feature Rank Function

    `feature_rank_fnc`

    This function ranks features within the feature elimination loop.

    **Args:**

    -   features: A vector of feature names
    -   rank: A vector with feature ranks based on absolute value of feature importance
    -   n_features_to_keep: Number of features to keep. (Int)
    -   fold: Fold to be analyzed. (Int)
    -   out_dir: Output directory for text file. Default '.'
    -   RANK: Boolean (True or False)

    **Yields:**

    -   Text file: Ranked features by fold tab-delimited text file, only if RANK=True

4.  N Feature Iterator

    `n_features_iter`

    Determines the features to keep.

    **Args:**

    -   nf: current number of features
    -   keep_rate: percentage of features to keep

    **Yields:**

    -   int: number of features to keep


5.  Calculate feature importance

    `cal_feature_imp`

    Generates feature importance from absolute value of feature weights.

    **Args:**

    -   estimator: the estimator to generate feature importance for

    **Yields:**

    -   estimator: returns the estimator with feature importance
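
The entries above describe an iterator over shrinking feature counts and a ranking based on the absolute value of feature weights. The sketch below shows one way these two pieces could fit together. It is an illustration under stated assumptions, not the package's implementation: `n_features_sketch` and `abs_weight_importance` are hypothetical names, summing absolute weights across outputs is an assumption, `keep_rate` is presumed to lie strictly between 0 and 1, and the fixed toy weights stand in for a model that a real run would refit at every step.

```python
import numpy as np

def n_features_sketch(nf, keep_rate):
    """Yield shrinking feature counts until a single feature remains."""
    while nf > 1:
        nf = max(1, int(nf * keep_rate))
        yield nf

def abs_weight_importance(coef):
    """One non-negative score per feature from linear-model weights."""
    coef = np.atleast_2d(np.asarray(coef, dtype=float))  # (n_outputs, n_features)
    return np.abs(coef).sum(axis=0)  # summing over outputs is an assumption

# Toy weights standing in for a fitted model's coef_ (invented values).
features = np.array(["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"])
weights = np.array([0.1, -3.0, 0.5, 2.0, -0.2, 0.05, 1.0, -0.7])

keep = np.arange(len(features))  # indices of features still in play
history = []
for n_keep in n_features_sketch(len(features), keep_rate=0.5):
    # A real run would refit the estimator on X[:, keep] here.
    importance = abs_weight_importance(weights[keep])
    top = np.argsort(-importance)[:n_keep]  # positions of the largest scores
    keep = np.sort(keep[top])
    history.append(features[keep].tolist())

print(history)
```

With `keep_rate=0.5` the eight features shrink to four, then two, then one, and `history` records which names survive each step.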


<a id="org3cfdf65"></a>

### Peripheral features functions

1.  Run lowess

    `run_lowess`

    This function runs the lowess function and caches it to memory.

    **Args:**

    -   x: the x-values of the observed points
    -   y: the y-values of the observed points
    -   frac: the fraction of the data used when estimating each y-value. default 3/10

    **Yields:**

    -   z: 2D array of results

2.  Convert array to tuple

    `array_to_tuple`

    This function attempts to convert a numpy array to a tuple.

    **Args:**

    -   np_array: numpy array

    **Yields:**

    -   tuple

3.  Extract dRFE as a dataframe

    `get_elim_df_ordered`

    This function converts the dRFE dictionary to a pandas dataframe.

    **Args:**

    -   d: dRFE dictionary
    -   multi: is this for multiple classes. (True or False)

    **Yields:**

    -   df_elim: dRFE as a dataframe with log10 transformed features

4.  Calculate lowess curve

    `cal_lowess`

    This function calculates the lowess curve.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value
    -   multi: is this for multiple classes. (True or False)

    **Yields:**

    -   x: dRFE log10 transformed features
    -   y: dRFE metrics
    -   z: 2D numpy array with lowess curve
    -   xnew: increased intervals
    -   ynew: interpolated metrics for xnew

5.  Calculate lowess curve for log10

    `cal_lowess`

    This function calculates the rate of change on the lowess fitted curve with
    log10 transformed input.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value
    -   multi: is this for multiple classes. default False

    **Yields:**

    -   data frame: dataframe with n_features, lowess value, and rate of change (DxDy)

6.  Extract max lowess

    `extract_max_lowess`

    This function extracts the max features based on rate of change of log10
    transformed lowess fit curve.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   multi: is this for multiple classes. default False

    **Yields:**

    -   int: number of max features (smallest subset)

7.  Extract peripheral lowess

    `extract_peripheral_lowess`

    This function extracts the peripheral features based on rate of change of log10
    transformed lowess fit curve.

    **Args:**

    -   d: dRFE dictionary
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   step_size: rate of change step size to analyze for extraction. default 0.05
    -   multi: is this for multiple classes. default False

    **Yields:**

    -   int: number of peripheral features

8.  Optimize lowess plot

    `plot_with_lowess_vline`

    Peripheral set selection optimization plot. This will be ROC AUC for multiple
    classification (3+), NMI for binary classification, or R2 for regression. The
    plot returned has fraction and step size as well as lowess smoothed curve and
    indication of predicted peripheral set.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   step_size: rate of change step size to analyze for extraction. default 0.05
    -   classify: is this a classification algorithm. default True
    -   multi: does this have multiple (3+) classes. default True

    **Yields:**

    -   graph: plot of dRFE with estimated peripheral set indicated as well as fraction and set size used. It automatically saves files as pdf, png, and svg

9.  Plot lowess vline

    `plot_with_lowess_vline`

    Plot feature elimination results with the peripheral set indicated. This will be
    ROC AUC for multiple classification (3+), NMI for binary classification, or R2
    for regression.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   frac: the fraction of the data used when estimating each y-value. default 3/10
    -   step_size: rate of change step size to analyze for extraction. default 0.05
    -   classify: is this a classification algorithm. default True
    -   multi: does this have multiple (3+) classes. default True

    **Yields:**

    -   graph: plot of dRFE with estimated peripheral set indicated, automatically saves files as pdf, png, and svg
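
Two ideas from the list above can be sketched briefly: why an array is converted to a tuple before a cached call, and what a rate of change on a log10 feature axis looks like. This is an illustration only. The names `to_tuple` and `cached_smoother` are hypothetical, the moving average is a stand-in for lowess (chosen so the example needs only NumPy), `functools.lru_cache` is an assumed caching mechanism, and the scores are invented.

```python
from functools import lru_cache
import numpy as np

def to_tuple(arr):
    """Recursively turn an array or nested sequence into nested tuples."""
    if isinstance(arr, (str, bytes)):
        return arr
    try:
        return tuple(to_tuple(item) for item in arr)
    except TypeError:  # scalars are not iterable
        return arr

@lru_cache(maxsize=None)
def cached_smoother(x, y, frac=3/10):
    """Moving-average stand-in for lowess; arguments must be hashable."""
    window = max(1, int(frac * len(y)))
    half = window // 2
    smooth = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        smooth.append(sum(y[lo:hi]) / (hi - lo))
    return tuple(zip(x, smooth))  # pairs of (x, fitted y)

# Invented elimination curve: score saturates as features are added.
n_features = 2.0 ** np.arange(10)                 # 1, 2, 4, ..., 512
score = 0.5 + 0.4 * (1 - 2.0 ** -np.arange(10))
x = np.log10(n_features)

fit = np.array(cached_smoother(to_tuple(x), to_tuple(score)))
again = cached_smoother(to_tuple(x), to_tuple(score))  # served from the cache
info = cached_smoother.cache_info()

# Rate of change of the fitted curve with respect to log10(n_features).
rate = np.gradient(fit[:, 1], fit[:, 0])
print(info.hits, info.misses, rate.round(3))
```

NumPy arrays are not hashable, so passing them straight to an `lru_cache`-wrapped function raises `TypeError`; converting to tuples first is what makes the cache lookup possible.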


<a id="org8ecca01"></a>

### Plotting functions

1.  Save plots

    `save_plots`

    This function saves a plot as svg, png, and pdf with a specific label and dimensions.

    **Args:**

    -   p: plotnine object
    -   fn: file name without extensions
    -   w: width, default 7
    -   h: height, default 7

    **Yields:** SVG, PNG, and PDF of plotnine object

2.  Plot dRFE Accuracy

    `plot_acc`

    Plot feature elimination results for accuracy.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by accuracy, automatically saves files as pdf, png, and svg

3.  Plot dRFE NMI

    `plot_nmi`

    Plot feature elimination results for normalized mutual information.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by NMI, automatically saves files as pdf, png, and svg

4.  Plot dRFE ROC AUC

    `plot_roc`

    Plot feature elimination results for AUC ROC curve.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by AUC, automatically saves files as pdf, png, and svg

5.  Plot dRFE R2

    `plot_r2`

    Plot feature elimination results for R2 score. Note that this can be negative
    if the model is arbitrarily worse.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by R2, automatically saves files as pdf, png, and svg

6.  Plot dRFE MSE

    `plot_mse`

    Plot feature elimination results for mean squared error score.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by mean squared error, automatically saves files as pdf, png, and svg

7.  Plot dRFE Explained Variance

    `plot_evar`

    Plot feature elimination results for explained variance score.

    **Args:**

    -   d: feature elimination class dictionary
    -   fold: current fold
    -   out_dir: output directory. default '.'

    **Yields:**

    -   graph: plot of feature by explained variance, automatically saves files as pdf, png, and svg


<a id="org377b1aa"></a>

### Metric functions

1.  OOB Prediction

    `oob_predictions`

    Extracts out-of-bag (OOB) predictions from random forest classifier classes.

    **Args:**

    -   estimator: Random forest classifier object

    **Yields:**

    -   vector: OOB predicted labels

2.  OOB Accuracy Score

    `oob_score_accuracy`

    Calculates the accuracy score from the OOB predictions.

    **Args:**

    -   estimator: Random forest classifier object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: accuracy score

3.  OOB Normalized Mutual Information Score

    `oob_score_nmi`

    Calculates the normalized mutual information score from the OOB predictions.

    **Args:**

    -   estimator: Random forest classifier object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: normalized mutual information score

4.  OOB Area Under ROC Curve Score

    `oob_score_roc`

    Calculates the area under the ROC curve score for the OOB predictions.

    **Args:**

    -   estimator: Random forest classifier object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: AUC ROC score

5.  OOB R2 Score

    `oob_score_r2`

    Calculates the r2 score from the OOB predictions.

    **Args:**

    -   estimator: Random forest regressor object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: r2 score

6.  OOB Mean Squared Error Score

    `oob_score_mse`

    Calculates the mean squared error score from the OOB predictions.

    **Args:**

    -   estimator: Random forest regressor object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: mean squared error score

7.  OOB Explained Variance Score

    `oob_score_evar`

    Calculates the explained variance score for the OOB predictions.

    **Args:**

    -   estimator: Random forest regressor object
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: explained variance score

8.  Developmental Test Set Predictions

    `dev_predictions`

    Extracts predictions using a development fold for linear
    regressor.

    **Args:**

    -   estimator: Linear model regression classifier object
    -   X: a data frame of normalized values from developmental dataset

    **Yields:**

    -   vector: Development set predicted labels

9.  Developmental Test Set R2 Score

    `dev_score_r2`

    Calculates the r2 score from the developmental dataset
    predictions.

    **Args:**

    -   estimator: Linear model regressor object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from developmental dataset

    **Yields:**

    -   float: r2 score

10. Developmental Test Set Mean Squared Error Score

    `dev_score_mse`

    Calculates the mean squared error score from the developmental dataset
    predictions.

    **Args:**

    -   estimator: Linear model regressor object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from developmental dataset

    **Yields:**

    -   float: mean squared error score

11. Developmental Test Set Explained Variance Score

    `dev_score_evar`

    Calculates the explained variance score for the developmental dataset predictions.

    **Args:**

    -   estimator: Linear model regressor object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from developmental data set

    **Yields:**

    -   float: explained variance score

12.  DEV Accuracy Score

    `dev_score_accuracy`

    Calculates the accuracy score from the DEV predictions.

    **Args:**

    -   estimator: Linear model classifier object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: accuracy score

13.  DEV Normalized Mutual Information Score

    `dev_score_nmi`

    Calculates the normalized mutual information score from the DEV predictions.

    **Args:**

    -   estimator: Linear model classifier object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: normalized mutual information score

14.  DEV Area Under ROC Curve Score

    `dev_score_roc`

    Calculates the area under the ROC curve score for the DEV predictions.

    **Args:**

    -   estimator: Linear model classifier object
    -   X: a data frame of normalized values from developmental dataset
    -   Y: a vector of sample labels from training data set

    **Yields:**

    -   float: AUC ROC score
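
For readers unfamiliar with out-of-bag scoring, the sketch below shows how OOB predictions can be read from a scikit-learn random forest fitted with `oob_score=True`. It uses only scikit-learn and NumPy and is not the package's `oob_predictions` code; the synthetic data set and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (invented for this example).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# oob_score=True makes the forest keep per-sample out-of-bag estimates.
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)

# oob_decision_function_ has shape (n_samples, n_classes): class
# probabilities from the trees that did not see each sample in training.
oob_proba = rf.oob_decision_function_
oob_labels = rf.classes_[np.argmax(oob_proba, axis=1)]

# Accuracy of the OOB labels against the training labels.
oob_accuracy = float(np.mean(oob_labels == y))
print(oob_proba.shape, oob_accuracy)
```

Because each sample is scored only by trees that never trained on it, these predictions act as a built-in validation set, which is why no developmental split is needed for random forest.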


<a id="orgbda21bf"></a>

### Linear model helper functions

1.  dRFE Subfunction

    `regr_fe`

    Iterate over features to be eliminated by step.

    **Args:**

    -   estimator: regressor or classifier linear model object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   n_features_iter: iterator for number of features to keep loop
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   dev_size: developmental test set proportion of training
    -   SEED: random state
    -   RANK: Boolean (True or False)

    **Yields:**

    -   list: a list with number of features, r2 score, mean squared error, explained variance, and array of the indices for features to keep

2.  dRFE Step function

    `regr_fe_step`

    Split training data into developmental dataset and apply estimator
    to developmental dataset, rank features, and conduct feature
    elimination, single steps.

    **Args:**

    -   estimator: regressor or classifier linear model object
    -   X: a data frame of training data
    -   Y: a vector of sample labels from training data set
    -   n_features_to_keep: number of features to keep
    -   features: a vector of feature names
    -   fold: current fold
    -   out_dir: output directory. default '.'
    -   dev_size: developmental test set proportion of training
    -   SEED: random state
    -   RANK: Boolean (True or False)

    **Yields:**

    -   dict: a dictionary with number of features, r2 score, mean squared error, explained variance, and selected features
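
The single step described above (split off a developmental set, fit, score, rank, eliminate) can be sketched with NumPy alone. This is a simplified illustration, not the package's `regr_fe_step`: ordinary least squares stands in for the linear model, the function name `dev_step_sketch` and the dictionary keys are hypothetical, only the R2 score is computed, and the data is simulated.

```python
import numpy as np

def dev_step_sketch(X, y, n_keep, dev_size=0.20, seed=0):
    """One elimination step scored on a held-out developmental set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_dev = max(1, int(round(dev_size * len(y))))
    dev, train = idx[:n_dev], idx[n_dev:]

    # Ordinary least squares with an intercept column (stand-in linear model).
    A_train = np.column_stack([X[train], np.ones(len(train))])
    w, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    coef, intercept = w[:-1], w[-1]

    # Score on the developmental samples only.
    pred = X[dev] @ coef + intercept
    ss_res = np.sum((y[dev] - pred) ** 2)
    ss_tot = np.sum((y[dev] - y[dev].mean()) ** 2)
    r2 = float(1.0 - ss_res / ss_tot)

    # Rank by absolute weight and keep the n_keep largest.
    selected = np.sort(np.argsort(-np.abs(coef))[:n_keep])
    return {"n_features": X.shape[1], "r2_score": r2, "selected": selected}

# Simulated data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.01 * rng.normal(size=100)

result = dev_step_sketch(X, y, n_keep=2)
print(result)
```

Scoring on the developmental set rather than the training rows keeps the metric from rewarding overfitting as features are removed.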

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/LieberInstitute/dRFEtools.git",
    "name": "drfetools",
    "maintainer": "Kynon JM Benjamin",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "kj.benjamin90@gmail.com",
    "keywords": "recursive feature elimination,sklearn,feature ranking",
    "author": "Kynon JM Benjamin",
    "author_email": "kj.benjamin90@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/2b/15/0f278e47f1fa3614edf6e2e26c574cb01b354e780578fe1c8e43efdecd36/drfetools-0.3.4.tar.gz",
    "platform": null,
    "description": "# dRFEtools - dynamic Recursive Feature Elimination\n\n`dRFEtools` is a package for dynamic recursive feature elimination with\nsklearn.\n\nAuthors: Apu\u00e3 Paquola, Kynon Jade Benjamin, and Tarun Katipalli\n\nPackage developed in Python 3.8+.\n\nIn addition to scikit-learn, `dRFEtools` is also built with NumPy, SciPy,\nPandas, matplotlib, plotnine, and statsmodels. Currently, dynamic RFE supports\nmodels with `coef_` or `feature_importances_` attribute.\n\nThis package has several function to run dynamic recursive feature elimination\n(dRFE) for random forest and linear model classifier and regression models. For\nrandom forest, it assumes Out-of-Bag (OOB) is set to True. For linear models,\nit generates a developmental set. For both classification and regression, three\nmeasurements are calculated for feature selection:\n\nClassification:\n\n1.  Normalized mutual information\n2.  Accuracy\n3.  Area under the curve (AUC) ROC curve\n\nRegression:\n\n1.  R2 (this can be negative if model is arbitrarily worse)\n2.  Explained variance\n3.  Mean squared error\n\nThe package has been split in to four additional scripts for:\n\n1.  Out-of-bag dynamic RFE metrics (AP/KJB)\n2.  Validation set dynamic RFE metrics (KJB)\n3.  Rank features function (TK)\n4.  Lowess core + peripheral selection (KJB)\n\n# Table of Contents\n\n1.  [Citation](#org7b64d47)\n2.  [Installation](#org04443e4)\n3.  [Tutorials](#org07777f88)\n4.  [Reference Manual](#org5afd041)\n    1.  [dRFEtools main functions](#org6171433)\n    2.  [Peripheral features functions](#org3cfdf65)\n    3.  [Plotting functions](#org8ecca01)\n    4.  [Metric functions](#org377b1aa)\n    5.  [Random forest helper functions](#orga29d49b)\n    6.  
[Linear model helper functions](#orgbda21bf)\n\n<a id=\"org7b64d47\"></a>\n\n## Citation\n\nIf using please cite the following:\nPre-print DOI: https://doi.org/10.1101/2022.07.27.501227\n[![DOI](https://zenodo.org/badge/402494754.svg)](https://zenodo.org/badge/latestdoi/402494754).\n\n\n<a id=\"org04443e4\"></a>\n\n## Installation\n\n`pip install --user dRFEtools`\n\n<a id=\"org07777f88\"></a>\n## Tutorials\n\nFollow [this](https://github.com/LieberInstitute/dRFEtools_manuscript/blob/main/optimization/_m/optimization.ipynb) jupyter notebook for an example on optimization.\n\nThe GitHub below has example code for sklearn simulation, biological simulation, and using BrainSEQ Phase 1.\n\n[https://github.com/LieberInstitute/dRFEtools_manuscript](https://github.com/LieberInstitute/dRFEtools_manuscript/tree/main)\n\n<a id=\"org5afd041\"></a>\n\n## Reference Manual\n\n<a id=\"org6171433\"></a>\n\n### dRFEtools main functions\n\n1.  dRFE - Random Forest\n\n    `rf_rfe`\n\n    Runs random forest feature elimination step over iterator process.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   elimination_rate: percent rate to reduce feature list. default .2\n    -   RANK: Output feature ranking. default=True (Boolean)\n\n    **Yields:**\n\n    -   dict: a dictionary with number of features, normalized mutual information score, accuracy score, and array of the indexes for features to keep\n\n2.  
dRFE - Linear Models\n\n    `dev_rfe`\n\n    Runs recursive feature elimination for linear model step over iterator\n    process assuming developmental set is needed.\n\n    **Args:**\n\n    -   estimator: regressor or classifier linear model object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   elimination_rate: percent rate to reduce feature list. default .2\n    -   dev_size: developmental set size. default '0.20'\n    -   RANK: run feature ranking, default 'True'\n    -   SEED: random state. default 'True'\n\n    **Yields:**\n\n    -   dict: a dictionary with number of features, r2 score, mean square error,\n        expalined variance, and array of the indices for features to keep\n\n3.  Feature Rank Function\n\n    `feature_rank_fnc`\n\n    This function ranks features within the feature elimination loop.\n\n    **Args:**\n\n    -   features: A vector of feature names\n    -   rank: A vector with feature ranks based on absolute value of feature importance\n    -   n_features_to_keep: Number of features to keep. (Int)\n    -   fold: Fold to analyzed. (Int)\n    -   out_dir: Output directory for text file. Default '.'\n    -   RANK: Boolean (True or False)\n\n    **Yields:**\n\n    -   Text file: Ranked features by fold tab-delimited text file, only if RANK=True\n\n4.  N Feature Iterator\n\n    `n_features_iter`\n\n    Determines the features to keep.\n\n    **Args:**\n\n    -   nf: current number of features\n    -   keep_rate: percentage of features to keep\n\n    **Yields:**\n\n    -   int: number of features to keep\n\n\n5.  
Calculate feature importance\n\n    `cal_feature_imp`\n\n    Generates feature importance from absolute value of feature weights.\n\n\t**Args:**\n\n\t-  estimator: the estimator to generate feature importance for\n\n\t**Yields:**\n\n\t-  estimator: returns the estimator with feature importance\n\n\n<a id=\"org3cfdf65\"></a>\n\n### Peripheral features functions\n\n1.  Run lowess\n\n    `run_lowess`\n\n    This function runs the lowess function and caches it to memory.\n\n    **Args:**\n\n    -   x: the x-values of the observed points\n    -   y: the y-values of the observed points\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n\n    **Yields:**\n\n    -   z: 2D array of results\n\n2.  Convert array to tuple\n\n    `array_to_tuple`\n\n    This function attempts to convert a numpy array to a tuple.\n\n    **Args:**\n\n    -   np_array: numpy array\n\n    **Yields:**\n\n    -   tuple\n\n3.  Extract dRFE as a dataframe\n\n    `get_elim_df_ordered`\n\n    This function converts the dRFE dictionary to a pandas dataframe.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   multi: is this for multiple classes. (True or False)\n\n    **Yields:**\n\n    -   df_elim: dRFE as a dataframe with log10 transformed features\n\n4.  Calculate lowess curve\n\n    `cal_lowess`\n\n    This function calculates the lowess curve.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value\n    -   multi: is this for multiple classes. (True or False)\n\n    **Yields:**\n\n    -   x: dRFE log10 transformed features\n    -   y: dRFE metrics\n    -   z: 2D numpy array with lowess curve\n    -   xnew: increased intervals\n    -   ynew: interpolated metrics for xnew\n\n5.  
Calculate lowess curve for log10\n\n    `cal_lowess`\n\n    This function calculates the rate of change on the lowess fitted curve with\n    log10 transformated input.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value\n    -   multi: is this for multiple classes. default False\n\n    **Yields:**\n\n    -   data frame: dataframe with n_features, lowess value, and rate of change (DxDy)\n\n6.  Extract max lowess\n\n    `extract_max_lowess`\n\n    This function extracts the max features based on rate of change of log10\n    transformed lowess fit curve.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   multi: is this for multiple classes. default False\n\n    **Yields:**\n\n    -   int: number of max features (smallest subset)\n\n7.  Extract peripheral lowess\n\n    `extract_peripheral_lowess`\n\n    This function extracts the peripheral features based on rate of change of log10\n    transformed lowess fit curve.\n\n    **Args:**\n\n    -   d: dRFE dictionary\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   step_size: rate of change step size to analyze for extraction. default 0.05\n    -   multi: is this for multiple classes. default False\n\n    **Yields:**\n\n    -   int: number of peripheral features\n\n8.  Optimize lowess plot\n\n    `plot_with_lowess_vline`\n\n    Peripheral set selection optimization plot. This will be ROC AUC for multiple\n    classification (3+), NMI for binary classification, or R2 for regression. The\n    plot returned has fraction and step size as well as lowess smoothed curve and\n    indication of predicted peripheral set.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. 
default '.'\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   step_size: rate of change step size to analyze for extraction. default 0.05\n    -   classify: is this a classification algorithm. default True\n    -   multi: does this have multiple (3+) classes. default True\n\n    **Yields:**\n\n    -   graph: plot of dRFE with estimated peripheral set indicated as well as fraction and set size used. It automatically saves files as pdf, png, and svg\n\n9.  Plot lowess vline\n\n    `plot_with_lowess_vline`\n\n    Plot feature elimination results with the peripheral set indicated. This will be\n    ROC AUC for multiple classification (3+), NMI for binary classification, or R2\n    for regression.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   frac: the fraction of the data used when estimating each y-value. default 3/10\n    -   step_size: rate of change step size to analyze for extraction. default 0.05\n    -   classify: is this a classification algorithm. default True\n    -   multi: does this have multiple (3+) classes. default True\n\n    **Yields:**\n\n    -   graph: plot of dRFE with estimated peripheral set indicated, automatically saves files as pdf, png, and svg\n\n\n<a id=\"org8ecca01\"></a>\n\n### Plotting functions\n\n1.  Save plots\n\n    `save_plots`\n\n    This function save plot as svg, png, and pdf with specific label and dimension.\n\n    **Args:**\n\n    -   p: plotnine object\n    -   fn: file name without extensions\n    -   w: width, default 7\n    -   h: height, default 7\n\n    **Yields:** SVG, PNG, and PDF of plotnine object\n\n2.  Plot dRFE Accuracy\n\n    `plot_acc`\n\n    Plot feature elimination results for accuracy.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. 
default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by accuracy, automatically saves files as pdf, png, and svg\n\n3.  Plot dRFE NMI\n\n    `plot_nmi`\n\n    Plot feature elimination results for normalized mutual information.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by NMI, automatically saves files as pdf, png, and svg\n\n4.  Plot dRFE ROC AUC\n\n    `plot_roc`\n\n    Plot feature elimination results for AUC ROC curve.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by AUC, automatically saves files as pdf, png, and svg\n\n5.  Plot dRFE R2\n\n    `plot_r2`\n\n    Plot feature elimination results for R2 score. Note that this can be negative\n    if model is arbitarily worse.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by R2, automatically saves files as pdf, png, and svg\n\n6.  Plot dRFE MSE\n\n    `plot_mse`\n\n    Plot feature elimination results for mean squared error score.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by mean squared error, automatically saves files as pdf, png, and svg\n\n7.  Plot dRFE Explained Variance\n\n    `plot_evar`\n\n    Plot feature elimination results for explained variance score.\n\n    **Args:**\n\n    -   d: feature elimination class dictionary\n    -   fold: current fold\n    -   out_dir: output directory. 
default '.'\n\n    **Yields:**\n\n    -   graph: plot of feature by explained variance, automatically saves files as pdf, png, and svg\n\n\n<a id=\"org377b1aa\"></a>\n\n### Metric functions\n\n1.  OOB Prediction\n\n    `oob_predictions`\n\n    Extracts out-of-bag (OOB) predictions from random forest classifier classes.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n\n    **Yields:**\n\n    -   vector: OOB predicted labels\n\n2.  OOB Accuracy Score\n\n    `oob_score_accuracy`\n\n    Calculates the accuracy score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: accuracy score\n\n3.  OOB Normalized Mutual Information Score\n\n    `oob_score_nmi`\n\n    Calculates the normalized mutual information score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: normalized mutual information score\n\n4.  OOB Area Under ROC Curve Score\n\n    `oob_score_roc`\n\n    Calculates the area under the ROC curve score for the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest classifier object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: AUC ROC score\n\n5.  OOB R2 Score\n\n    `oob_score_r2`\n\n    Calculates the r2 score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest regressor object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: r2 score\n\n6.  
OOB Mean Squared Error Score\n\n    `oob_score_mse`\n\n    Calculates the mean squared error score from the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest regressor object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: mean squared error score\n\n7.  OOB Explained Variance Score\n\n    `oob_score_evar`\n\n    Calculates the explained variance score for the OOB predictions.\n\n    **Args:**\n\n    -   estimator: Random forest regressor object\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: explained variance score\n\n8.  Developmental Test Set Predictions\n\n    `dev_predictions`\n\n    Extracts predictions using a development fold for linear\n    regressor.\n\n    **Args:**\n\n    -   estimator: Linear model regression classifier object\n    -   X: a data frame of normalized values from developmental dataset\n\n    **Yields:**\n\n    -   vector: Development set predicted labels\n\n9.  Developmental Test Set R2 Score\n\n    `dev_score_r2`\n\n    Calculates the r2 score from the developmental dataset\n    predictions.\n\n    **Args:**\n\n    -   estimator: Linear model regressor object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from developmental dataset\n\n    **Yields:**\n\n    -   float: r2 score\n\n10. Developmental Test Set Mean Squared Error Score\n\n    `dev_score_mse`\n\n    Calculates the mean squared error score from the developmental dataset\n    predictions.\n\n    **Args:**\n\n    -   estimator: Linear model regressor object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from developmental dataset\n\n    **Yields:**\n\n    -   float: mean squared error score\n\n11. 
Developmental Test Set Explained Variance Score\n\n    `dev_score_evar`\n\n    Calculates the explained variance score for the developmental dataset predictions.\n\n    **Args:**\n\n    -   estimator: Linear model regressor object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from developmental dataset\n\n    **Yields:**\n\n    -   float: explained variance score\n\n12. DEV Accuracy Score\n\n    `dev_score_accuracy`\n\n    Calculates the accuracy score from the DEV predictions.\n\n    **Args:**\n\n    -   estimator: Linear model classifier object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: accuracy score\n\n13. DEV Normalized Mutual Information Score\n\n    `dev_score_nmi`\n\n    Calculates the normalized mutual information score from the DEV predictions.\n\n    **Args:**\n\n    -   estimator: Linear model classifier object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: normalized mutual information score\n\n14. DEV Area Under ROC Curve Score\n\n    `dev_score_roc`\n\n    Calculates the area under the ROC curve score for the DEV predictions.\n\n    **Args:**\n\n    -   estimator: Linear model classifier object\n    -   X: a data frame of normalized values from developmental dataset\n    -   Y: a vector of sample labels from training data set\n\n    **Yields:**\n\n    -   float: AUC ROC score\n\n\n<a id=\"orgbda21bf\"></a>\n\n### Linear model helper functions\n\n1.  
dRFE Subfunction\n\n    `regr_fe`\n\n    Iterate over features to be eliminated by step.\n\n    **Args:**\n\n    -   estimator: regressor or classifier linear model object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   n_features_iter: iterator for number of features to keep loop\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   dev_size: developmental test set proportion of training\n    -   SEED: random state\n    -   RANK: Boolean (True or False)\n\n    **Yields:**\n\n    -   list: a list with number of features, r2 score, mean squared error, explained variance, and array of the indices for features to keep\n\n2.  dRFE Step function\n\n    `regr_fe_step`\n\n    Split training data into developmental dataset and apply estimator\n    to developmental dataset, rank features, and conduct a single step\n    of feature elimination.\n\n    **Args:**\n\n    -   estimator: regressor or classifier linear model object\n    -   X: a data frame of training data\n    -   Y: a vector of sample labels from training data set\n    -   n_features_to_keep: number of features to keep\n    -   features: a vector of feature names\n    -   fold: current fold\n    -   out_dir: output directory. default '.'\n    -   dev_size: developmental test set proportion of training\n    -   SEED: random state\n    -   RANK: Boolean (True or False)\n\n    **Yields:**\n\n    -   dict: a dictionary with number of features, r2 score, mean squared error, explained variance, and selected features\n",
    "bugtrack_url": null,
    "license": "GPL-3.0-only",
    "summary": "A package for performing dynamic recursive feature elimination with sklearn.",
    "version": "0.3.4",
    "project_urls": {
        "Homepage": "https://github.com/LieberInstitute/dRFEtools.git",
        "Repository": "https://github.com/LieberInstitute/dRFEtools.git"
    },
    "split_keywords": [
        "recursive feature elimination",
        "sklearn",
        "feature ranking"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8bfdcbc6481c1bfd0a73b4da7289bf77e8201f400074c61a5e971ecce516b7ce",
                "md5": "a261d6199899533daca125bb5792febb",
                "sha256": "d02e51b78bb587dfaad4c77637fc21b1feed72a9373029f85aa61671a098a729"
            },
            "downloads": -1,
            "filename": "drfetools-0.3.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a261d6199899533daca125bb5792febb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 27794,
            "upload_time": "2023-06-28T12:29:11",
            "upload_time_iso_8601": "2023-06-28T12:29:11.288320Z",
            "url": "https://files.pythonhosted.org/packages/8b/fd/cbc6481c1bfd0a73b4da7289bf77e8201f400074c61a5e971ecce516b7ce/drfetools-0.3.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2b150f278e47f1fa3614edf6e2e26c574cb01b354e780578fe1c8e43efdecd36",
                "md5": "afc8bb943504aeb49e7e8c15ad1b47b3",
                "sha256": "caf6030617e9e39ab2691bc28dd762f7a4957eef161f0340c05ca59574e4afd3"
            },
            "downloads": -1,
            "filename": "drfetools-0.3.4.tar.gz",
            "has_sig": false,
            "md5_digest": "afc8bb943504aeb49e7e8c15ad1b47b3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 26654,
            "upload_time": "2023-06-28T12:29:12",
            "upload_time_iso_8601": "2023-06-28T12:29:12.728043Z",
            "url": "https://files.pythonhosted.org/packages/2b/15/0f278e47f1fa3614edf6e2e26c574cb01b354e780578fe1c8e43efdecd36/drfetools-0.3.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-28 12:29:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LieberInstitute",
    "github_project": "dRFEtools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "drfetools"
}
        