# dRFEtools - dynamic Recursive Feature Elimination
`dRFEtools` is a package for dynamic recursive feature elimination with
sklearn.
Authors: Apuã Paquola, Kynon Jade Benjamin, and Tarun Katipalli
Package developed in Python 3.8+.
In addition to scikit-learn, `dRFEtools` is also built with NumPy, SciPy,
Pandas, matplotlib, plotnine, and statsmodels. Currently, dynamic RFE supports
models with a `coef_` or `feature_importances_` attribute.
This package provides several functions to run dynamic recursive feature
elimination (dRFE) for random forest and linear model classification and
regression models. For random forest, it assumes Out-of-Bag (OOB) scoring is
set to True. For linear models, it generates a developmental set. For both
classification and regression, three measurements are calculated for feature
selection:
Classification:
1. Normalized mutual information
2. Accuracy
3. Area under the curve (AUC) ROC curve
Regression:
1. R2 (this can be negative if model is arbitrarily worse)
2. Explained variance
3. Mean squared error
The package is split into four additional scripts for:
1. Out-of-bag dynamic RFE metrics (AP/KJB)
2. Validation set dynamic RFE metrics (KJB)
3. Rank features function (TK)
4. Lowess core + peripheral selection (KJB)
# Table of Contents
1. [Citation](#org7b64d47)
2. [Installation](#org04443e4)
3. [Tutorials](#org07777f88)
4. [Reference Manual](#org5afd041)
1. [dRFEtools main functions](#org6171433)
2. [Peripheral features functions](#org3cfdf65)
3. [Plotting functions](#org8ecca01)
4. [Metric functions](#org377b1aa)
5. [Random forest helper functions](#orga29d49b)
6. [Linear model helper functions](#orgbda21bf)
<a id="org7b64d47"></a>
## Citation
If using this package, please cite the following:
Kynon J M Benjamin, Tarun Katipalli, Apuã C M Paquola,
dRFEtools: dynamic recursive feature elimination for omics,
Bioinformatics, Volume 39, Issue 8, August 2023, btad513,
https://doi.org/10.1093/bioinformatics/btad513
PMID: 37632789
DOI: [10.1093/bioinformatics/btad513](https://doi.org/10.1093/bioinformatics/btad513).
<a id="org04443e4"></a>
## Installation
`pip install --user dRFEtools`
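For a quick start, the sketch below runs dynamic RFE with a random forest on synthetic data. This is a minimal sketch only: the argument names follow the `rf_rfe` entry in the reference manual below, the synthetic data and variable names are illustrative, and the exact return value may differ between versions.

```python
# Minimal quick-start sketch; argument names follow the rf_rfe entry in the
# reference manual below, and the synthetic data are purely illustrative.
import dRFEtools
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=13)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])

# The random forest workflow assumes OOB scoring is enabled.
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=13)

# Run dynamic RFE for a single fold; per the reference manual this yields a
# dictionary with the number of features, NMI, accuracy, and kept indices.
results = dRFEtools.rf_rfe(rf, X, y, X.columns.values, fold=0, out_dir=".",
                           elimination_rate=0.2, RANK=False)
```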
<a id="org07777f88"></a>
## Tutorials
We have two tutorials for [optimization](./examples/optimization.md)
(version 0.2) and [classification](./examples/classification.md) (version 0.3+).
In addition, example code used in the manuscript for the scikit-learn
simulation, biological simulation, and BrainSEQ Phase 1 is available at the
link below.
[https://github.com/LieberInstitute/dRFEtools_manuscript](https://github.com/LieberInstitute/dRFEtools_manuscript/tree/main)
<a id="org5afd041"></a>
## Reference Manual
<a id="org6171433"></a>
### dRFEtools main functions
1. dRFE - Random Forest
`rf_rfe`
Runs the random forest feature elimination step over an iterative process.
**Args:**
- estimator: Random forest classifier object
- X: a data frame of training data
- Y: a vector of sample labels from training data set
- features: a vector of feature names
- fold: current fold
- out_dir: output directory. default '.'
- elimination_rate: percent rate to reduce feature list. default 0.2
- RANK: output feature ranking. default True (Boolean)
**Yields:**
- dict: a dictionary with number of features, normalized mutual information score, accuracy score, and array of the indices for features to keep
2. dRFE - Linear Models
`dev_rfe`
Runs the recursive feature elimination step for linear models over an
iterative process, assuming a developmental set is needed.
**Args:**
- estimator: regressor or classifier linear model object
- X: a data frame of training data
- Y: a vector of sample labels from training data set
- features: a vector of feature names
- fold: current fold
- out_dir: output directory. default '.'
- elimination_rate: percent rate to reduce feature list. default 0.2
- dev_size: developmental set size. default 0.20
- RANK: run feature ranking. default True
- SEED: random state. default True
**Yields:**
- dict: a dictionary with number of features, R2 score, mean squared error,
explained variance, and array of the indices for features to keep (a usage
sketch follows this list)
3. Feature Rank Function
`feature_rank_fnc`
This function ranks features within the feature elimination loop.
**Args:**
- features: A vector of feature names
- rank: A vector with feature ranks based on absolute value of feature importance
- n_features_to_keep: Number of features to keep. (Int)
- fold: Fold to be analyzed. (Int)
- out_dir: Output directory for text file. Default '.'
- RANK: Boolean (True or False)
**Yields:**
- Text file: tab-delimited text file of ranked features by fold, written only if RANK=True
4. N Feature Iterator
`n_features_iter`
Determines the features to keep.
**Args:**
- nf: current number of features
- keep_rate: percentage of features to keep
**Yields:**
- int: number of features to keep
5. Extract feature importances
`_get_feature_importances`
Generates feature importance from absolute value of feature weights.
**Args:**
- estimator: the estimator to generate feature importance for
**Yields:**
- numpy array: returns feature importances as a NumPy array
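As a counterpart to the random forest quick start, the sketch below calls the linear-model entry point `dev_rfe` on synthetic regression data. It is a minimal sketch under the documented argument names; `Ridge` is only one example of an estimator exposing `coef_`, and the data are illustrative.

```python
# Minimal sketch of the linear-model path via dev_rfe; argument names follow
# the dev_rfe entry above, and Ridge plus the synthetic data are illustrative.
import dRFEtools
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=5.0, random_state=13)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])

# dev_rfe holds out a developmental set (dev_size) at each elimination step;
# per the entry above it yields the number of features, R2, MSE, explained
# variance, and the indices of the features to keep.
results = dRFEtools.dev_rfe(Ridge(), X, y, X.columns.values, fold=0,
                            out_dir=".", elimination_rate=0.2, dev_size=0.20,
                            RANK=False)
```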
<a id="org3cfdf65"></a>
### Peripheral features functions
1. Run lowess
`run_lowess`
This function runs the lowess function and caches it to memory.
**Args:**
- x: the x-values of the observed points
- y: the y-values of the observed points
- frac: the fraction of the data used when estimating each y-value. default 3/10
**Yields:**
- z: 2D array of results
2. Convert array to tuple
`array_to_tuple`
This function attempts to convert a numpy array to a tuple.
**Args:**
- np_array: numpy array
**Yields:**
- tuple
3. Extract dRFE as a dataframe
`get_elim_df_ordered`
This function converts the dRFE dictionary to a pandas dataframe.
**Args:**
- d: dRFE dictionary
- multi: is this for multiple classes. (True or False)
**Yields:**
- df_elim: dRFE as a dataframe with log10 transformed features
4. Calculate lowess curve
`cal_lowess`
This function calculates the lowess curve.
**Args:**
- d: dRFE dictionary
- frac: the fraction of the data used when estimating each y-value
- multi: is this for multiple classes. (True or False)
**Yields:**
- x: dRFE log10 transformed features
- y: dRFE metrics
- z: 2D numpy array with lowess curve
- xnew: increased intervals
- ynew: interpolated metrics for xnew
5. Calculate lowess curve for log10
`cal_lowess`
This function calculates the rate of change on the lowess fitted curve with
log10 transformed input.
**Args:**
- d: dRFE dictionary
- frac: the fraction of the data used when estimating each y-value
- multi: is this for multiple classes. default False
**Yields:**
- data frame: dataframe with n_features, lowess value, and rate of change (DxDy)
6. Extract max lowess
`extract_max_lowess`
This function extracts the max features based on rate of change of log10
transformed lowess fit curve.
**Args:**
- d: dRFE dictionary
- frac: the fraction of the data used when estimating each y-value. default 3/10
- multi: is this for multiple classes. default False
**Yields:**
- int: number of max features (smallest subset)
7. Extract peripheral lowess
`extract_peripheral_lowess`
This function extracts the peripheral features based on rate of change of log10
transformed lowess fit curve.
**Args:**
- d: dRFE dictionary
- frac: the fraction of the data used when estimating each y-value. default 3/10
- step_size: rate of change step size to analyze for extraction. default 0.05
- multi: is this for multiple classes. default False
**Yields:**
- int: number of peripheral features (see the usage sketch after this list)
8. Optimize lowess plot
`plot_with_lowess_vline`
Peripheral set selection optimization plot. This will be ROC AUC for multiple
classification (3+), NMI for binary classification, or R2 for regression. The
plot returned has fraction and step size as well as lowess smoothed curve and
indication of predicted peripheral set.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
- frac: the fraction of the data used when estimating each y-value. default 3/10
- step_size: rate of change step size to analyze for extraction. default 0.05
- classify: is this a classification algorithm. default True
- multi: does this have multiple (3+) classes. default True
**Yields:**
- graph: plot of dRFE with estimated peripheral set indicated as well as fraction and set size used. It automatically saves files as pdf, png, and svg
9. Plot lowess vline
`plot_with_lowess_vline`
Plot feature elimination results with the peripheral set indicated. This will be
ROC AUC for multiple classification (3+), NMI for binary classification, or R2
for regression.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
- frac: the fraction of the data used when estimating each y-value. default 3/10
- step_size: rate of change step size to analyze for extraction. default 0.05
- classify: is this a classification algorithm. default True
- multi: does this have multiple (3+) classes. default True
**Yields:**
- graph: plot of dRFE with estimated peripheral set indicated, automatically saves files as pdf, png, and svg
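Once a dRFE dictionary is available from `rf_rfe` or `dev_rfe`, the lowess helpers select the core (max) and peripheral feature-set sizes and plot the fit. The sketch below continues the earlier sketches and assumes the documented argument names and defaults; `d` stands in for the dRFE dictionary produced there.

```python
# Continuation sketch: `d` is assumed to be the dRFE dictionary produced by
# rf_rfe or dev_rfe in the earlier sketches (e.g. d = results).
import dRFEtools

frac, step_size = 3 / 10, 0.05  # documented defaults

n_max = dRFEtools.extract_max_lowess(d, frac=frac, multi=False)
n_peripheral = dRFEtools.extract_peripheral_lowess(d, frac=frac,
                                                   step_size=step_size,
                                                   multi=False)
print(f"core (max) features: {n_max}; peripheral features: {n_peripheral}")

# Visualize the lowess fit and the predicted peripheral set for this fold;
# files are saved automatically as PDF, PNG, and SVG.
dRFEtools.plot_with_lowess_vline(d, fold=0, out_dir=".", frac=frac,
                                 step_size=step_size, classify=True,
                                 multi=False)
```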
<a id="org8ecca01"></a>
### Plotting functions
1. Save plots
`save_plots`
This function saves a plot as SVG, PNG, and PDF with the specified label and dimensions; a usage sketch of the plotting helpers follows this list.
**Args:**
- p: plotnine object
- fn: file name without extensions
- w: width, default 7
- h: height, default 7
**Yields:** SVG, PNG, and PDF of plotnine object
2. Plot dRFE Accuracy
`plot_acc`
Plot feature elimination results for accuracy.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
**Yields:**
- graph: plot of feature by accuracy, automatically saves files as pdf, png, and svg
3. Plot dRFE NMI
`plot_nmi`
Plot feature elimination results for normalized mutual information.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
**Yields:**
- graph: plot of feature by NMI, automatically saves files as pdf, png, and svg
4. Plot dRFE ROC AUC
`plot_roc`
Plot feature elimination results for AUC ROC curve.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
**Yields:**
- graph: plot of feature by AUC, automatically saves files as pdf, png, and svg
5. Plot dRFE R2
`plot_r2`
Plot feature elimination results for R2 score. Note that this can be negative
if the model is arbitrarily worse.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
**Yields:**
- graph: plot of feature by R2, automatically saves files as pdf, png, and svg
6. Plot dRFE MSE
`plot_mse`
Plot feature elimination results for mean squared error score.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
**Yields:**
- graph: plot of feature by mean squared error, automatically saves files as pdf, png, and svg
7. Plot dRFE Explained Variance
`plot_evar`
Plot feature elimination results for explained variance score.
**Args:**
- d: feature elimination class dictionary
- fold: current fold
- out_dir: output directory. default '.'
**Yields:**
- graph: plot of feature by explained variance, automatically saves files as pdf, png, and svg
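A brief usage sketch for the plotting helpers above, again assuming a classification dRFE dictionary `d` from one of the main functions; as documented, each call writes PDF, PNG, and SVG files to `out_dir`.

```python
# Illustrative sketch: plot dRFE metrics for fold 0 from an existing dRFE
# dictionary `d` (e.g. the result of rf_rfe in the quick-start sketch).
import dRFEtools

dRFEtools.plot_acc(d, fold=0, out_dir=".")  # accuracy vs. number of features
dRFEtools.plot_nmi(d, fold=0, out_dir=".")  # normalized mutual information
dRFEtools.plot_roc(d, fold=0, out_dir=".")  # area under the ROC curve

# For regression runs, the analogous helpers are plot_r2, plot_mse, and
# plot_evar; all of them save PDF, PNG, and SVG files automatically.
```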
<a id="org377b1aa"></a>
### Metric functions
1. OOB Prediction
`oob_predictions`
Extracts out-of-bag (OOB) predictions from random forest classifier classes.
**Args:**
- estimator: Random forest classifier object
**Yields:**
- vector: OOB predicted labels
2. OOB Accuracy Score
`oob_score_accuracy`
Calculates the accuracy score from the OOB predictions.
**Args:**
- estimator: Random forest classifier object
- Y: a vector of sample labels from training data set
**Yields:**
- float: accuracy score
3. OOB Normalized Mutual Information Score
`oob_score_nmi`
Calculates the normalized mutual information score from the OOB predictions.
**Args:**
- estimator: Random forest classifier object
- Y: a vector of sample labels from training data set
**Yields:**
- float: normalized mutual information score
4. OOB Area Under ROC Curve Score
`oob_score_roc`
Calculates the area under the ROC curve score for the OOB predictions.
**Args:**
- estimator: Random forest classifier object
- Y: a vector of sample labels from training data set
**Yields:**
- float: AUC ROC score
5. OOB R2 Score
`oob_score_r2`
Calculates the r2 score from the OOB predictions.
**Args:**
- estimator: Random forest regressor object
- Y: a vector of sample labels from training data set
**Yields:**
- float: r2 score
6. OOB Mean Squared Error Score
`oob_score_mse`
Calculates the mean squared error score from the OOB predictions.
**Args:**
- estimator: Random forest regressor object
- Y: a vector of sample labels from training data set
**Yields:**
- float: mean squared error score
7. OOB Explained Variance Score
`oob_score_evar`
Calculates the explained variance score for the OOB predictions.
**Args:**
- estimator: Random forest regressor object
- Y: a vector of sample labels from training data set
**Yields:**
- float: explained variance score
8. Developmental Test Set Predictions
`dev_predictions`
Extracts predictions using a development fold for linear models.
**Args:**
- estimator: Linear model regressor or classifier object
- X: a data frame of normalized values from developmental dataset
**Yields:**
- vector: Development set predicted labels
9. Developmental Test Set R2 Score
`dev_score_r2`
Calculates the r2 score from the developmental dataset
predictions.
**Args:**
- estimator: Linear model regressor object
- X: a data frame of normalized values from developmental dataset
- Y: a vector of sample labels from developmental dataset
**Yields:**
- float: r2 score
10. Developmental Test Set Mean Squared Error Score
`dev_score_mse`
Calculates the mean squared error score from the developmental dataset
predictions.
**Args:**
- estimator: Linear model regressor object
- X: a data frame of normalized values from developmental dataset
- Y: a vector of sample labels from developmental dataset
**Yields:**
- float: mean squared error score
11. Developmental Test Set Explained Variance Score
`dev_score_evar`
Calculates the explained variance score for the developmental dataset predictions.
**Args:**
- estimator: Linear model regressor object
- X: a data frame of normalized values from developmental dataset
- Y: a vector of sample labels from developmental data set
**Yields:**
- float: explained variance score
12. DEV Accuracy Score
`dev_score_accuracy`
Calculates the accuracy score from the DEV predictions.
**Args:**
- estimator: Linear model classifier object
- X: a data frame of normalized values from developmental dataset
- Y: a vector of sample labels from training data set
**Yields:**
- float: accuracy score
13. DEV Normalized Mutual Information Score
`dev_score_nmi`
Calculates the normalized mutual information score from the DEV predictions.
**Args:**
- estimator: Linear model classifier object
- X: a data frame of normalized values from developmental dataset
- Y: a vector of sample labels from training data set
**Yields:**
- float: normalized mutual information score
14. DEV Area Under ROC Curve Score
`dev_score_roc`
Calculates the area under the ROC curve score for the DEV predictions.
**Args:**
- estimator: Linear model classifier object
- X: a data frame of normalized values from developmental dataset
- Y: a vector of sample labels from training data set
**Yields:**
- float: AUC ROC score
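To make the classification metrics concrete, the sketch below reproduces what the OOB score helpers measure using plain scikit-learn, assuming OOB labels come from the fitted forest's `oob_decision_function_`. It is illustrative only, not the package's internal code.

```python
# Illustrative sketch (not the package's code): compute accuracy, NMI, and ROC
# AUC from a random forest's out-of-bag predictions with scikit-learn alone.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, normalized_mutual_info_score,
                             roc_auc_score)

X, y = make_classification(n_samples=300, n_features=30, random_state=13)
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=13).fit(X, y)

oob_probs = rf.oob_decision_function_                  # per-class OOB probabilities
oob_labels = rf.classes_[np.argmax(oob_probs, axis=1)]

print("accuracy:", accuracy_score(y, oob_labels))
print("NMI:     ", normalized_mutual_info_score(y, oob_labels))
print("ROC AUC: ", roc_auc_score(y, oob_probs[:, 1]))  # binary case
```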
<a id="orgbda21bf"></a>
### Linear model helper functions
1. dRFE Subfunction
`regr_fe`
Iterates over features to be eliminated step by step.
**Args:**
- estimator: regressor or classifier linear model object
- X: a data frame of training data
- Y: a vector of sample labels from training data set
- n_features_iter: iterator over the number of features to keep
- features: a vector of feature names
- fold: current fold
- out_dir: output directory. default '.'
- dev_size: developmental test set proportion of training
- SEED: random state
- RANK: Boolean (True or False)
**Yields:**
- list: a list with number of features, R2 score, mean squared error, explained variance, and array of the indices for features to keep
2. dRFE Step function
`regr_fe_step`
Splits the training data into a developmental dataset, applies the estimator
to it, ranks features, and conducts a single step of feature elimination.
**Args:**
- estimator: regressor or classifier linear model object
- X: a data frame of training data
- Y: a vector of sample labels from training data set
- n_features_to_keep: number of features to keep
- features: a vector of feature names
- fold: current fold
- out_dir: output directory. default '.'
- dev_size: developmental test set proportion of training
- SEED: random state
- RANK: Boolean (True or False)
**Yields:**
- dict: a dictionary with number of features, R2 score, mean squared error, explained variance, and selected features
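The sketch below illustrates the kind of single elimination step described above for linear models: hold out a developmental split, fit, rank features by absolute weight, and keep the top `n_features_to_keep`. It is an illustrative reimplementation under those assumptions, not the package's `regr_fe_step` code.

```python
# Illustrative sketch (not the package's regr_fe_step): one elimination step
# for a linear model using a held-out developmental split.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import (explained_variance_score, mean_squared_error,
                             r2_score)
from sklearn.model_selection import train_test_split

def linear_elimination_step(estimator, X, y, n_features_to_keep,
                            dev_size=0.20, seed=13):
    X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, test_size=dev_size,
                                                random_state=seed)
    estimator.fit(X_tr, y_tr)
    preds = estimator.predict(X_dev)
    # Rank features by absolute weight and keep the strongest ones.
    importances = np.abs(np.ravel(estimator.coef_))
    keep_idx = np.argsort(importances)[::-1][:n_features_to_keep]
    metrics = (r2_score(y_dev, preds), mean_squared_error(y_dev, preds),
               explained_variance_score(y_dev, preds))
    return keep_idx, metrics

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=13)
keep_idx, (r2, mse, evar) = linear_elimination_step(Ridge(), X, y,
                                                    n_features_to_keep=40)
print(f"kept {len(keep_idx)} features; R2={r2:.3f}, MSE={mse:.1f}, EVar={evar:.3f}")
```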