<!-- README.md is generated from README.Rmd. Please edit that file -->
# asgl <img src="figures/logo.png" align="right" height="150" alt="asgl logo" />
[![Downloads](https://pepy.tech/badge/asgl)](https://pepy.tech/project/asgl)
[![Downloads](https://pepy.tech/badge/asgl/month)](https://pepy.tech/project/asgl)
[![License: GPL
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Package
Version](https://img.shields.io/badge/version-2.1.1-blue.svg)](https://pypi.org/project/asgl/)
## Introduction
The `asgl` package is a versatile and robust tool designed for fitting a
variety of regression models, including linear regression, quantile
regression, logistic regression and various penalized regression models
such as Lasso, Ridge, Group Lasso, Sparse Group Lasso, and their
adaptive variants. The package is especially useful for simultaneous
variable selection and prediction in both low- and high-dimensional
settings.
The primary class available to users is the `Regressor` class, which is
detailed later in this document.
`asgl` is based on cutting-edge research and methodologies, as outlined
in the following papers:
- [Adaptive Sparse Group Lasso in Quantile
Regression](https://link.springer.com/article/10.1007/s11634-020-00413-8)
- [`asgl`: A Python Package for Penalized Linear and Quantile
Regression](https://arxiv.org/abs/2111.00472)
For a practical introduction to the package, users can refer to the user
guide notebook available in the GitHub repository. Additional accessible
explanations can be found on [Towards Data Science: Sparse Group
Lasso](https://towardsdatascience.com/sparse-group-lasso-in-python-255e379ab892),
[Towards Data Science: Adaptive
Lasso](https://towardsdatascience.com/an-adaptive-lasso-63afca54b80d)
and [Towards Data Science: Quantile
regression](https://towardsdatascience.com/squashing-the-average-a-dive-into-penalized-quantile-regression-for-python-8f3a996768b6).
## Dependencies
asgl requires:
- Python \>= 3.9
- cvxpy \>= 1.2.0
- numpy \>= 1.20.0
- scikit-learn \>= 1.0
- pytest \>= 7.1.2
## User installation
The easiest way to install asgl is using `pip`:

    pip install asgl
## Testing
After installation, you can launch the test suite from the source
directory (you will need to have `pytest >= 7.1.2` installed) by running:

    pytest
## What’s new?
### 2.1.1
The intercept term is now stored in the `intercept_` attribute instead
of being part of the `coef_` attribute.
### 2.1.0
Version 2.1.0 of the `asgl` package introduces enhancements for logistic
regression models. Users can now tackle binary classification problems
by setting `model='logit'`. For more granular control, specify
`model='logit_raw'` to retrieve outputs before the logistic
transformation, or `model='logit_proba'` for probability outputs. This
update also implements ridge and adaptive ridge penalizations,
accessible via `penalization='ridge'` or `penalization='aridge'`,
allowing for more flexible model tuning.
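The three logistic modes differ only in how the fitted linear score is reported. The NumPy sketch below illustrates that relationship, assuming that `'logit_proba'` applies the standard sigmoid to the `'logit_raw'` score and that `'logit'` thresholds the probability at the default value of 0.5. The helper `logit_outputs` is hypothetical and not part of the `asgl` API.

```python
import numpy as np

def logit_outputs(raw_scores, threshold=0.5):
    """Hypothetical helper: map raw linear scores to the three output types.

    raw_scores plays the role of a 'logit_raw' output (intercept + X @ coef).
    """
    raw = np.asarray(raw_scores, dtype=float)
    # Standard logistic (sigmoid) transformation -> probabilities in (0, 1)
    proba = 1.0 / (1.0 + np.exp(-raw))
    # Binary class labels obtained by thresholding the probabilities
    labels = (proba >= threshold).astype(int)
    return raw, proba, labels

raw, proba, labels = logit_outputs([-2.0, 0.0, 3.0])
print(proba.round(3))
print(labels)
```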
### 2.0.0
With the release of version 2.0, the `asgl` package has undergone
significant enhancements and improvements. The most notable change is
the introduction of the `Regressor` object, which brings full
compatibility with scikit-learn. This means that the `Regressor` object
can now be used just like any other scikit-learn estimator, enabling
seamless integration with scikit-learn’s extensive suite of tools for
model evaluation, hyperparameter optimization, and performance metrics.
Key updates include:
- Scikit-learn Compatibility: The `Regressor` class is now fully
compatible with scikit-learn. Users can leverage functionalities such
as `sklearn.model_selection.GridSearchCV` for hyperparameter tuning
and utilize various scikit-learn metrics and utilities to assess model
performance.
- Deprecation of `ASGL` class: The old `ASGL` class is still included in
the package for backward compatibility but is now deprecated. It will
raise a `DeprecationWarning` when used, as it is no longer supported
and will be removed in future versions. Users are strongly encouraged
to transition to the new `Regressor` class to take advantage of the
latest features and improvements.
For users currently utilizing the `ASGL` class, we recommend switching
to the `Regressor` class to ensure continued support and access to the
latest functionalities.
## Key features
The `Regressor` class includes the following list of parameters:
- model: str, default=`'lm'`
  - Type of model to fit. Options are `'lm'` (linear regression), `'qr'`
    (quantile regression), `'logit'` (logistic regression for binary
    classification, outputs class labels), `'logit_proba'` (logistic
    regression for binary classification, outputs probabilities) and
    `'logit_raw'` (logistic regression for binary classification,
    outputs the score before the logistic transformation).
- penalization: str or None, default=`'lasso'`
  - Type of penalization to use. Options are `'lasso'`, `'ridge'`,
    `'gl'` (group lasso), `'sgl'` (sparse group lasso), `'alasso'`
    (adaptive lasso), `'aridge'` (adaptive ridge), `'agl'` (adaptive
    group lasso), `'asgl'` (adaptive sparse group lasso), or `None`.
- quantile: float, default=0.5
  - Quantile level for quantile regression models. Valid values are
    between 0 and 1.
- fit_intercept: bool, default=True
  - Whether to fit an intercept in the model.
- lambda1: float, default=0.1
  - Constant that multiplies the penalization, controlling its strength.
    Must be a non-negative float, i.e. in `[0, inf)`. Larger values
    result in stronger penalization.
- alpha: float, default=0.5
  - Constant that controls the tradeoff between the individual and group
    penalizations in the `'sgl'` and `'asgl'` penalizations. `alpha=1`
    enforces a lasso penalization, while `alpha=0` enforces a group
    lasso penalization.
- solver: str, default=`'default'`
  - Solver to be used by `cvxpy`. The default uses the optimal
    alternative depending on the problem. Users can check the available
    solvers with `cvxpy.installed_solvers()`.
- weight_technique: str, default=`'pca_pct'`
  - Technique used to fit the adaptive weights. Options include
    `'pca_1'`, `'pca_pct'`, `'pls_1'`, `'pls_pct'`, `'lasso'`,
    `'ridge'`, `'unpenalized'`, and `'sparse_pca'`. For low-dimensional
    problems (where the number of variables is smaller than the number
    of observations), the `'unpenalized'` or `'ridge'` techniques are
    recommended. For high-dimensional problems (where the number of
    variables is larger than the number of observations), the default
    `'pca_pct'` is recommended.
- individual_power_weight: float, default=1
  - Power to which the individual weights are raised. This parameter
    only has an effect in adaptive penalizations (`'alasso'` and
    `'asgl'`).
- group_power_weight: float, default=1
  - Power to which the group weights are raised. This parameter only has
    an effect in adaptive penalizations with a grouped structure
    (`'agl'` and `'asgl'`).
- variability_pct: float, default=0.9
  - Percentage of variability explained by the PCA, PLS, and sparse PCA
    components. This parameter only has an effect in adaptive
    penalizations where `weight_technique` is `'pca_pct'`, `'pls_pct'`
    or `'sparse_pca'`.
- lambda1_weights: float, default=0.1
  - The value of `lambda1` used to solve the lasso model when
    `weight_technique='lasso'`.
- spca_alpha: float, default=1e-5
  - Sparse PCA parameter. This parameter only has an effect if
    `weight_technique='sparse_pca'`. See the scikit-learn implementation
    for more details.
- spca_ridge_alpha: float, default=1e-2
  - Sparse PCA parameter. This parameter only has an effect if
    `weight_technique='sparse_pca'`. See the scikit-learn implementation
    for more details.
- individual_weights: array or None, default=None
  - Custom individual weights for adaptive penalizations. If this
    parameter is provided, it overrides the weight estimation process
    defined by `weight_technique`, allowing the user to supply custom
    weights. It must be either `None` or an array of non-negative floats
    with length equal to the number of variables.
- group_weights: array or None, default=None
  - Custom group weights for adaptive penalizations. If this parameter
    is provided, it overrides the weight estimation process defined by
    `weight_technique`, allowing the user to supply custom weights. It
    must be either `None` or an array of non-negative floats with length
    equal to the number of groups (as defined by `group_index`).
- tol: float, default=1e-4
  - Tolerance below which coefficients are considered zero.
- weight_tol: float, default=1e-4
  - Tolerance value used to avoid division-by-zero errors when computing
    the weights.
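To make the roles of `lambda1`, `alpha`, `individual_weights` and `group_weights` more concrete, the following NumPy sketch evaluates an adaptive sparse group lasso penalty for a given coefficient vector. It assumes the common formulation in which each group's L2 term is scaled by the square root of the group size, and that group weights are ordered like the sorted group labels; both are illustrative assumptions, so check the research paper or the source code for the exact expression `asgl` optimizes. The function `asgl_penalty` is hypothetical.

```python
import numpy as np

def asgl_penalty(beta, group_index, lambda1=0.1, alpha=0.5,
                 individual_weights=None, group_weights=None):
    """Hypothetical sketch of an (adaptive) sparse group lasso penalty.

    With all weights equal to 1 this is the plain sparse group lasso;
    alpha=1 keeps only the lasso term and alpha=0 only the group lasso term.
    """
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    groups = np.unique(group_index)  # sorted group labels (assumed weight order)
    w = np.ones(beta.size) if individual_weights is None else np.asarray(individual_weights, dtype=float)
    v = np.ones(groups.size) if group_weights is None else np.asarray(group_weights, dtype=float)

    # Individual (lasso-type) term: weighted L1 norm of the coefficients
    lasso_term = np.sum(w * np.abs(beta))
    # Group term: weighted L2 norm of each group, scaled by sqrt(group size)
    group_term = 0.0
    for k, g in enumerate(groups):
        beta_g = beta[group_index == g]
        group_term += v[k] * np.sqrt(beta_g.size) * np.linalg.norm(beta_g)
    return lambda1 * (alpha * lasso_term + (1 - alpha) * group_term)

beta = np.array([1.0, -2.0, 0.0, 3.0])
group_index = np.array([1, 1, 2, 2])
print(asgl_penalty(beta, group_index, lambda1=1.0, alpha=1.0))  # lasso term only
print(asgl_penalty(beta, group_index, lambda1=1.0, alpha=0.0))  # group lasso term only
```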
## Examples
### Example 1: Linear Regression with Lasso.
``` python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from asgl import Regressor
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250, random_state=42)
model = Regressor(model='lm', penalization='lasso', lambda1=0.1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
```
This example illustrates how to:
- Generate synthetic regression data.
- Split the data into training and testing sets.
- Create a `Regressor` object configured for linear regression with
Lasso penalization.
- Fit the model to the training data.
- Make predictions on the test data.
- Evaluate the model’s performance using mean squared error.
### Example 2: Quantile regression with Adaptive Sparse Group Lasso.
Group-based penalizations such as Group Lasso, Sparse Group Lasso, and
their adaptive variants assume that there is a group structure within
the regressors. This structure arises in various applications, such as
dummy variables, where all the dummies of the same categorical variable
belong to the same group, or genetic data analysis, where genes are
grouped into genetic pathways.
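For the dummy-variable case, a group label per column can be built directly from the one-hot encoding: every dummy inherits the label of the categorical variable it came from. The sketch below uses scikit-learn's `OneHotEncoder` on illustrative data and starts the labels at 1, mirroring the synthetic group index used in the example below; it is one possible construction, not a utility provided by `asgl`.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Two categorical variables: 'color' has 3 levels, 'size' has 2 levels
X_cat = np.array([['red', 'S'], ['green', 'L'], ['blue', 'S'], ['red', 'L']])

encoder = OneHotEncoder()
X_dummies = encoder.fit_transform(X_cat).toarray()  # dense 0/1 design matrix

# encoder.categories_ holds one array of levels per original column;
# repeat each variable's label once per dummy column it produced
n_levels = [len(levels) for levels in encoder.categories_]
group_index = np.repeat(np.arange(1, len(n_levels) + 1), n_levels)

print(X_dummies.shape)
print(group_index)
```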
For scenarios where the regressors have a known group structure, this
information can be passed to the `Regressor` class during model fitting
through the `group_index` parameter. This parameter is an array in which
each element indicates the group to which the corresponding variable
belongs. The following example demonstrates this with a synthetic
`group_index`. The model is optimized using scikit-learn’s
`RandomizedSearchCV` class.
``` python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from asgl import Regressor
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250, random_state=42)
# Synthetic group structure: each of the 50 variables is assigned to one of 4 groups
group_index = np.random.default_rng(42).integers(1, 5, size=50)
model = Regressor(model='qr', penalization='asgl', quantile=0.5)
param_grid = {'lambda1': [1e-4, 1e-3, 1e-2, 1e-1, 1], 'alpha': [0, 0.2, 0.4, 0.6, 0.8, 1]}
rscv = RandomizedSearchCV(model, param_grid, scoring='neg_median_absolute_error', random_state=42)
# group_index is forwarded to Regressor.fit for every candidate model
rscv.fit(X_train, y_train, group_index=group_index)
```
This example demonstrates how to fit a quantile regression model with
Adaptive Sparse Group Lasso penalization, utilizing scikit-learn’s
`RandomizedSearchCV` to optimize the model’s hyperparameters.
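Quantile regression replaces the squared error with the check (pinball) loss, `rho(u) = u * (quantile - 1{u < 0})`, where `quantile` plays the role of the `quantile` parameter. The NumPy sketch below (the function name `check_loss` is hypothetical) shows that minimizing this loss over a constant recovers an empirical quantile of the data, which is why `quantile=0.5` targets the conditional median. The brute-force search is purely illustrative; `asgl` solves the penalized problem with `cvxpy`.

```python
import numpy as np

def check_loss(residuals, quantile=0.5):
    """Check (pinball) loss: quantile * r if r >= 0, (quantile - 1) * r otherwise."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(r * (quantile - (r < 0)))

rng = np.random.default_rng(0)
y = rng.normal(size=1001)

# Brute-force search for the constant c that minimizes the check loss of y - c
candidates = np.sort(y)
for q in (0.1, 0.5, 0.9):
    losses = [check_loss(y - c, quantile=q) for c in candidates]
    best = candidates[np.argmin(losses)]
    # Compare the minimizer with NumPy's empirical quantile
    print(q, best, np.quantile(y, q))
```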
### Example 3: Logistic regression
In binary classification with logistic regression, a decision threshold
of 0.5 is used by default, but it does not always yield the best
accuracy. With the `'logit_proba'` model from the `asgl` package, you
can obtain predicted probabilities and use them to find the threshold
that maximizes classification accuracy. This example shows how to use
`cross_val_predict` from scikit-learn to evaluate different thresholds
and select the one that offers the highest accuracy for your
classification model.
``` python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score
from asgl import Regressor
import matplotlib.pyplot as plt
X, y = make_classification(n_samples=1000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Create a Regressor object for logistic regression to output probabilities
model = Regressor(model='logit_proba', penalization='ridge')
# Use cross_val_predict to get probability estimates for each fold
probabilities = cross_val_predict(model, X_train, y_train, method='predict', cv=5)
thresholds = np.linspace(0.01, 0.99, 100)
# Calculate accuracy for each threshold
accuracies = []
for threshold in thresholds:
    predictions = (probabilities >= threshold).astype(int)
    accuracies.append(accuracy_score(y_train, predictions))
```
``` python
plt.plot(thresholds, accuracies)
plt.title('Accuracy vs Threshold')
plt.ylabel('Accuracy')
plt.xlabel('Threshold')
```
<img src="figures/README-unnamed-chunk-4-1.png" width="100%" />
``` python
optimal_threshold = thresholds[np.argmax(accuracies)]
model.fit(X_train, y_train)
test_probabilities = model.predict(X_test)
test_predictions = (test_probabilities >= optimal_threshold).astype(int)
test_accuracy = accuracy_score(y_test, test_predictions)
```
### Example 4: Customizing weights for adaptive sparse group lasso
The `asgl` package offers several built-in methods for estimating
adaptive weights, controlled via the `weight_technique` parameter. For
more details on how each of these alternatives works, refer to the
[associated research
paper](https://link.springer.com/article/10.1007/s11634-020-00413-8) or
to the user guide. For users who need more customization, the package
also allows custom weights to be specified directly through the
`individual_weights` and `group_weights` parameters. This lets users
implement their own weight computation techniques and use them within
the `asgl` framework.
When using custom weights, ensure that the length of
`individual_weights` matches the number of variables, and the length of
`group_weights` matches the number of groups. Below is an example
demonstrating how to fit a model with custom individual and group
weights:
``` python
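import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from asgl import Regressor
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250, random_state=42)
rng = np.random.default_rng(42)
# Synthetic group structure: one of 4 groups per variable
group_index = rng.integers(1, 5, size=X_train.shape[1])
# Generate custom (random, purely illustrative) non-negative weights
custom_individual_weights = rng.random(X_train.shape[1])
custom_group_weights = rng.random(len(np.unique(group_index)))
# Create a Regressor object with custom weights
model = Regressor(model='lm', penalization='asgl', individual_weights=custom_individual_weights, group_weights=custom_group_weights)
# Fit the model
model.fit(X_train, y_train, group_index=group_index)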
```
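Random weights are only a placeholder. A more typical use of this hook is to compute weights from a pilot estimate. The sketch below follows the usual adaptive-lasso recipe under stated assumptions: it fits a scikit-learn `Ridge` pilot model, sets each individual weight to `1 / |beta_j|^power` and each group weight to `1 / ||beta_g||_2^power`, and adds a small tolerance to avoid division by zero. This is not necessarily how the built-in `weight_technique='ridge'` option computes its weights, and the helper `pilot_weights` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

def pilot_weights(X, y, group_index, ridge_alpha=1.0, power=1.0, tol=1e-4):
    """Hypothetical helper: adaptive weights from a ridge pilot estimate."""
    beta = Ridge(alpha=ridge_alpha).fit(X, y).coef_
    # Individual weights: large pilot coefficients -> small penalty
    individual = 1.0 / (np.abs(beta) + tol) ** power
    # Group weights, ordered like np.unique(group_index)
    groups = np.unique(group_index)
    group = np.array([1.0 / (np.linalg.norm(beta[group_index == g]) + tol) ** power
                      for g in groups])
    return individual, group

X, y, true_coef = make_regression(n_samples=200, n_features=10, n_informative=3,
                                  coef=True, random_state=0)
group_index = np.repeat([1, 2], 5)
individual_weights, group_weights = pilot_weights(X, y, group_index)

# Informative features should receive the smallest individual weights
print(np.sort(np.argsort(individual_weights)[:3]), np.flatnonzero(true_coef))
```

The resulting arrays have the lengths required by `individual_weights` and `group_weights` and could be passed to `Regressor` in the same way as the random weights above.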
### Example 5: Comparison of lasso and adaptive lasso
This example compares the lasso implementation available in
`scikit-learn` against an adaptive lasso model built with the `asgl`
library. Both models are tuned using 5-fold cross-validation over a grid
of hyperparameters. As shown by the final MSEs computed on the test set,
the adaptive lasso reduces the test error by roughly 40% compared to the
lasso (35.085 versus 59.693).
``` python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from asgl import Regressor
X, y = make_regression(n_samples=200, n_features=200, n_informative=25, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=50, random_state=42)
param_grid = {'alpha': 10 ** np.arange(-2, 1.51, 0.1)}
lasso_model = Lasso()
gscv_lasso = GridSearchCV(lasso_model, param_grid, scoring='neg_mean_squared_error', cv=5, n_jobs=-1)
gscv_lasso.fit(X_train, y_train)
lasso_predictions = gscv_lasso.predict(X_test)
lasso_mse = np.round(mean_squared_error(lasso_predictions, y_test), 3)
print(f"Lasso MSE: {lasso_mse}")
param_grid = {'lambda1': 10 ** np.arange(-2, 1.51, 0.1)}
alasso_model = Regressor(model='lm', penalization='alasso', weight_technique='lasso')
gscv_alasso = GridSearchCV(alasso_model, param_grid, scoring='neg_mean_squared_error', cv=5, n_jobs=-1)
gscv_alasso.fit(X_train, y_train)
alasso_predictions = gscv_alasso.predict(X_test)
alasso_mse = np.round(mean_squared_error(alasso_predictions, y_test), 3)
print(f"Adaptive lasso MSE: {alasso_mse}")
```

    Lasso MSE: 59.693
    Adaptive lasso MSE: 35.085
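For intuition about what the adaptive lasso does, recall that a weighted L1 penalty `sum_j w_j * |beta_j|` is equivalent to an ordinary lasso on the rescaled features `X_j / w_j`, with coefficients mapped back via `beta_j = gamma_j / w_j`. The sketch below uses that classical reformulation with scikit-learn's `Lasso` and ridge pilot weights. It is a swapped-in technique for illustration only, not how `asgl` solves the problem (which relies on `cvxpy`), and the helper `adaptive_lasso` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, alpha=1.0, power=1.0, tol=1e-4):
    """Hypothetical sketch: adaptive lasso via feature rescaling and a plain Lasso."""
    # 1) Pilot estimate and adaptive weights w_j = 1 / (|beta_j| + tol)^power
    pilot = Ridge(alpha=1.0).fit(X, y).coef_
    weights = 1.0 / (np.abs(pilot) + tol) ** power
    # 2) Ordinary lasso on the rescaled features X_j / w_j
    lasso = Lasso(alpha=alpha).fit(X / weights, y)
    # 3) Map the coefficients back to the original feature scale
    coef = lasso.coef_ / weights
    return coef, lasso.intercept_

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=5, random_state=0)
coef, intercept = adaptive_lasso(X, y)
print("Selected features:", np.flatnonzero(coef))
```

Because features with small pilot coefficients receive large weights, they are penalized more heavily and tend to be dropped, while strong signals are shrunk less than under a uniform lasso penalty.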
## Contributions
Contributions are welcome! Please submit a pull request or open an issue
to discuss your ideas.
### Citation
------------------------------------------------------------------------
If you use `asgl` in a scientific publication, we would appreciate it if
you [cited our
paper](https://link.springer.com/article/10.1007/s11634-020-00413-8).
Thank you for your support and we hope you find this package useful!
## License
This project is licensed under the GPL-3.0 license. This means that the
package is open source and that any distributed copy or modification of
the original code must also be released under the GPL-3.0 license. In
other words, you can take the code, add to it or make major changes, and
then openly distribute your version, including commercially, as long as
you release it under the same license and make its source code
available.
Raw data
{
"_id": null,
"home_page": "https://github.com/alvaromc317/asgl",
"name": "asgl",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "variable-selection, regression, classification, penalization, lasso, adaptive-lasso, group-lasso, sparse-group-lasso, high-dimension, quantile-regression",
"author": "Alvaro Mendez Civieta",
"author_email": "alvaromc317@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f6/af/fdadb5a87a1da2938d74153b751a2688db86078cf9f3b3f74db961f61394/asgl-2.1.2.tar.gz",
"platform": null,
"description": "\r\n<!-- README.md is generated from README.Rmd. Please edit that file -->\r\n\r\n# asgl <img src=\"figures/logo.png\" align=\"right\" height=\"150\" alt=\"funq website\" /></a>\r\n\r\n[![Downloads](https://pepy.tech/badge/asgl)](https://pepy.tech/project/asgl)\r\n[![Downloads](https://pepy.tech/badge/asgl/month)](https://pepy.tech/project/asgl)\r\n[![License: GPL\r\nv3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\r\n[![Package\r\nVersion](https://img.shields.io/badge/version-2.1.1-blue.svg)](https://cran.r-project.org/package=asgl)\r\n\r\n## Introduction\r\n\r\nThe `asgl` package is a versatile and robust tool designed for fitting a\r\nvariety of regression models, including linear regression, quantile\r\nregression, logistic regression and various penalized regression models\r\nsuch as Lasso, Ridge, Group Lasso, Sparse Group Lasso, and their\r\nadaptive variants. The package is especially useful for simultaneous\r\nvariable selection and prediction in both low and high-dimensional\r\nframeworks.\r\n\r\nThe primary class available to users is the `Regressor` class, which is\r\ndetailed later in this document.\r\n\r\n`asgl` is based on cutting-edge research and methodologies, as outlined\r\nin the following papers:\r\n\r\n- [Adaptive Sparse Group Lasso in Quantile\r\n Regression](https://link.springer.com/article/10.1007/s11634-020-00413-8)\r\n- [`asgl`: A Python Package for Penalized Linear and Quantile\r\n Regression](https://arxiv.org/abs/2111.00472)\r\n\r\nFor a practical introduction to the package, users can refer to the user\r\nguide notebook available in the GitHub repository. 
Additional accessible\r\nexplanations can be found on [Towards Data Science: Sparse Group\r\nLasso](https://towardsdatascience.com/sparse-group-lasso-in-python-255e379ab892),\r\n[Towards Data Science: Adaptive\r\nLasso](https://towardsdatascience.com/an-adaptive-lasso-63afca54b80d)\r\nand [Towards Data Science: Quantile\r\nregression](https://towardsdatascience.com/squashing-the-average-a-dive-into-penalized-quantile-regression-for-python-8f3a996768b6).\r\n\r\n## Dependencies\r\n\r\nasgl requires:\r\n\r\n- Python \\>= 3.9\r\n- cvxpy \\>= 1.2.0\r\n- numpy \\>= 1.20.0\r\n- scikit-learn \\>= 1.0\r\n- pytest \\>= 7.1.2\r\n\r\n## User installation\r\n\r\nThe easiest way to install asgl is using `pip`:\r\n\r\n pip install asgl\r\n\r\n## Testing\r\n\r\nAfter installation, you can launch the test suite from the source\r\ndirectory (you will need to have `pytest >= 7.1.2` installed) by runnig:\r\n\r\n pytest\r\n\r\n## What\u2019s new?\r\n\r\n### 2.1.1\r\n\r\nNow the intercept term appears in the `intercept_` attribute instead of\r\nbeing part of the `coef_` attribute.\r\n\r\n### 2.1.0\r\n\r\nThe latest release of the `asgl` package, version 2.1.0, introduces\r\npowerful enhancements for logistic regression models. Users can now\r\neasily tackle binary classification problems by setting `model='logit'`.\r\nFor more granular control, specify `model='logit_raw'` to retrieve\r\noutputs before logistic transformation, or `model='logit_proba'` for\r\nprobability outputs. Additionally, this update includes the\r\nimplementation of ridge and adaptive ridge penalizations, accessible via\r\n`penalization='ridge'` or `'aridge'`, allowing for more flexible model\r\ntuning.\r\n\r\n### 2.0.0\r\n\r\nWith the release of version 2.0, the `asgl` package has undergone\r\nsignificant enhancements and improvements. The most notable change is\r\nthe introduction of the `Regressor` object, which brings full\r\ncompatibility with scikit-learn. 
This means that the `Regressor` object\r\ncan now be used just like any other scikit-learn estimator, enabling\r\nseamless integration with scikit-learn\u2019s extensive suite of tools for\r\nmodel evaluation, hyperparameter optimization, and performance metrics.\r\n\r\nKey updates include:\r\n\r\n- Scikit-learn Compatibility: The `Regressor` class is now fully\r\n compatible with scikit-learn. Users can leverage functionalities such\r\n as `sklearn.model_selection.GridSearchCV` for hyperparameter tuning\r\n and utilize various scikit-learn metrics and utilities to assess model\r\n performance.\r\n\r\n- Deprecation of `ASGL` class: The old `ASGL` class is still included in\r\n the package for backward compatibility but is now deprecated. It will\r\n raise a `DeprecationWarning` when used, as it is no longer supported\r\n and will be removed in future versions. Users are strongly encouraged\r\n to transition to the new `Regressor` class to take advantage of the\r\n latest features and improvements.\r\n\r\nFor users currently utilizing the `ASGL` class, we recommend switching\r\nto the `Regressor` class to ensure continued support and access to the\r\nlatest functionalities.\r\n\r\n## Key features:\r\n\r\nThe `Regressor` class includes the following list of parameters:\r\n\r\n- model: str, default=\u2018lm\u2019\r\n - Type of model to fit. Options are \u2018lm\u2019 (linear regression), \u2018qr\u2019\r\n (quantile regression), \u2018logit\u2019 (logistic regression for binary\r\n classification, output binary classification), \u2018logit_proba\u2019\r\n (logistic regression for binary classification, output probability)\r\n and \u2018logit_raw\u2019 (logistic regression for binary classification,\r\n output score before logistic).\r\n- penalization: str or None, default=\u2018lasso\u2019\r\n - Type of penalization to use. 
Options are \u2018lasso\u2019, \u2018ridge\u2019, \u2018gl\u2019\r\n (group lasso), \u2018sgl\u2019 (sparse group lasso), \u2018alasso\u2019 (adaptive\r\n lasso), \u2018aridge\u2019, \u2018agl\u2019 (adaptive group lasso), \u2018asgl\u2019 (adaptive\r\n sparse group lasso), or None.\r\n- quantile: float, default=0.5\r\n - Quantile level for quantile regression models. Valid values are\r\n between 0 and 1.\r\n- fit_intercept: bool, default=True\r\n - Whether to fit an intercept in the model.\r\n- lambda1: float, default=0.1\r\n - Constant that multiplies the penalization, controlling the strength.\r\n Must be a non-negative float i.e.\u00a0in `[0, inf)`. Larger values will\r\n result in larger penalizations.\r\n- alpha: float, default=0.5\r\n - Constant that performs tradeoff between individual and group\r\n penalizations in sgl and asgl penalizations. `alpha=1` enforces a\r\n lasso penalization while `alpha=0` enforces a group lasso\r\n penalization.\r\n- solver: str, default=\u2018default\u2019\r\n - Solver to be used by `cvxpy`. Default uses optimal alternative\r\n depending on the problem. Users can check available solvers via the\r\n command `cvxpy.installed_solvers()`.\r\n- weight_technique: str, default=\u2018pca_pct\u2019\r\n - Technique used to fit adaptive weights. Options include \u2018pca_1\u2019,\r\n \u2018pca_pct\u2019, \u2018pls_1\u2019, \u2018pls_pct\u2019, \u2018lasso\u2019, \u2018ridge\u2019, \u2018unpenalized\u2019, and\r\n \u2018sparse_pca\u2019. For low dimensional problems (where the number of\r\n variables is smaller than the number of observations) the usage of\r\n the \u2018unpenalized\u2019 or \u2018ridge\u2019 weight_techniques is encouraged. For\r\n high dimensional problems (where the number of variables is larger\r\n than the number of observations) the default \u2018pca_pct\u2019 is\r\n encouraged.\r\n- individual_power_weight: float, default=1\r\n - Power to which individual weights are raised. 
This parameter only\r\n has effect in adaptive penalizations. (\u2018alasso\u2019 and \u2018asgl\u2019).\r\n- group_power_weight: float, default=1\r\n - Power to which group weights are raised. This parameter only has\r\n effect in adaptive penalizations with a grouped structure (\u2018agl\u2019 and\r\n \u2018asgl\u2019).\r\n- variability_pct: float, default=0.9\r\n - Percentage of variability explained by PCA, PLS, and sparse PCA\r\n components. This parameter only has effect in adaptiv penalizations\r\n where `weight_technique` is equal to \u2018pca_pct\u2019, \u2018pls_pct\u2019 or\r\n \u2018sparse_pca\u2019.\r\n- lambda1_weights: float, default=0.1\r\n - The value of the parameter `lambda1` used to solve the lasso model\r\n if `weight_technique='lasso'`\r\n- spca_alpha: float, default=1e-5\r\n - Sparse PCA parameter. This parameter only has effect if\r\n `weight_technique='sparse_pca'`See scikit-learn implementation for\r\n more details.\r\n- spca_ridge_alpha: float, default=1e-2\r\n - Sparse PCA parameter. This parameter only has effect if\r\n `weight_technique='sparse_pca'`See scikit-learn implementation for\r\n more details.\r\n- individual_weights: array or None, default=None\r\n - Custom individual weights for adaptive penalizations. If this\r\n parameter is informed, it overrides the weight estimation process\r\n defined by parameter `weight_technique` and allows the user to\r\n provide custom weights. It must be either `None` or be an array with\r\n non-negative float values and length equal to the number of\r\n variables.\r\n- group_weights: array or None, default=None\r\n - Custom group weights for adaptive penalizations. If this parameter\r\n is informed, it overrides the weight estimation process defined by\r\n parameter `weight_technique` and allows the user to provide custom\r\n weights. 
It must be either `None` or be an array with non-negative\r\n float values and length equal to the number of groups (as defined by\r\n `group_index`)\r\n- tol: float, default=1e-4\r\n - Tolerance for coefficients to be considered zero.\r\n- weight_tol: float, default=1e-4\r\n - Tolerance value used to avoid ZeroDivision errors when computing the\r\n weights.\r\n\r\n## Examples\r\n\r\n### Example 1: Linear Regression with Lasso.\r\n\r\n``` python\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import mean_squared_error\r\nfrom asgl import Regressor\r\n\r\nX, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250)\r\n\r\nmodel = Regressor(model='lm', penalization='lasso', lambda1=0.1)\r\nmodel.fit(X_train, y_train)\r\n\r\npredictions = model.predict(X_test)\r\nmse = mean_squared_error(predictions, y_test)\r\n```\r\n\r\nThis example illustrates how to:\r\n\r\n- Generate synthetic regression data.\r\n- Split the data into training and testing sets.\r\n- Create a `Regressor` object configured for linear regression with\r\n Lasso penalization.\r\n- Fit the model to the training data.\r\n- Make predictions on the test data.\r\n- Evaluate the model\u2019s performance using mean squared error.\r\n\r\n### Example 2: Quantile regression with Adaptive Sparse Group Lasso.\r\n\r\nGroup-based penalizations like Group Lasso, Sparse Group Lasso, and\r\ntheir adaptive variants, assume that there is a group structure within\r\nthe regressors. 
This structure can be useful in various applications,\r\nsuch as when using dummy variables where all the dummies of the same\r\nvariable belong to the same group, or in genetic data analysis where\r\ngenes are grouped into genetic pathways.\r\n\r\nFor scenarios where the regressors have a known grouped structure, this\r\ninformation can be passed to the `Regressor` class during model fitting\r\nusing the `group_index` parameter. This parameter is an array where each\r\nelement indicates the group at which the associated variable belongs.\r\nThe following example demonstrates this with a synthetic group_index.\r\nThe model will be optimized using scikit-learn\u2019s `RandomizedSearchCV`\r\nfunction.\r\n\r\n``` python\r\nimport numpy as np\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.model_selection import RandomizedSearchCV\r\nfrom asgl import Regressor\r\n\r\nX, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250)\r\n\r\ngroup_index = np.random.randint(1, 5, size=50)\r\n\r\nmodel = Regressor(model='qr', penalization='asgl', quantile=0.5)\r\n\r\nparam_grid = {'lambda1': [1e-4, 1e-3, 1e-2, 1e-1, 1], 'alpha': [0, 0.2, 0.4, 0.6, 0.8, 1]}\r\nrscv = RandomizedSearchCV(model, param_grid, scoring='neg_median_absolute_error')\r\nrscv.fit(X_train, y_train, **{'group_index': group_index})\r\n```\r\n\r\nThis example demonstrates how to fit a quantile regression model with\r\nAdaptive Sparse Group Lasso penalization, utilizing scikit-learn\u2019s\r\n`RandomizedSearchCV` to optimize the model\u2019s hyperparameters.\r\n\r\n### Example 3: Logistic regression\r\n\r\nIn binary classification tasks using logistic regression, the default\r\ndecision threshold of 0.5 is used by default. But it might not always\r\nyield the best accuracy. 
By leveraging the `'logit_proba'` model from the `asgl` package, you can
obtain predicted probabilities and use them to find the threshold that
maximizes classification accuracy. This example uses `cross_val_predict`
from scikit-learn to obtain out-of-fold probability estimates on the
training set, evaluates a grid of thresholds on them, and selects the one
that offers the highest accuracy.

``` python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score
from asgl import Regressor
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=1000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create a Regressor object for logistic regression that outputs probabilities
model = Regressor(model='logit_proba', penalization='ridge')

# With model='logit_proba', predict returns probabilities, so this yields
# out-of-fold probability estimates for every training sample
probabilities = cross_val_predict(model, X_train, y_train, method='predict', cv=5)

thresholds = np.linspace(0.01, 0.99, 100)

# Calculate the accuracy obtained with each threshold
accuracies = []
for threshold in thresholds:
    predictions = (probabilities >= threshold).astype(int)
    accuracies.append(accuracy_score(y_train, predictions))
```

``` python
plt.plot(thresholds, accuracies)
plt.title('Accuracy vs Threshold')
plt.ylabel('Accuracy')
plt.xlabel('Threshold')
```

<img src="figures/README-unnamed-chunk-4-1.png" width="100%" />

``` python
# Refit on the full training set and apply the selected threshold to the test set
optimal_threshold = thresholds[np.argmax(accuracies)]
model.fit(X_train, y_train)
test_probabilities = model.predict(X_test)
test_predictions = (test_probabilities >= optimal_threshold).astype(int)
test_accuracy = accuracy_score(y_test, test_predictions)
```

### Example 4: Customizing weights for adaptive sparse group lasso

The `asgl` package offers several built-in methods for estimating
adaptive weights, controlled via the `weight_technique` parameter. For
more details on the inner workings of each of these alternatives, refer
to the [associated research
paper](https://link.springer.com/article/10.1007/s11634-020-00413-8) or
to the user guide. However, for users requiring extensive customization,
the package allows custom weights to be specified directly through the
`individual_weights` and `group_weights` parameters. This allows users
to implement their own weight computation techniques and use them within
the `asgl` framework.

When using custom weights, ensure that the length of
`individual_weights` matches the number of variables, and the length of
`group_weights` matches the number of groups.
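
As one illustration of a custom technique, the sketch below computes
weights in the style of the classic adaptive lasso: each variable j gets
the weight 1/|b_j|^γ and each group g gets 1/‖b_g‖₂^γ, where b is a
preliminary coefficient estimate. This is a common choice in the adaptive
lasso literature and is not necessarily what any `weight_technique`
option in `asgl` computes. The helper `adaptive_weights`, the small `eps`
guard against division by zero, the toy coefficients, and the ordering of
groups by sorted label (the order `np.unique` returns) are all
assumptions made for illustration.

``` python
import math


def adaptive_weights(initial_coef, group_index, gamma=1.0, eps=1e-8):
    """Hypothetical helper: adaptive-lasso-style weights from a preliminary estimate.

    Returns one weight per variable and one weight per group, with groups
    ordered by sorted label (the order np.unique would return them in).
    """
    if len(initial_coef) != len(group_index):
        raise ValueError("initial_coef and group_index need one entry per variable")

    # Small coefficients receive large weights, so they are penalized more heavily
    individual_weights = [1.0 / (abs(b) + eps) ** gamma for b in initial_coef]

    group_weights = []
    for g in sorted(set(group_index)):
        # Euclidean norm of the preliminary coefficients belonging to group g
        norm = math.sqrt(sum(b * b for b, gi in zip(initial_coef, group_index) if gi == g))
        group_weights.append(1.0 / (norm + eps) ** gamma)

    return individual_weights, group_weights


# Hypothetical preliminary coefficients for four variables split into two groups
toy_coef = [2.0, -0.5, 0.0, 4.0]
toy_group_index = [1, 1, 2, 2]

individual_weights, group_weights = adaptive_weights(toy_coef, toy_group_index)

# The length requirements for custom weights
assert len(individual_weights) == len(toy_coef)            # one weight per variable
assert len(group_weights) == len(set(toy_group_index))     # one weight per group
```

In practice, the preliminary coefficients would come from an earlier fit
on the same data, and the resulting lists could be converted with
`np.asarray` before being passed to `Regressor`.
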
Below is an example demonstrating how to fit a model with custom
individual and group weights:

``` python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from asgl import Regressor

# Recreate the regression data and group structure of Example 2,
# since Example 3 overwrote X_train and y_train with classification data
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250)
group_index = np.random.randint(1, 5, size=50)

# Generate random custom weights for illustration: one per variable and one per group
custom_individual_weights = np.random.rand(X_train.shape[1])
custom_group_weights = np.random.rand(len(np.unique(group_index)))

# Create a Regressor object with custom weights
model = Regressor(model='lm', penalization='asgl', individual_weights=custom_individual_weights, group_weights=custom_group_weights)

# Fit the model, passing the group structure
model.fit(X_train, y_train, group_index=group_index)
```

### Example 5: Comparison of lasso and adaptive lasso

This example compares the lasso implementation available in
`scikit-learn` against an adaptive lasso model built with the `asgl`
library. Both models are tuned with 5-fold cross-validation over the same
grid of penalization values. As the final MSEs computed on the test set
show, the adaptive lasso reduces the error by roughly 40% compared to
lasso (35.085 versus 59.693).

``` python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from asgl import Regressor

X, y = make_regression(n_samples=200, n_features=200, n_informative=25, bias=10, noise=5, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=50, random_state=42)

param_grid = {'alpha': 10 ** np.arange(-2, 1.51, 0.1)}

lasso_model = Lasso()

gscv_lasso = GridSearchCV(lasso_model, param_grid, scoring='neg_mean_squared_error', cv=5, n_jobs=-1)
gscv_lasso.fit(X_train, y_train)
lasso_predictions = gscv_lasso.predict(X_test)
lasso_mse = np.round(mean_squared_error(y_test, lasso_predictions), 3)
print(f"Lasso MSE: {lasso_mse}")

param_grid = {'lambda1': 10 ** np.arange(-2, 1.51, 0.1)}

alasso_model = Regressor(model='lm', penalization='alasso', weight_technique='lasso')

gscv_alasso = GridSearchCV(alasso_model, param_grid, scoring='neg_mean_squared_error', cv=5, n_jobs=-1)
gscv_alasso.fit(X_train, y_train)
alasso_predictions = gscv_alasso.predict(X_test)
alasso_mse = np.round(mean_squared_error(y_test, alasso_predictions), 3)
print(f"Adaptive lasso MSE: {alasso_mse}")
```

    Lasso MSE: 59.693
    Adaptive lasso MSE: 35.085

## Contributions

Contributions are welcome! Please submit a pull request or open an issue
to discuss your ideas.

### Citation

------------------------------------------------------------------------

If you use `asgl` in a scientific publication, we would appreciate it if
you [cited our
paper](https://link.springer.com/article/10.1007/s11634-020-00413-8).
Thank you for your support, and we hope you find this package useful!

## License

This project is licensed under the GPL-3.0 license. This means that the
package is open source and that any copy or modification of the original
code must also be released under the GPL-3.0 license. In other words,
you can take the code, add to it or make major changes, and then openly
distribute your version, as long as it is released under the same
license with its source code available.
"bugtrack_url": null,
"license": "GNU General Public License",
"summary": "A regression solver for high dimensional penalized linear, quantile and logistic regression models",
"version": "2.1.2",
"project_urls": {
"Homepage": "https://github.com/alvaromc317/asgl"
},
"split_keywords": [
"variable-selection",
" regression",
" classification",
" penalization",
" lasso",
" adaptive-lasso",
" group-lasso",
" sparse-group-lasso",
" high-dimension",
" quantile-regression"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f6affdadb5a87a1da2938d74153b751a2688db86078cf9f3b3f74db961f61394",
"md5": "be2550e00ef7c87a012db6cc8b402809",
"sha256": "701109b7048e140659b1910189890b3a77900ba47fd111e7fda885e3d82191aa"
},
"downloads": -1,
"filename": "asgl-2.1.2.tar.gz",
"has_sig": false,
"md5_digest": "be2550e00ef7c87a012db6cc8b402809",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 45636,
"upload_time": "2024-10-01T19:33:17",
"upload_time_iso_8601": "2024-10-01T19:33:17.513614Z",
"url": "https://files.pythonhosted.org/packages/f6/af/fdadb5a87a1da2938d74153b751a2688db86078cf9f3b3f74db961f61394/asgl-2.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-01 19:33:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "alvaromc317",
"github_project": "asgl",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "asgl"
}