penlm

Name: penlm
Version: 1.1.0 (PyPI)
Home page: https://github.com/bellibot/penlm
Summary: Penalized Linear Models for Classification and Regression
Upload time: 2024-07-28 17:10:52
Author: Edoardo Belli
Requires Python: >=3.5
License: MIT
Keywords: classification, regression, linear, penalty
Requirements: no requirements were recorded.
## penlm - Penalized Linear Models

`penlm` is a Python package that implements several penalty-based linear models that are not (currently) available in `scikit-learn`. All models expose the familiar `fit`/`predict`/`predict_proba`/`score` interface.

The supported estimators are:
- Smoothly Adaptively Centered Ridge ([SACR](https://doi.org/10.1016/j.jmva.2021.104882))
- Ridge penalty on first/second derivatives (Functional Linear Model)
- [Adaptive Lasso](https://doi.org/10.1198/016214506000000735)
- [Relaxed Lasso](https://doi.org/10.1016/j.csda.2006.12.019)
- Broken Adaptive Ridge ([BAR](https://doi.org/10.1016/j.jmva.2018.08.007)) 
- Non-negative Garrote ([NNG](https://doi.org/10.2307/1269730))
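
As a point of reference for the kind of penalties involved (standard notation from the cited papers, not `penlm`-specific), the Adaptive Lasso, for example, solves a weighted-\(\ell_1\) problem:

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
\lVert y - X\beta \rVert_2^2
\;+\; \lambda \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert,
\qquad
w_j \;=\; \frac{1}{\lvert \hat{\beta}^{\mathrm{init}}_j \rvert^{\gamma}},
```

where \(\hat{\beta}^{\mathrm{init}}\) is an initial coefficient estimate (e.g. from OLS or ridge) and \(\gamma > 0\); this is why the Adaptive Lasso and NNG estimators below require an initial coefficient vector.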

All estimators accept `fit_intercept` and `scale` arguments (scaling is done with `sklearn.preprocessing.StandardScaler`), and support the following tasks:

- Linear Regression
- Binary Classification (Logistic Regression)
- Multiclass Classification (One-vs-Rest)

A custom cross-validation object is provided for grid-search hyperparameter tuning (with any splitter from `scikit-learn`); it uses `multiprocessing` for parallelization (default `n_jobs = -1`).
Multiclass fitting is not parallelized in this version (that would be beneficial when many cores are available, or when refitting the best estimator in the grid search object).

## Installation

The package can be installed from the terminal with `pip install penlm`. Some estimators in `penlm` directly wrap `scikit-learn` classes, while the SACR, FLMs, and NNG are formulated with `Pyomo`, which in turn needs a solver to interface with. Depending on the estimator, the optimization problems are quadratic with equality and/or inequality constraints, and all the code was tested with the solver [Ipopt](https://doi.org/10.1007/s10107-004-0559-y). You just need to download the [executable binary](https://ampl.com/products/solvers/open-source/#ipopt) and add the folder that contains it to your `PATH` (tested on Ubuntu).
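
On Ubuntu, the whole setup might look like the following sketch (the solver directory `~/solvers/ipopt` is illustrative; use wherever you unpacked the binary):

```shell
# Install the package from PyPI
pip install penlm

# After downloading and unpacking the Ipopt binary,
# make it visible to Pyomo by putting its folder on the PATH
export PATH="$HOME/solvers/ipopt:$PATH"
```

Adding the `export` line to `~/.bashrc` makes the solver available in every new shell.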


## Usage

The following lines show how to fit an estimator with its own parameters and grid search object, using a `StratifiedKFold` splitter:

```python
import numpy as np
import penlm.grid_search as gs
from penlm.smoothly_adaptively_centered_ridge import SACRClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import load_wine
from sklearn.metrics import balanced_accuracy_score

X, Y = load_wine(return_X_y = True)

train_index = [i for i in range(100)]
test_index = [i for i in range(100, len(X))]

lambda_list = np.logspace(-5, 5, 10, base = 2)
phi_list = np.linspace(0, 1, 10)[1:]


estimator = SACRClassifier(solver = 'ipopt',
                           scale = True,
                           fit_intercept = True)
parameters = {'phi':phi_list,
              'lambda':lambda_list}
cv = StratifiedKFold(n_splits = 2, 
                     random_state = 46,
                     shuffle = True)              
grid_search = gs.GridSearchCV(estimator,
                              parameters,
                              cv,
                              scoring = balanced_accuracy_score,
                              verbose = False,
                              n_jobs = -1)
grid_search.fit(X[train_index],
                Y[train_index])
score = grid_search.score(X[test_index],
                          Y[test_index])
```
The test folder in the GitHub repo contains two sample scripts that show how to use all the estimators (in both classification and regression tasks). In particular, for the Adaptive Lasso and the NNG you need to provide an initial coefficient vector as a `np.ndarray` with the same shape as in `scikit-learn` estimators (the test scripts fit a `LogisticRegression`/`Ridge` estimator and use its `coef_` attribute).
Moreover, regarding the `scoring`, all the estimators and the grid search class default to `accuracy`/`R^2` (when the argument `scoring = None`), but you can provide any `Callable` scoring function found in `sklearn.metrics`. Beware that higher is better, so when scoring with errors like `sklearn.metrics.mean_squared_error`, you need to wrap it in a custom function that flips its sign.
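
For instance, a plausible way to obtain such an initial coefficient vector for a regression task (mirroring what the test scripts do; the exact `penlm` keyword for passing it is shown in those scripts) is:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Fit a plain ridge regression and keep its coefficient vector;
# this array (shape (n_features,)) serves as the initial estimate
# required by the Adaptive Lasso / NNG estimators.
init_coef = Ridge(alpha=1.0).fit(X, y).coef_
print(init_coef.shape)  # (5,)
```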
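
A minimal sketch of such a sign-flipping wrapper (the function name is illustrative):

```python
from sklearn.metrics import mean_squared_error

def neg_mean_squared_error(y_true, y_pred):
    """Higher-is-better version of MSE, usable as a `scoring` callable."""
    return -mean_squared_error(y_true, y_pred)

print(neg_mean_squared_error([1.0, 2.0], [2.0, 4.0]))  # -2.5
```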

## Citing

We encourage users to cite the original papers of the implemented estimators.
In particular, the code published in this package was used in the case studies of [this](https://doi.org/10.1016/j.jmva.2021.104882) paper.



            
