muCART

- Name: muCART
- Version: 1.0.2
- Summary: Measure Inducing Classification and Regression Trees
- Home page: https://github.com/bellibot/muCART
- Author: Edoardo Belli
- Requires Python: >=3.5
- License: MIT
- Keywords: decision tree, functional data, classification, regression
- Upload time: 2023-04-23 17:22:04
## muCART - Measure Inducing Classification and Regression Trees

`muCART` is a Python package that implements Measure Inducing Classification and Regression Trees for Functional Data.

The estimators are implemented with the familiar `fit`/`predict`/`score` interface and support multiple predictors of possibly different lengths, passed as a list of `np.ndarray` objects, one per predictor. The supported tasks, determined by the loss function used inside each node of the tree, are:

- Regression (`mse`, `mae`)
- Binary and multiclass classification (`gini`, misclassification error, `entropy`)
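For instance, two predictors observed on grids of different lengths can be packed as a list of arrays; the shapes below are illustrative, not prescribed by the package:

```python
import numpy as np

n_samples = 50
# two functional predictors sampled on grids of different lengths
X1 = np.random.rand(n_samples, 30)      # predictor 1: 30 evaluation points
X2 = np.random.rand(n_samples, 80)      # predictor 2: 80 evaluation points
X = [X1, X2]                            # one np.ndarray per predictor
Y = np.random.randint(0, 2, n_samples)  # binary labels
```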

A custom cross-validation object is provided for grid search hyperparameter tuning (compatible with any splitter from `scikit-learn`); it uses `multiprocessing` for parallelization (default `n_jobs = -1`).

## Installation

The package can be installed from the terminal with `pip install muCART`. Inside each node of the tree, the optimization problems (quadratic, with equality and/or inequality constraints) are formulated with `Pyomo`, which in turn needs a solver to interface with. All the code was tested on Ubuntu with the solver [Ipopt](https://doi.org/10.1007/s10107-004-0559-y): download the [executable binary](https://ampl.com/products/solvers/open-source/#ipopt) and add the folder that contains it to your `PATH`.


## Usage

The following example shows how to fit an estimator with its own parameters and grid search object, using a `StratifiedKFold` splitter:

```python
import numpy as np
import muCART.grid_search as gs
from muCART.mu_cart import muCARTClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import load_wine
from sklearn.metrics import balanced_accuracy_score

X, Y = load_wine(return_X_y = True)
# shuffle before splitting: load_wine returns samples ordered by class
rng = np.random.default_rng(46)
indices = rng.permutation(len(X))
train_index = indices[:100]
test_index = indices[100:]
# wrap the single predictor in a List
X = [X]

min_samples_leaf_list = list(range(1, 5))
lambda_list = np.logspace(-5, 5, 10, base = 2)
solver_options = {'solver':'ipopt',
                  'max_iter':500}

estimator = muCARTClassifier(solver_options)
parameters = {'min_samples_leaf':min_samples_leaf_list,
              'lambda':lambda_list,
              'max_depth': [None]}
cv = StratifiedKFold(n_splits = 2,
                     random_state = 46,
                     shuffle = True)
grid_search = gs.GridSearchCV(estimator,
                              parameters,
                              cv,
                              scoring = balanced_accuracy_score,
                              verbose = False,
                              n_jobs = -1)
# extract train samples for each predictor
X_train = [X[i][train_index] for i in range(len(X))]
grid_search.fit(X_train,
                Y[train_index])
# extract test samples for each predictor
X_test = [X[i][test_index] for i in range(len(X))]
score = grid_search.score(X_test,
                          Y[test_index])
```
The `test` folder in the GitHub repo contains two sample scripts that show how to use the estimator in both classification and regression tasks. Regarding scoring, both the estimators and the grid search class default to accuracy/`R^2` (when `scoring = None`), but you can pass any `Callable` scoring function from `sklearn.metrics`. Beware that higher is better: when scoring with an error such as `sklearn.metrics.mean_squared_error`, wrap it in a custom function that flips its sign.
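For example, a sign-flipped wrapper around `sklearn.metrics.mean_squared_error` (the wrapper name here is illustrative, not part of the package) could look like:

```python
from sklearn.metrics import mean_squared_error

def neg_mean_squared_error(y_true, y_pred):
    # the grid search maximizes the score, so negate the error
    return -mean_squared_error(y_true, y_pred)
```

The resulting callable can then be passed directly as the `scoring` argument of the grid search object.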

## Citing

The code published in this package has been used in the case studies of [this](https://doi.org/10.1002/sam.11569) paper.



            
