## muCART - Measure Inducing Classification and Regression Trees
`muCART` is a Python package that implements Measure Inducing Classification and Regression Trees for Functional Data.
The estimators are implemented with the familiar `fit`/`predict`/`score` interface and support multiple predictors of possibly different lengths, passed as a list of `np.ndarray` objects, one per predictor. The following tasks are supported, depending on the loss function used inside each node of the tree:
- Regression (MSE, MAE)
- Binary and multiclass classification (Gini, misclassification error, entropy)
A custom cross-validation object is provided to perform grid search hyperparameter tuning with any splitter from `scikit-learn`; it uses `multiprocessing` for parallelization (default `n_jobs = -1`).
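
As an illustration of the multiple-predictor input format, the sketch below builds two synthetic functional predictors observed on grids of different lengths. The array shapes and labels are made up, and the direct `fit`/`predict` calls (with the estimator's default hyperparameters) are only an assumption based on the interface described above:

```python
import numpy as np
from muCART.mu_cart import muCARTClassifier

n_samples = 50
# two functional predictors observed on grids of different lengths;
# the number of samples (rows) must match across predictors
X = [np.random.rand(n_samples, 30),   # predictor 1: 30 evaluation points
     np.random.rand(n_samples, 80)]   # predictor 2: 80 evaluation points
Y = np.random.randint(0, 2, size=n_samples)

# same solver options as in the usage example below
estimator = muCARTClassifier({'solver': 'ipopt', 'max_iter': 500})
estimator.fit(X, Y)
predictions = estimator.predict(X)
```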
## Installation
The package can be installed from the terminal with `pip install muCART`. Inside each node of the tree, the optimization problems (quadratic, with equality and/or inequality constraints) are formulated with `Pyomo`, which in turn needs a solver to interface with. All the code was tested on Ubuntu using the solver [Ipopt](https://doi.org/10.1007/s10107-004-0559-y). You just need to download the [executable binary](https://ampl.com/products/solvers/open-source/#ipopt) and add the folder that contains it to your `PATH`.
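
A quick way to check that the solver is visible to `Pyomo` (assuming a standard `Pyomo` installation; this check is not part of `muCART` itself):

```python
from pyomo.environ import SolverFactory

# returns True once the folder containing the ipopt binary is on PATH
print(SolverFactory('ipopt').available(exception_flag=False))
```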
## Usage
The following example shows how to fit an estimator with its parameter grid and grid search object, using a `StratifiedKFold` splitter:
```python
import numpy as np
import muCART.grid_search as gs
from muCART.mu_cart import muCARTClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import load_wine
from sklearn.metrics import balanced_accuracy_score

X, Y = load_wine(return_X_y=True)
train_index = [i for i in range(100)]
test_index = [i for i in range(100, len(X))]
# wrap the single predictor in a List
X = [X]

min_samples_leaf_list = [i for i in range(1, 5)]
lambda_list = np.logspace(-5, 5, 10, base=2)
solver_options = {'solver': 'ipopt',
                  'max_iter': 500}

estimator = muCARTClassifier(solver_options)
parameters = {'min_samples_leaf': min_samples_leaf_list,
              'lambda': lambda_list,
              'max_depth': [None]}
cv = StratifiedKFold(n_splits=2,
                     random_state=46,
                     shuffle=True)
grid_search = gs.GridSearchCV(estimator,
                              parameters,
                              cv,
                              scoring=balanced_accuracy_score,
                              verbose=False,
                              n_jobs=-1)
# extract train samples for each predictor
X_train = [X[i][train_index] for i in range(len(X))]
grid_search.fit(X_train,
                Y[train_index])
# extract test samples for each predictor
X_test = [X[i][test_index] for i in range(len(X))]
score = grid_search.score(X_test,
                          Y[test_index])
```
The test folder in the [GitHub](https://github.com/bellibot/muCART) repo contains two sample scripts that show how to use the estimator in classification and regression tasks. Regarding scoring, both the estimators and the grid search class default to accuracy/`R^2` (when `scoring = None`), but you can provide any `Callable` scoring function from `sklearn.metrics`. Note that higher scores are assumed to be better, so error metrics like `sklearn.metrics.mean_squared_error` need to be wrapped in a custom function that flips the sign, as shown below.
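
For example, a sign-flipped MSE wrapper for a regression task could look like this (passed through the same `scoring` argument used in the classification example above):

```python
from sklearn.metrics import mean_squared_error

def neg_mean_squared_error(y_true, y_pred):
    # flip the sign so that higher values mean better fits
    return -mean_squared_error(y_true, y_pred)

# pass it in place of balanced_accuracy_score, e.g.
# grid_search = gs.GridSearchCV(estimator, parameters, cv,
#                               scoring=neg_mean_squared_error,
#                               n_jobs=-1)
```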
## Citing
The code published in this package has been used in the case studies of [this](https://doi.org/10.1002/sam.11569) paper.