# binarybeech
Simplistic algorithms to train decision trees for regression and classification

## Features

- Create binary trees using the CART (Classification and Regression Tree) algorithm.
- Train ensembles of trees using Gradient Boosting, Adaptive Boosting (AdaBoost) or Random Forest.
- Process each data type with a data handler, either as provided or implemented to suit your needs. Just add your own implementation to the factory.
- Metrics for different kinds of outcome variables are implemented analogously.
- Features with high cardinality are treated with a simulated annealing solver to find the best combination.
- No need for dummy encoding.
- Train models using supervised or unsupervised learning.
- Specify weights for unbalanced datasets.

> **NOTE:** These pure Python (plus a bit of NumPy) algorithms are many times slower than, e.g., `sklearn` or `xgboost`.

## Install

```
pip install binarybeech[visualize]
```
The dependencies installed via the `visualize` extra enable support for plotting and formatting trees.
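
If your shell treats square brackets specially (zsh, for instance), quote the argument; the extra can also be omitted if plotting support is not needed:

```
pip install "binarybeech[visualize]"
pip install binarybeech
```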

## Example

Load the Classification and Regression Tree (CART) model class

```
import pandas as pd
from binarybeech.binarybeech import CART
from binarybeech.extra import k_fold_split
```
get the data from a CSV file
```
df = pd.read_csv("data/titanic.csv")
[(df_train, df_test)] = k_fold_split(df, frac=0.75, random=True, replace=False)
```
grow a decision tree
```
c = CART(df=df_train, y_name="Survived", method="classification")
c.create_tree()
```
predict
```
c.predict(df_test)
```
validation metrics
```
c.validate(df=df_test)
```
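
Alternatively, `train` (documented under Usage below) combines growing the tree with k-fold cross-validation and cost-complexity pruning in a single call; a minimal sketch using the documented defaults:

```
c = CART(df=df_train, y_name="Survived", method="classification")
c.train(k=5, plot=False)
c.validate(df=df_test)
```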

Please have a look at the jupyter notebooks in this repository for more examples. To try them out online, you can use [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/arminwitte/binarybeech/HEAD?labpath=notebooks%2Ftitanic.ipynb).

## Usage
### binarybeech.binarybeech.CART
**CART(df, y_name, X_names=None, min_leaf_samples=1, min_split_samples=1, max_depth=10, method="regression", handle_missings="simple", attribute_handlers=None)**

Class for a Classification and Regression Tree (CART) model.

* Parameters
    - **df**: pandas _dataframe_ with training data
    - **y_name**: name of the column with the output data/labels
    - **X_names**: _list_ of names of the inputs to use for the modelling. If _None_, all columns except y_name are chosen. Default is _None_.
    - **min_leaf_samples**: If the number of training samples is lower than this, a terminal node (leaf) is created. Default is 1.
    - **min_split_samples**: If a proposed split of the training data leaves at least one branch with fewer samples than this, the split is rejected. Default is 1.
    - **max_depth**: Maximum number of sequential splits. This corresponds to the number of vertical layers of the tree. Default is 10, which corresponds to a maximum number of 1024 terminal nodes.
    - **method**: Metrics to use for the evaluation of split loss, etc. Can be either "classification", "logistic", "regression", or _None_. Default is "regression". If _None_ is chosen, the `method` is deduced from the training _dataframe_.
    - **handle_missings**: Specifies how missing data is handled. Can be either _None_ or "simple".
    - **attribute_handlers**: _dict_ with attribute handler instances for each variable. The data handler determines, e.g., how splits of the dataset are made.
* Methods
    - **predict(df)**:
        + Parameters:
            * **df**: _dataframe_ with inputs for predictions.
        + Returns:
            * array with predicted values/labels.
    - **train(k=5, plot=True, slack=1.0)**:
        + Parameters:
            * **k**: number of different splits of the _dataframe_ into training and test sets for k-fold cross-validation.
            * **plot**: flag for plotting a diagram of the loss over cost complexity parameter alpha using _matplotlib_.
            * **slack**: the amount of slack granted in choosing the best cost complexity parameter alpha. It is given as a multiplier for the standard deviation of the alpha at minimum loss, and thus allows choosing an alpha that is probably larger, to account for the uncertainty in the k-fold cross-validation procedure.
        + Returns:
    - **create_tree(leaf_loss_threshold=1e-12)**
        + Returns
    - **prune(alpha_max=None, test_set=None, metrics_only=False)**
        + Parameters:
            * **alpha_max**: Stop the pruning procedure at this value of the cost complexity parameter alpha. If _None_, the tree is pruned down to its root giving the complete relationship between alpha and the loss. Default is _None_.
            * **test_set**: data set to use for the evaluation of the losses. If _None_, the training set is used. Default is _None_.
            * **metrics_only**: If _True_, pruning is performed on a copy of the tree, leaving the actual tree intact. Default is _False_.
    - **validate(df=None)**
        + Parameters:
            * **df**: _dataframe_ to use for (cross-)validation. If _None_, the training set is used. Default is _None_.
        + Returns:
            * _dict_ with metrics, e.g. accuracy or RSquared.
* Attributes
    - **tree**:
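
As a quick illustration of these signatures, a minimal sketch that grows, prunes, and validates a tree (the Titanic data and column names are taken from the example above; `max_depth=5` is an arbitrary choice):

```
import pandas as pd
from binarybeech.binarybeech import CART

df = pd.read_csv("data/titanic.csv")
c = CART(df=df, y_name="Survived", method="classification", max_depth=5)
c.create_tree()
c.prune(metrics_only=True)  # evaluate pruning on a copy; the tree itself stays intact
print(c.validate())         # dict with metrics, e.g. accuracy
```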

### binarybeech.binarybeech.GradientBoostedTree

**GradientBoostedTree(df, y_name, X_names=None, sample_frac=1, n_attributes=None, learning_rate=0.1, cart_settings={}, init_method="logistic", gamma=None, handle_missings="simple", s=None)**

Class for a Gradient Boosted Tree model.

* Parameters
    - **df**: pandas _dataframe_ with training data
    - **y_name**: name of the column with the output data/labels
    - **X_names**: _list_ of names of the inputs to use for the modelling. If _None_, all columns except y_name are chosen. Default is _None_.
    - **sample_frac**: fraction (0, 1] of the training data to use for the training of an individual tree of the ensemble. Default is 1.
    - **n_attributes**: number of attributes (elements of the X_names list) to use for the training of an individual tree of the ensemble. Default is _None_ which corresponds to all available attributes.
    - **learning_rate**: the shrinkage parameter used to "downweight" individual trees of the ensemble. Default is 0.1.
    - **cart_settings**: _dict_ that is passed on to the constructor of the individual trees (binarybeech.binarybeech.CART). For details cf. above.
    - **init_method**: Metrics to use for the evaluation of split loss, etc., of the initial tree (stump). Can be either "classification", "logistic", "regression", or _None_. Default is "logistic". If _None_ is chosen, the `method` is deduced from the training _dataframe_.
    - **gamma**: weight for individual trees of the ensemble. If _None_, the weight for each tree is chosen by line search minimizing the loss given by _init_method_.
    - **handle_missings**: Specifies how missing data is handled. Can be either _None_ or "simple".
    - **attribute_handlers**: _dict_ with data handler instances for each variable. The data handler determines, e.g., how splits of the dataset are made.
* Methods
    - **predict(df)**
        + Parameters:
            * **df**: _dataframe_ with inputs for predictions.
        + Returns:
            * array with predicted values/labels.
    - **train(M)**
        + Parameters:
            * **M**: Number of individual trees to create for the ensemble.
        + Returns:
    - **validate(df=None)**
        + Parameters:
            * **df**: _dataframe_ to use for (cross-)validation. If _None_, the training set is used. Default is _None_.
        + Returns:
            * _dict_ with metrics, e.g. accuracy or RSquared.
* Attributes
    - **trees**
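
A minimal usage sketch, reusing the `df_train`/`df_test` split from the example above (20 boosting iterations is an arbitrary choice):

```
from binarybeech.binarybeech import GradientBoostedTree

gbt = GradientBoostedTree(
    df=df_train,
    y_name="Survived",
    learning_rate=0.1,
    init_method="logistic",
)
gbt.train(20)                    # ensemble of 20 trees
print(gbt.validate(df=df_test))  # dict with metrics
```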

### binarybeech.binarybeech.AdaBoostTree

**AdaBoostTree(training_data=None, df=None, y_name=None, X_names=None, sample_frac=1, n_attributes=None, cart_settings={}, method="classification", handle_missings="simple", attribute_handlers=None, seed=None, algorithm_kwargs={})**

Class for an AdaBoost model using CARTs as weak learners.

* Parameters:
    - **training_data**: Preprocessed instance of class _TrainingData_.
    - **df**: pandas _dataframe_ with training data
    - **y_name**: name of the column with the output data/labels
    - **X_names**: _list_ of names of the inputs to use for the modelling. If _None_, all columns except y_name are chosen. Default is _None_.
    - **method**: Metrics to use for the evaluation of split loss, etc. Can be either "classification", "logistic", "regression", or _None_. Default is "classification". If _None_ is chosen, the `method` is deduced from the training _dataframe_.
    - **handle_missings**: Specifies how missing data is handled. Can be either _None_ or "simple".
    - **attribute_handlers**: _dict_ with attribute handler instances for each variable. The data handler determines, e.g., how splits of the dataset are made.
* Methods
    - **predict(df)**
        + Parameters:
            * **df**: _dataframe_ with inputs for predictions.
        + Returns:
            * array with predicted values/labels.
    - **train(M)**
        + Parameters:
            * **M**: Number of individual trees to create for the ensemble.
        + Returns:
    - **validate(df=None)**
        + Parameters:
            * **df**: _dataframe_ to use for (cross-)validation. If _None_, the training set is used. Default is _None_.
        + Returns:
            * _dict_ with metrics, e.g. accuracy or RSquared.
    - **variable_importance()**:
        + Returns:
            * _dict_ with normalized importance values.
* Attributes
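
A minimal usage sketch along the same lines (again, 20 weak learners is an arbitrary choice):

```
from binarybeech.binarybeech import AdaBoostTree

abt = AdaBoostTree(df=df_train, y_name="Survived", method="classification")
abt.train(20)                     # ensemble of 20 weak learners
print(abt.validate(df=df_test))   # dict with metrics
print(abt.variable_importance())  # dict with normalized importance values
```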

### binarybeech.binarybeech.RandomForest

**RandomForest(df, y_name, X_names=None, verbose=False, sample_frac=1, n_attributes=None, cart_settings={}, method="regression", handle_missings="simple", attribute_handlers=None)**

Class for a Random Forest model.

* Parameters
    - **df**: pandas _dataframe_ with training data
    - **y_name**: name of the column with the output data/labels
    - **X_names**: _list_ of names of the inputs to use for the modelling. If _None_, all columns except y_name are chosen. Default is _None_.
    - **verbose**: if set to _True_, status messages are sent to stdout. Default is _False_.
    - **sample_frac**: fraction (0, 1] of the training data to use for the training of an individual tree of the ensemble. Default is 1.
    - **n_attributes**: number of attributes (elements of the X_names list) to use for the training of an individual tree of the ensemble. Default is _None_ which corresponds to all available attributes.
    - **cart_settings**: _dict_ that is passed on to the constructor of the individual trees (binarybeech.binarybeech.CART). For details cf. above.
    - **method**: Metrics to use for the evaluation of split loss, etc. Can be either "classification", "logistic", "regression", or _None_. Default is "regression". If _None_ is chosen, the `method` is deduced from the training _dataframe_.
    - **handle_missings**: Specifies how missing data is handled. Can be either _None_ or "simple".
    - **attribute_handlers**: _dict_ with attribute handler instances for each variable. The data handler determines, e.g., how splits of the dataset are made.
* Methods
    - **predict(df)**
        + Parameters:
            * **df**: _dataframe_ with inputs for predictions.
        + Returns:
            * array with predicted values/labels.
    - **train(M)**
        + Parameters:
            * **M**: Number of individual trees to create for the ensemble.
        + Returns:
    - **validate(df=None)**
        + Parameters:
            * **df**: _dataframe_ to use for (cross-)validation. If _None_, the training set is used. Default is _None_.
        + Returns:
            * _dict_ with metrics, e.g. accuracy or RSquared.
    - **validate_oob()**:
        + Returns:
            * _dict_ with metrics, e.g. accuracy or RSquared.
    - **variable_importance()**:
        + Returns:
            * _dict_ with normalized importance values.
* Attributes
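
A minimal usage sketch (the values for `sample_frac` and the number of trees are arbitrary):

```
from binarybeech.binarybeech import RandomForest

rf = RandomForest(
    df=df_train,
    y_name="Survived",
    method="classification",
    sample_frac=0.8,
)
rf.train(50)                     # ensemble of 50 trees
print(rf.validate_oob())         # out-of-bag metrics
print(rf.variable_importance())  # dict with normalized importance values
```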

## Principle
Decision trees are, by design, data type agnostic. With only a few methods, such as a _splitter_ for input variables and a meaningful quantification of the _loss_, any data type can be processed. In this code, this is implemented using a factory pattern for _data handling_ and _metrics_, making decision tree learning simple and versatile.
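
To make the factory idea concrete, here is a generic sketch of the pattern; the class and method names are purely illustrative and do not reflect the actual binarybeech internals:

```
# Illustrative sketch of a data-handler factory -- not the real binarybeech API.
class DataHandlerFactory:
    def __init__(self):
        self._registry = {}

    def register(self, kind, handler_cls):
        # Associate a kind of data (e.g. "nominal", "numerical") with a handler class.
        self._registry[kind] = handler_cls

    def create(self, kind, *args, **kwargs):
        # Instantiate the handler registered for this kind of data.
        return self._registry[kind](*args, **kwargs)

# A custom handler only needs to provide the expected interface,
# e.g. a splitter for the input variable and a loss quantification.
```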

For more information please feel free to take a look at the code.

## Performance

### Kaggle

## Sources

[Decision tree](https://en.m.wikipedia.org/wiki/Decision_tree)

[CART](https://de.m.wikipedia.org/wiki/CART_(Algorithmus))

[Gradient Boosted Tree](https://en.m.wikipedia.org/wiki/Gradient_boosting)

[Random Forest](https://de.m.wikipedia.org/wiki/Random_Forest)

[pruning](https://online.stat.psu.edu/stat508/lesson/11/11.8/11.8.2)

## Contributions
Contributions in the form of pull requests are always welcome.

            
