custom-decision-trees

- **Name:** custom-decision-trees
- **Version:** 3.0.0
- **Summary:** A package for building customizable decision trees and random forests.
- **Home page:** https://github.com/AntoinePinto/custom-decision-trees
- **Author:** Antoine Pinto
- **License:** MIT
- **Requires Python:** >=3.10
- **Keywords:** machine learning, decision trees, random forest, customization, classification, custom splitting criteria
- **Requirements:** joblib (>=1.4.0), matplotlib (>=3.9.0), numpy (>=1.26.0)
- **Upload time:** 2025-09-16 14:47:52

<div align="center">
<h1 align="center">
  <a><img src="https://github.com/AntoinePinto/custom-decision-trees/blob/master/media/logo.png?raw=true" width="80"></a>
  <br>
  <b>Custom Decision Trees</b>
  <br>
</h1>

![Static Badge](https://img.shields.io/badge/python->=3.10-blue)
![GitHub License](https://img.shields.io/github/license/AntoinePinto/custom-decision-trees)
![PyPI - Downloads](https://img.shields.io/pypi/dm/custom-decision-trees)
![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)

</div>

**Custom Decision Trees** is a Python package for building decision tree and random forest models with advanced configuration options:

## Main Features

### Splitting criteria customization

Define your own splitting criteria in Python (documented in the sections below).

This feature is particularly useful in "cost-dependent" scenarios (see the profit-metric sketch after this list). Examples:

- **Trading Movements Classification:** When the goal is to maximize economic profit, the metric can be set to economic profit, optimizing tree splitting accordingly.
- **Churn Prediction:** To minimize false negatives, metrics like F1 score or recall can guide the splitting process.
- **Fraud Detection:** Splitting can be optimized based on the proportion of fraudulent transactions identified relative to the total, rather than overall classification accuracy.
- **Marketing Campaigns:** The splitting can focus on maximizing expected revenue from customer segments identified by the tree.
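
As a sketch of the trading example, a profit-driven metric might look like the following, assuming the `MetricBase` interface documented in the "Define your metric" section below and a `metric_data` array whose first column holds per-observation profit (the `ProfitDelta` class itself is hypothetical):

```python
import numpy as np

from custom_decision_trees.metrics import MetricBase


class ProfitDelta(MetricBase):
    """Hypothetical metric: gain in mean profit obtained by isolating one side."""

    def mean_profit(self, metric_data: np.ndarray) -> float:
        # First column of metric_data = per-observation economic profit.
        if len(metric_data) == 0:
            return 0.0
        return float(np.mean(metric_data[:, 0]))

    def compute_metric(self, metric_data: np.ndarray, mask: np.ndarray):
        # Reward splits whose selected side out-earns the parent node on average.
        side_profit = self.mean_profit(metric_data[mask])
        delta = side_profit - self.mean_profit(metric_data)
        metadata = {"mean_profit": round(side_profit, 3)}
        return float(delta), metadata
```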

### Multi-conditional node splitting

Allows trees to split nodes on one or more simultaneous conditions.

Example of multi-condition splitting on the Titanic dataset:

![Multi Conditional Node Splitting](https://github.com/AntoinePinto/custom-decision-trees/blob/master/media/multi-condition-splitting.png?raw=true)

### Other features

*   Supports classification and regression
*   Supports multiclass classification
*   Supports standard decision tree parameters (`max_depth`, `min_samples_split`, `max_features`, `n_estimators`, etc.)
*   Supports string-type explanatory variables
*   Control over the number of cut points tested per variable when optimizing a split (via the `nb_max_cut_options_per_var` parameter)
*   Control over the maximum number of splits tested per node, to avoid overly long computations in multi-condition mode (via the `nb_max_split_options_per_node` parameter)
*   Parallelized computation (via the `n_jobs` parameter); see the configuration sketch below
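
As a sketch, a configuration combining these options might look like this (using the `Gini` metric defined in the "Define your metric" section below; the specific values are illustrative):

```python
from custom_decision_trees import DecisionTreeClassifier

decision_tree = DecisionTreeClassifier(
    metric=gini,                        # metric instance, e.g. Gini() (defined below)
    max_depth=3,
    nb_max_conditions_per_node=2,       # up to two simultaneous conditions per split
    nb_max_cut_options_per_var=10,      # cap the cut points tested per variable
    nb_max_split_options_per_node=100,  # cap the splits tested per node
    n_jobs=4,                           # parallelize the split search
)
```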

## Reminder on splitting criteria

Splitting in a decision tree is achieved by **optimizing a metric**. For example, Gini optimization consists of **maximizing** $\Delta_{Gini}$:

*   **The Gini index** represents the impurity of a group of observations, based on the proportion $p_k$ of observations in each class (0 and 1):

$$ I_{Gini} = 1 - p_0^2 - p_1^2 $$

*   The metric to be maximized is $\Delta_{Gini}$: the difference between **the Gini index of the parent node** and **the weighted average of the Gini indices of the two child nodes** ($L$ and $R$), where $N$, $N_L$, and $N_R$ are the node sizes.

$$ \Delta_{Gini} = I_{Gini} - \frac{N_L \cdot I_{Gini_L}}{N} - \frac{N_R \cdot I_{Gini_R}}{N} $$

At each node, the tree algorithm finds the split that maximizes $\Delta_{Gini}$ over all possible splits and all features. Once the optimal split is selected, the tree is grown by recursively applying this splitting process to the resulting child nodes.
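
A quick worked example: take a parent node with $N = 10$ observations (6 of class 0, 4 of class 1), split into a left child of 4 observations (all class 0) and a right child of 6 observations (2 of class 0, 4 of class 1). Then:

$$ I_{Gini} = 1 - 0.6^2 - 0.4^2 = 0.48, \qquad I_{Gini_L} = 0, \qquad I_{Gini_R} = 1 - \left(\tfrac{1}{3}\right)^2 - \left(\tfrac{2}{3}\right)^2 \approx 0.444 $$

$$ \Delta_{Gini} = 0.48 - \tfrac{4}{10} \cdot 0 - \tfrac{6}{10} \cdot 0.444 \approx 0.213 $$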

## Usage

> See `./notebooks/` folder for complete examples.

### Installation

```
pip install custom-decision-trees
```

### Define your metric

To integrate a custom metric, define a class implementing the `compute_metric` method (returning the delta to maximize, plus a metadata dictionary), then pass an instance of this class to the classifier.

Example with the Gini index:

```python
import numpy as np

from custom_decision_trees.metrics import MetricBase


class Gini(MetricBase):

    def __init__(
            self,
            n_classes: int = 2,
        ) -> None:

        self.n_classes = n_classes
        self.max_impurity = 1 - 1 / n_classes

    def compute_gini(
            self,
            metric_data: np.ndarray,
        ) -> float:

        # The first column of metric_data holds the labels.
        y = metric_data[:, 0]

        nb_obs = len(y)

        if nb_obs == 0:
            return self.max_impurity

        props = [(np.sum(y == i) / nb_obs) for i in range(self.n_classes)]

        metric = 1 - np.sum([prop**2 for prop in props])

        return float(metric)

    def compute_metric(
            self,
            metric_data: np.ndarray,
            mask: np.ndarray,
        ) -> tuple[float, dict]:

        # Delta = parent impurity minus the weighted impurity of the two sides.
        gini_parent = self.compute_gini(metric_data)
        gini_side1 = self.compute_gini(metric_data[mask])
        gini_side2 = self.compute_gini(metric_data[~mask])

        delta = (
            gini_parent -
            gini_side1 * np.mean(mask) -
            gini_side2 * (1 - np.mean(mask))
        )

        # Optional metadata returned alongside the delta.
        metadata = {"gini": round(gini_side1, 3)}

        return float(delta), metadata
```
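
You can sanity-check the metric in isolation before training (a sketch; the toy arrays below are made up for illustration):

```python
import numpy as np

# Four observations: labels in the first (and only) column of metric_data.
metric_data = np.array([[0], [0], [1], [1]])

# Candidate split sending the first two observations to one side.
mask = np.array([True, True, False, False])

gini = Gini()
delta, metadata = gini.compute_metric(metric_data, mask)

print(delta)     # 0.5 -> a perfectly separating split
print(metadata)  # {'gini': 0.0}
```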

### Train and predict

Once you have instantiated the model with your custom metric, all you have to do is use the `.fit` and `.predict_probas` methods:

```python
from custom_decision_trees import DecisionTreeClassifier

gini = Gini()

decision_tree = DecisionTreeClassifier(
    metric=gini,
    max_depth=2,
    nb_max_conditions_per_node=2,  # set to 1 for a traditional decision tree
)

decision_tree.fit(
    X=X_train,
    y=y_train,
    metric_data=metric_data,
)

probas = decision_tree.predict_probas(
    X=X_test
)

probas[:5]
```

```
>>> array([[0.75308642, 0.24691358],
           [0.36206897, 0.63793103],
           [0.75308642, 0.24691358],
           [0.36206897, 0.63793103],
           [0.90243902, 0.09756098]])
```
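
A note on `metric_data`: this array is passed through to your metric's `compute_metric` at every candidate split. The Gini example above reads only its first column (the labels), so a minimal construction (an assumption; your own metric may need extra columns, e.g. per-observation profits or costs) is:

```python
import numpy as np

# Single-column array of labels; append more columns if your metric needs them.
metric_data = np.asarray(y_train).reshape(-1, 1)
```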

### Print the tree

You can also display the decision tree, along with the values of your metric, using the `print_tree` method. Note that the complement of a multi-condition branch (conditions combined with AND) is the OR of the negated conditions, as in nodes [2], [4], and [6] below:

```python
decision_tree.print_tree(
    feature_names=features,
    metric_name="MyMetric",
)
```

```
>>> [0] 712 obs -> MyMetric = 0.0
    |   [1] (x["Sex"] <= 0.0) AND (x["Pclass"] <= 2.0) | 157 obs -> MyMetric = 0.16
    |   |   [3] (x["Age"] <= 2.0) AND (x["Fare"] > 26.55) | 1 obs -> MyMetric = 0.01
    |   |   [4] (x["Age"] > 2.0) OR (x["Fare"] <= 26.55) | 156 obs -> MyMetric = 0.01
    |   [2] (x["Sex"] > 0.0) OR (x["Pclass"] > 2.0) | 555 obs -> MyMetric = 0.16
    |   |   [5] (x["SibSp"] <= 2.0) AND (x["Age"] <= 8.75) | 27 obs -> MyMetric = 0.05
    |   |   [6] (x["SibSp"] > 2.0) OR (x["Age"] > 8.75) | 528 obs -> MyMetric = 0.05
```

### Plot the tree

```python
decision_tree.plot_tree(
    feature_names=features,
    metric_name="delta gini",
)
```

![Multi Conditional Node Splitting](https://github.com/AntoinePinto/custom-decision-trees/blob/master/media/multi-condition-splitting.png?raw=true)

### Random Forest

The same applies with the Random Forest classifier:

```python
from custom_decision_trees import RandomForestClassifier

random_forest = RandomForestClassifier(
    metric=gini,
    n_estimators=10,
    max_depth=2,
    nb_max_conditions_per_node=2,
)

random_forest.fit(
    X=X_train, 
    y=y_train, 
    metric_data=metric_data
)

probas = random_forest.predict_probas(
    X=X_test
)
```

### Regression

The "regression" mode is used in exactly the same way as "classification", i.e., by specifying the metric from a Python class.

```python
from custom_decision_trees import DecisionTreeRegressor

your_metric = YourMetric()

decision_tree_regressor = DecisionTreeRegressor(
    metric=your_metric,
    max_depth=2,
    min_samples_split=2,
    min_samples_leaf=1,
    max_features=None,
    nb_max_conditions_per_node=2,
    nb_max_cut_options_per_var=10,
    n_jobs=1
)

decision_tree_regressor.fit(
    X=X,
    y=y,
    metric_data=metric_data,
)
```
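
As an illustration, a classic variance-reduction criterion could look like the following sketch (assuming the same `MetricBase` interface as in the classification example; the `VarianceReduction` class itself is hypothetical):

```python
import numpy as np

from custom_decision_trees.metrics import MetricBase


class VarianceReduction(MetricBase):

    def variance(self, metric_data: np.ndarray) -> float:
        # First column of metric_data holds the continuous target.
        y = metric_data[:, 0]
        if len(y) == 0:
            return 0.0
        return float(np.var(y))

    def compute_metric(
            self,
            metric_data: np.ndarray,
            mask: np.ndarray,
        ) -> tuple[float, dict]:

        # Delta = parent variance minus the weighted variance of the two sides.
        var_parent = self.variance(metric_data)
        var_side1 = self.variance(metric_data[mask])
        var_side2 = self.variance(metric_data[~mask])

        p_side1 = float(np.mean(mask))
        delta = var_parent - p_side1 * var_side1 - (1 - p_side1) * var_side2

        metadata = {"variance": round(var_side1, 3)}

        return float(delta), metadata
```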

**See the `./notebooks/` folder for complete examples.**

            
