<div align="center">
<h1 align="center">
<a><img src="https://github.com/AntoinePinto/custom-decision-trees/blob/master/media/logo.png?raw=true" width="80"></a>
<br>
<b>Custom Decision Trees</b>
<br>
</h1>




</div>
**Custom Decision Trees** is a Python package that lets you build Decision Tree / Random Forest models with advanced configuration options.
## Main Features
### Splitting criteria customization
Define your own splitting criteria in Python (documented in the sections below).
This feature is particularly useful in "cost-dependent" scenarios. Examples:
- **Trading Movements Classification:** When the goal is to maximize economic profit, the metric can be set to economic profit, optimizing tree splitting accordingly (see the sketch after this list).
- **Churn Prediction:** To minimize false negatives, metrics like F1 score or recall can guide the splitting process.
- **Fraud Detection:** Splitting can be optimized based on the proportion of fraudulent transactions identified relative to the total, rather than overall classification accuracy.
- **Marketing Campaigns:** The splitting can focus on maximizing expected revenue from customer segments identified by the tree.
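As an illustration of the trading case, a profit-driven criterion might look like the sketch below. This is only an assumption of how such a metric could be written: it uses the `MetricBase` interface documented in the Usage section, and supposes a hypothetical `metric_data` whose single column holds each observation's realized profit.

```python
import numpy as np

from custom_decision_trees.metrics import MetricBase


class ProfitDelta(MetricBase):
    """Sketch (not part of the package): score a split by the gain in
    mean profit of the selected side over the parent node."""

    def mean_profit(self, metric_data: np.ndarray) -> float:
        # Column 0 is assumed to hold each observation's profit or loss
        if len(metric_data) == 0:
            return 0.0
        return float(np.mean(metric_data[:, 0]))

    def compute_metric(self, metric_data: np.ndarray, mask: np.ndarray):
        side1 = self.mean_profit(metric_data[mask])
        delta = side1 - self.mean_profit(metric_data)
        metadata = {"mean_profit": round(side1, 3)}
        return delta, metadata
```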
### Multi-conditional node splitting
Allow trees to split nodes with one or more simultaneous conditions.
For an example of multi-condition splitting on the Titanic dataset, see the `print_tree` output in the Usage section below.

### Other features
* Supports classification and regression
* Supports multiclass classification
* Supports standard decision tree parameters (`max_depth`, `min_samples_split`, `max_features`, `n_estimators`, etc.)
* Supports string-type explanatory variables
* Control over the number of cut options per variable when optimizing a split (`nb_max_cut_options_per_var` parameter)
* Control over the maximum number of splits tested per node, to avoid overly long computations in multi-condition mode (`nb_max_split_options_per_node` parameter)
* Optional parallelization of computations (`n_jobs` parameter)
## Reminder on splitting criteria
Splitting in a decision tree is achieved by **optimizing a metric**. For example, Gini optimization consists of **maximizing** $\Delta_{Gini}$:
* **The Gini index** represents the impurity of a group of observations, based on the proportion of observations in each class (0 and 1):
$$ I_{Gini} = 1 - p_0^2 - p_1^2 $$
* The metric to be maximized is $\Delta_{Gini}$: the difference between **the Gini index of the parent node** and **the weighted average of the Gini indices of the two child nodes** ($L$ and $R$):
$$ \Delta_{Gini} = I_{Gini} - \frac{N_L \cdot I_{Gini_L}}{N} - \frac{N_R \cdot I_{Gini_R}}{N} $$
At each node, the tree algorithm finds the split that maximizes $\Delta_{Gini}$ over all possible splits and all features. Once the optimal split is selected, the tree is grown by recursively applying this splitting process to the resulting child nodes.
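As a quick sanity check of these formulas, the toy computation below evaluates $\Delta_{Gini}$ for a perfectly separating split (plain NumPy, independent of the package):

```python
import numpy as np


def gini(y: np.ndarray) -> float:
    """Gini impurity for binary labels: 1 - p0**2 - p1**2."""
    if len(y) == 0:
        return 0.0
    p1 = float(np.mean(y))
    return 1 - (1 - p1) ** 2 - p1 ** 2


y = np.array([0, 0, 0, 0, 1, 1, 1, 1])     # balanced parent: impurity 0.5
mask = np.array([True] * 4 + [False] * 4)  # split that isolates each class

delta = gini(y) - gini(y[mask]) * np.mean(mask) - gini(y[~mask]) * (1 - np.mean(mask))
print(delta)  # 0.5 -> the split removes all of the parent's impurity
```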
## Usage
> See `./notebooks/` folder for complete examples.
### Installation
```
pip install custom-decision-trees
```
### Define your metric
To integrate a specific metric, define a class that implements the `compute_metric` and `compute_delta` methods, then pass an instance of this class to the classifier.
Example of such a class implementing the Gini index:
```python
import numpy as np

from custom_decision_trees.metrics import MetricBase


class Gini(MetricBase):

    def __init__(
        self,
        n_classes: int = 2,
    ) -> None:
        self.n_classes = n_classes
        self.max_impurity = 1 - 1 / n_classes

    def compute_gini(
        self,
        metric_data: np.ndarray,
    ) -> float:
        # Column 0 of metric_data holds the class labels
        y = metric_data[:, 0]
        nb_obs = len(y)

        if nb_obs == 0:
            return self.max_impurity

        props = [(np.sum(y == i) / nb_obs) for i in range(self.n_classes)]
        metric = 1 - np.sum([prop**2 for prop in props])

        return float(metric)

    def compute_metric(
        self,
        metric_data: np.ndarray,
        mask: np.ndarray,
    ):
        gini_parent = self.compute_gini(metric_data)
        gini_side1 = self.compute_gini(metric_data[mask])
        gini_side2 = self.compute_gini(metric_data[~mask])

        # Delta = parent impurity minus the weighted impurity of both sides
        delta = (
            gini_parent -
            gini_side1 * np.mean(mask) -
            gini_side2 * (1 - np.mean(mask))
        )

        metadata = {"gini": round(gini_side1, 3)}

        return float(delta), metadata
```
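To make the interface concrete, here is a small, illustrative call to this class on a toy split (the numbers are made up for the example):

```python
import numpy as np

metric_data = np.array([[0], [0], [1], [1], [1], [0]])    # column 0 holds y
mask = np.array([True, True, True, False, False, False])  # candidate split

gini = Gini(n_classes=2)
delta, metadata = gini.compute_metric(metric_data, mask)
print(delta, metadata)  # ~0.056 {'gini': 0.444}
```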
### Train and predict
Once you have instantiated the model with your custom metric, all you have to do is use the `.fit` and `.predict_probas` methods:
```python
from custom_decision_trees import DecisionTreeClassifier

gini = Gini()

decision_tree = DecisionTreeClassifier(
    metric=gini,
    max_depth=2,
    nb_max_conditions_per_node=2,  # Set to 1 for a traditional decision tree
)

decision_tree.fit(
    X=X_train,
    y=y_train,
    metric_data=metric_data,  # columns consumed by compute_metric (here: y as a column vector)
)

probas = decision_tree.predict_probas(
    X=X_test,
)

probas[:5]
```
```
>>> array([[0.75308642, 0.24691358],
           [0.36206897, 0.63793103],
           [0.75308642, 0.24691358],
           [0.36206897, 0.63793103],
           [0.90243902, 0.09756098]])
```
### Print the tree
You can also display the decision tree, with the values of your metric, using the `print_tree` method:
```python
decision_tree.print_tree(
    feature_names=features,
    metric_name="MyMetric",
)
```
```
>>> [0] 712 obs -> MyMetric = 0.0
| [1] (x["Sex"] <= 0.0) AND (x["Pclass"] <= 2.0) | 157 obs -> MyMetric = 0.16
| | [3] (x["Age"] <= 2.0) AND (x["Fare"] > 26.55) | 1 obs -> MyMetric = 0.01
| | [4] (x["Age"] > 2.0) OR (x["Fare"] <= 26.55) | 156 obs -> MyMetric = 0.01
| [2] (x["Sex"] > 0.0) OR (x["Pclass"] > 2.0) | 555 obs -> MyMetric = 0.16
| | [5] (x["SibSp"] <= 2.0) AND (x["Age"] <= 8.75) | 27 obs -> MyMetric = 0.05
| | [6] (x["SibSp"] > 2.0) OR (x["Age"] > 8.75) | 528 obs -> MyMetric = 0.05
```
### Plot the tree
```python
decision_tree.plot_tree(
    feature_names=features,
    metric_name="delta gini",
)
```

### Random Forest
The same workflow applies to the Random Forest classifier:
```python
from custom_decision_trees import RandomForestClassifier

random_forest = RandomForestClassifier(
    metric=gini,
    n_estimators=10,
    max_depth=2,
    nb_max_conditions_per_node=2,
)

random_forest.fit(
    X=X_train,
    y=y_train,
    metric_data=metric_data,
)

probas = random_forest.predict_probas(
    X=X_test,
)
```
### Regression
The "regression" mode is used in exactly the same way as "classification", i.e., by specifying the metric from a Python class.
```python
from custom_decision_trees import DecisionTreeRegressor

your_metric = YourMetric()  # your own MetricBase subclass (see the sketch below)

decision_tree_regressor = DecisionTreeRegressor(
    metric=your_metric,
    max_depth=2,
    min_samples_split=2,
    min_samples_leaf=1,
    max_features=None,
    nb_max_conditions_per_node=2,
    nb_max_cut_options_per_var=10,
    n_jobs=1,
)

decision_tree_regressor.fit(
    X=X,
    y=y,
    metric_data=metric_data,
)
```
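As an illustration, `YourMetric` could be a variance-reduction criterion written in the same shape as the Gini example above. This is a sketch under the assumption that the regressor consumes the same `MetricBase` interface, with the regression target assumed to sit in the first column of `metric_data`:

```python
import numpy as np

from custom_decision_trees.metrics import MetricBase


class VarianceReduction(MetricBase):
    """Sketch: split quality = drop in target variance (assumption)."""

    def variance(self, metric_data: np.ndarray) -> float:
        # Column 0 is assumed to hold the regression target
        if len(metric_data) == 0:
            return 0.0
        return float(np.var(metric_data[:, 0]))

    def compute_metric(self, metric_data: np.ndarray, mask: np.ndarray):
        var_parent = self.variance(metric_data)
        var_side1 = self.variance(metric_data[mask])
        var_side2 = self.variance(metric_data[~mask])

        # Parent variance minus the weighted variance of the two sides
        delta = (
            var_parent -
            var_side1 * np.mean(mask) -
            var_side2 * (1 - np.mean(mask))
        )

        metadata = {"variance": round(var_side1, 3)}

        return float(delta), metadata
```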
**See `/notebooks` folder for complete examples.**