![a gecko looking at the camera with bayesian math in white on a pink and green background](documentation/assets/header.png)
# (100)gecs
Bayesian hyperparameter tuning for LGBMClassifier, LGBMRegressor, CatBoostClassifier and CatBoostRegressor with a scikit-learn API
## Table of Contents
- [Project Overview](#project-overview)
- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Example](#example)
- [Contributing](#contributing)
- [License](#license)
- [Contact Information](#contact-information)
## Project Overview
`gecs` is a tool that automates hyperparameter tuning for boosting classifiers and regressors, which can save significant time and computational resources during model building and optimization. `GEC` stands for **G**ood **E**nough **C**lassifier: it finds a model that is good enough automatically, so you can focus on other tasks such as feature engineering. If you deploy 100 of them, you get 100GECs.
## Introduction
The primary class in this package is `LightGEC`, which is derived from `LGBMClassifier`. Like its parent, `LightGEC` can be used to build and train gradient boosting models, but with the added feature of **automated Bayesian hyperparameter optimization**. It can be imported from `gecs.lightgec` and then used in place of `LGBMClassifier`, with the same API.
By default, `LightGEC` optimizes `num_leaves`, `boosting_type`, `learning_rate`, `reg_alpha`, `reg_lambda`, `min_child_samples`, `min_child_weight`, `colsample_bytree`, `subsample_freq`, `subsample` and optionally `n_estimators`. Which hyperparameters to tune is fully customizable.
## Installation
The installation requires `cmake`, which can be installed using `apt` on Linux or `brew` on macOS. Then you can install (100)gecs using pip:
```
pip install gecs
```
## Usage
The `LightGEC` class provides the same API as the `LGBMClassifier` class from `lightgbm`, with the following additions:
- two additional parameters to the `fit` method (see the sketch after this list):
  - `n_iter`: the number of hyperparameter combinations the model should try. More iterations can lead to better model performance, but at the expense of computational resources
  - `fixed_hyperparameters`: hyperparameters that the GEC should not optimize. By default, only `n_estimators` is fixed. Any of the `LGBMClassifier` init arguments can be fixed, and so can `subsample_freq` and `subsample`, but only jointly, which is done by passing the value `bagging`
- the methods `serialize` and `deserialize`, which store the `LightGEC` state of the hyperparameter optimization process, **but not the fitted** `LGBMClassifier` **parameters**, in a JSON file. To store the boosted tree model itself, you have to provide your own serialization or use `pickle`
- the methods `freeze` and `unfreeze`, which turn the `LightGEC` functionally into an `LGBMClassifier` and back
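A minimal sketch of these two `fit` parameters, assuming `fixed_hyperparameters` takes a list of hyperparameter names; the iteration budget shown is purely illustrative:

```python
from sklearn.datasets import load_iris

from gecs.lightgec import LightGEC

X, y = load_iris(return_X_y=True)

gec = LightGEC()

# try 50 hyperparameter combinations (illustrative budget), keeping
# n_estimators fixed and excluding subsample/subsample_freq from the
# optimization by passing "bagging"
gec.fit(X, y, n_iter=50, fixed_hyperparameters=["n_estimators", "bagging"])
```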
## Example
The default use of `LightGEC` looks like this:
```python
from sklearn.datasets import load_iris

from gecs.lightgec import LightGEC  # LGBMClassifier with hyperparameter optimization
from gecs.lightger import LightGER  # LGBMRegressor with hyperparameter optimization
from gecs.catgec import CatGEC  # CatBoostClassifier with hyperparameter optimization
from gecs.catger import CatGER  # CatBoostRegressor with hyperparameter optimization

X, y = load_iris(return_X_y=True)

# fit and infer GEC
gec = LightGEC()
gec.fit(X, y)
yhat = gec.predict(X)

# manage GEC state
path = "./gec.json"
gec.serialize(path)  # stores gec data and settings, but not underlying LGBMClassifier attributes
gec2 = LightGEC.deserialize(path, X, y)  # X and y are necessary to fit the underlying LGBMClassifier
gec.freeze()  # freeze GEC so that it behaves like an LGBMClassifier
gec.unfreeze()  # unfreeze to enable GEC hyperparameter optimization

# benchmark against LGBMClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

clf = LGBMClassifier()
lgbm_score = np.mean(cross_val_score(clf, X, y))

gec.freeze()
gec_score = np.mean(cross_val_score(gec, X, y))

print(f"{gec_score = }, {lgbm_score = }")
assert gec_score > lgbm_score, "GEC doesn't outperform LGBMClassifier"

# check what hyperparameter combinations were tried
gec.tried_hyperparameters()
```
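As noted in the Usage section, `serialize` stores only the optimization state, not the fitted model. Continuing from the example above, one way to persist the fully fitted model is `pickle`; a minimal sketch, with an illustrative file name:

```python
import pickle

# persist the entire fitted LightGEC, including the underlying boosted trees
with open("gec.pkl", "wb") as f:
    pickle.dump(gec, f)

# restore it later without refitting
with open("gec.pkl", "rb") as f:
    gec_restored = pickle.load(f)

yhat = gec_restored.predict(X)
```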
## Contributing
If you want to contribute, please reach out and I'll design a process around it.
## License
MIT
## Contact Information
You can find my contact information on my website: [https://leonluithlen.eu](https://leonluithlen.eu)