- Name: gecs
- Version: 0.1.1
- Home page: https://github.com/0xideas/sequifier
- Summary: LightGBM Classifier with integrated bayesian hyperparameter optimization
- Upload time: 2023-09-10 09:29:08
- Author: Leon Luithlen
- Requires Python: >=3.10.0,<3.13.0
- License: MIT
- Keywords: lightgbm, classification, machine learning, hyperparameter optimization, classifier

![a gecko looking at the camera with bayesian math in white on a pink and green background](documentation/assets/header.png)


# (100)gecs

Bayesian hyperparameter tuning for LGBMClassifier, LGBMRegressor, CatBoostClassifier and CatBoostRegressor with a scikit-learn API


## Table of Contents

- [Project Overview](#project-overview)
- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Example](#example)
- [Contributing](#contributing)
- [License](#license)
- [Contact Information](#contact-information)


## Project Overview

`gecs` automates hyperparameter tuning for boosting classifiers and regressors, which can save significant time and computational resources during model building and optimization. `GEC` stands for **G**ood **E**nough **C**lassifier: a model tuned well enough that you can focus on other tasks such as feature engineering. If you deploy 100 of them, you get 100GECs.


## Introduction

The primary class in this package is `LightGEC`, which is derived from `LGBMClassifier`. Like its parent, `LightGEC` can be used to build and train gradient boosting models, but with the added feature of **automated Bayesian hyperparameter optimization**. It can be imported from `gecs.lightgec` and used in place of `LGBMClassifier`, with the same API.

By default, `LightGEC` optimizes `num_leaves`, `boosting_type`, `learning_rate`, `reg_alpha`, `reg_lambda`, `min_child_samples`, `min_child_weight`, `colsample_bytree`, `subsample_freq`, `subsample` and, optionally, `n_estimators`. Which hyperparameters are tuned is fully customizable.


## Installation
Installation requires `cmake`, which can be installed with `apt` on Linux or `brew` on macOS. You can then install (100)gecs using pip.

    pip install gecs



## Usage


The `LightGEC` class provides the same API as the `LGBMClassifier` class of `lightgbm`, and additionally:

-   two additional parameters to the `fit` method:
    - `n_iter`: Defines the number of hyperparameter combinations that the model should try. More iterations can lead to better model performance, at the expense of computational resources.

    - `fixed_hyperparameters`: Allows the user to specify hyperparameters that the GEC should not optimize. By default, only `n_estimators` is fixed. Any of the `LGBMClassifier` init arguments can be fixed, and so can `subsample_freq` and `subsample`, but only jointly. This is done by passing the value `bagging`.

-   the methods `serialize` and `deserialize`, which store and restore the `LightGEC` state for the hyperparameter optimization process, **but not the fitted** `LGBMClassifier` **parameters**, in a JSON file. To store the boosted tree model itself, you have to provide your own serialization or use `pickle`.

-   the methods `freeze` and `unfreeze`, which turn the `LightGEC` functionally into an `LGBMClassifier` and back.


## Example

The default use of `LightGEC` looks like this:

    from sklearn.datasets import load_iris
    from gecs.lightgec import LightGEC # LGBMClassifier with hyperparameter optimization
    from gecs.lightger import LightGER # LGBMRegressor with hyperparameter optimization
    from gecs.catgec import CatGEC # CatBoostClassifier with hyperparameter optimization
    from gecs.catger import CatGER # CatBoostRegressor with hyperparameter optimization


    X, y = load_iris(return_X_y=True)


    # fit and infer GEC
    gec = LightGEC()
    gec.fit(X, y)
    yhat = gec.predict(X)


    # manage GEC state
    path = "./gec.json"
    gec.serialize(path) # stores gec data and settings, but not underlying LGBMClassifier attributes
    gec2 = LightGEC.deserialize(path, X, y) # X and y are necessary to fit the underlying LGBMClassifier
    gec.freeze() # freeze GEC so that it behaves like a LGBMClassifier
    gec.unfreeze() # unfreeze to enable GEC hyperparameter optimisation


    # benchmark against LGBMClassifier
    from lightgbm import LGBMClassifier
    from sklearn.model_selection import cross_val_score
    import numpy as np

    clf = LGBMClassifier()
    lgbm_score = np.mean(cross_val_score(clf, X, y))

    gec.freeze()
    gec_score = np.mean(cross_val_score(gec, X, y))

    print(f"{gec_score = }, {lgbm_score = }")
    assert gec_score > lgbm_score, "GEC doesn't outperform LGBMClassifier"

    # check what hyperparameter combinations were tried
    gec.tried_hyperparameters()
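
Because `serialize` does not store the fitted boosted tree model, the model itself has to be saved separately, for example with `pickle`. The following is a minimal sketch of such a round trip. The helper names `save_model` and `load_model` are hypothetical, and a plain dictionary stands in for the fitted estimator so that the sketch runs without `gecs` installed; it assumes, without verifying, that the fitted estimator is picklable.

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory


def save_model(model, path):
    """Pickle any picklable object (e.g. a fitted estimator) to `path`."""
    Path(path).write_bytes(pickle.dumps(model))


def load_model(path):
    """Load an object previously written by `save_model`."""
    return pickle.loads(Path(path).read_bytes())


# Stand-in for a fitted estimator (an assumption made for illustration);
# in practice you would pass the fitted model instead.
stand_in = {"classes_": [0, 1, 2], "n_features_in_": 4}

with TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_model(stand_in, model_path)
    restored = load_model(model_path)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.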



## Contributing

If you want to contribute, please reach out and I'll design a process around it.

## License

MIT

## Contact Information

You can find my contact information on my website: [https://leonluithlen.eu](https://leonluithlen.eu)
            
