| Field | Value |
|-------|-------|
| Name | xgbsearch |
| Version | 0.1.6 |
| home_page | None |
| Summary | An implementation of grid search, random search for XGBoost with eval_set and early stopping support. |
| upload_time | 2024-11-12 16:30:21 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.6 |
| license | MIT License, Copyright (c) 2024 Jakub Deka (full text in the raw data below) |
| keywords | xgboost |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# xgbsearch
A package implementing grid search and random search for XGBoost.
## Description
An implementation of grid search and random search for hyperparameter tuning of `xgboost` models that allows you to specify evaluation sets (`eval_set`) and use early stopping.
## Installation
Install it from PyPI using `pip`:
```bash
# Install or Upgrade to newest available version
$ pip install -U xgbsearch
```
## Motivation
The grid search and random search implementations in scikit-learn (`GridSearchCV`/`RandomizedSearchCV`) do not allow you to specify evaluation sets that can be easily passed through to the underlying XGBoost `fit()` call and used for early stopping.
This package significantly simplifies running grid search and random search with those features.
Additionally, it makes implementing your own search algorithms very simple, and it provides a consistent interface for running various hyperparameter tuning algorithms while retaining the early stopping and eval set functionality provided by `xgboost`.
## Example usage
The package provides three classes:
* `XgbSearch` - the base class that contains most of the implementation details.
* `XgbGridSearch` - an implementation of the grid search (exhaustive search) algorithm for hyperparameter tuning.
* `XgbRandomSearch` - an implementation of the random search algorithm for hyperparameter tuning.
```python
from xgbsearch import XgbGridSearch, XgbRandomSearch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.metrics import roc_auc_score

X, y = make_classification(random_state=42)
X = pd.DataFrame(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# These parameters will be passed to xgb.fit as is.
fit_params = {
    "device": "cuda",
    "objective": "binary:logistic",
    "eval_metric": ["auc"],
}

# The parameters here will be tuned. If a parameter is a single value, it will be passed as is.
# If a parameter is a list, all possible combinations will be searched using grid search.
tune_params_grid = {
    "eta": [0.01, 0.001],
    "max_depth": [5, 11],
    "min_child_weight": 3,
}

grid_search = XgbGridSearch(tune_params_grid, fit_params)
eval_set = [(X_train, y_train, "train"), (X_test, y_test, "test")]
# The positional arguments correspond to num_boost_round=10000 and
# early_stopping_rounds=100 (these names appear in the result dicts under "Output" below).
grid_search.fit(X_train, y_train, eval_set, 10000, 100, verbose_eval=25)

# The parameters here will be tuned. If a parameter is a single value, it will be passed as is.
# If a parameter is a list, during each iteration a single value will be picked from that list.
# If a parameter is a tuple of two floats, a random float between the two ends will be picked.
# If a parameter is a tuple of two ints, a random int between the two ends will be picked.
tune_params_random = {
    "eta": (0.1, 0.005),
    "max_depth": (5, 11),
    "min_child_weight": [1, 2, 3],
}

random_search = XgbRandomSearch(tune_params_random, fit_params, max_iter_count=3)
eval_set = [(X_train, y_train, "train"), (X_test, y_test, "test")]
random_search.fit(X_train, y_train, eval_set, 10000, 100, verbose_eval=25)

# You can access the results like this.
print(random_search.get_best_model())  # returns the best model object
print(
    random_search.get_best_model_results()
)  # returns the best model's result dict with complete results
print(random_search.predict(X_test))  # generates predictions for the BEST model
print(
    random_search.score(X_test, y_test, roc_auc_score)
)  # calculates the score using the given function; note the function needs to accept X, y as input
```
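Since `get_best_model()` returns a native `xgboost.core.Booster` (see the result dicts under "Output" below), you can also score the best model by hand. A minimal sketch, assuming only the standard `xgboost` Booster API (features must be wrapped in a `DMatrix`):

```python
import xgboost as xgb
from sklearn.metrics import roc_auc_score

best = random_search.get_best_model()       # xgboost.core.Booster
y_pred = best.predict(xgb.DMatrix(X_test))  # predicted probabilities for binary:logistic
print(roc_auc_score(y_test, y_pred))
```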
## Output
After fitting, `XgbGridSearch` and `XgbRandomSearch` populate a `result` list containing one dict per iteration with the complete results of that model run.
```python
{'model': <xgboost.core.Booster at 0x7f3b00bcd070>,  # actual model object; if early stopping is enabled this is the LAST model fitted, not the best one
 'parameters': {'eta': 0.05,
                'colsample_bytree': 0.8,
                'max_depth': 11,
                'min_child_weight': 3,
                'device': 'cuda',
                'objective': 'binary:logistic',
                'eval_metric': ['auc']},  # parameters passed to xgb.fit
 'num_boost_round': 10000,
 'verbose_eval': 10,
 'early_stopping_rounds': 100,
 # results of the model fitting process, by eval set and boosting round
 'model_training_results': {'train': OrderedDict([('auc',
                                [0.8943848819579814,
                                 ...
                                 0.9805122373835824,
                                 0.9805555555555555,
                                 0.9806530214424951])]),
                            'test': OrderedDict([('auc',
                                [0.8711075249536788,
                                 ...
                                 0.8837786145478453,
                                 0.8839579224194609,
                                 0.8836590759667683])])},
 'best_iteration': 204,  # index of the best iteration; significant when using early stopping
 'best_score': 0.8865877712031558,  # value of the last eval_metric on the last eval_set for the best model
 'best_model': <xgboost.core.Booster at 0x7f3b01a75a30>}  # actual best model object
```
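Because `result` is a plain list of these dicts, you can also post-process it yourself. A minimal sketch (assuming the attribute name `result` and the keys shown above, and that a higher score is better, i.e. the default `maximise_score=True`):

```python
# Pick the run with the highest best_score and inspect its parameters.
best_run = max(random_search.result, key=lambda r: r["best_score"])
print(best_run["parameters"])
print(best_run["best_score"])
```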
## Implementing your own search
It is very easy to implement your own search by subclassing the `XgbSearch` class.
```python
from xgbsearch import XgbSearch


class MyOwnSearch(XgbSearch):
    def __init__(self, tune_params, fit_params, add_value, maximise_score=True):
        super().__init__(tune_params, fit_params, maximise_score)
        self.add_value = add_value

    def _generate_params(self):
        # Toy example: take the first tuning parameter and add to it the value
        # specified in the constructor, once per candidate.
        # This method needs to return a list of parameter dicts that will be
        # passed into xgb.fit().
        result = []
        for i in range(3):
            first_key = list(self.tune_params.keys())[0]
            loop_result = self.tune_params | self.fit_params
            loop_result[first_key] = loop_result[first_key] + i * self.add_value
            result.append(loop_result)

        return result


# Run it!
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.metrics import roc_auc_score

X, y = make_classification(random_state=42)
X = pd.DataFrame(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# These parameters will be passed to xgb.fit as is.
fit_params = {
    "device": "cuda",
    "objective": "binary:logistic",
    "eval_metric": ["auc"],
}

# The parameters here will be tuned by MyOwnSearch; each is a single starting value.
tune_params_grid = {
    "eta": 0.01,
    "max_depth": 5,
    "min_child_weight": 3,
}

my_search = MyOwnSearch(tune_params_grid, fit_params, 0.01)
eval_set = [(X_train, y_train, "train"), (X_test, y_test, "test")]
my_search.fit(X_train, y_train, eval_set, 10000, 100, verbose_eval=25)
```
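For the inputs above, `_generate_params` merges `tune_params_grid` and `fit_params` and bumps the first tuning parameter (`eta`) by `i * add_value` on each of the three iterations, so the candidate parameter sets look roughly like this (illustration only; values approximate due to floating point):

```python
[
    {"eta": 0.01, "max_depth": 5, "min_child_weight": 3,
     "device": "cuda", "objective": "binary:logistic", "eval_metric": ["auc"]},
    {"eta": 0.02, "max_depth": 5, "min_child_weight": 3,
     "device": "cuda", "objective": "binary:logistic", "eval_metric": ["auc"]},
    {"eta": 0.03, "max_depth": 5, "min_child_weight": 3,
     "device": "cuda", "objective": "binary:logistic", "eval_metric": ["auc"]},
]
```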
## Additional classes
This package also contains a supplementary class, `XgbResultDisplay`, which makes it easier to access the results of a grid search or random search. All of its methods are static.
See [display-demo.ipynb](examples/display-demo.ipynb) for examples.
## Raw data

```json
{
    "_id": null,
    "home_page": null,
    "name": "xgbsearch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "Jakub Deka <jakub.deka@gmail.com>",
    "keywords": "xgboost",
    "author": null,
    "author_email": "Jakub Deka <jakub.deka@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e4/db/abd07dfa30c598258d1fddf5163e1e2b0b4369fb809e55e7013bd1e567a7/xgbsearch-0.1.6.tar.gz",
    "platform": null,
"description": "# xgbsearch\n\nA pacakge implementing grid search and random search for XGBoost.\n\n## Description\n\nAn implementation of grid search and random search for hyper parameter tunning for `xgboost` that allows for specification of `eval_sets` and usage of early stopping.\n\n## Installation\n\nInstall it via PyPI using `pip` command.\n\n```bash\n# Install or Upgrade to newest available version\n$ pip install -U xgbsearch\n```\n\n## Motivation\n\nCurrent implementation of grid search and random search in scikit-learn (using GridSearchCV/RandomSearchCV) does not allow for specifying evaluation sets that can be easily passed to `XGboost.fit()` and used in early stopping.\n\nThis package significantly simplifies implementation of grid search and random search.\n\nAdditionally, it makes implementation of your own, new search algorithms very simple and provides a consistent interface for running various hyper parameter tunning algorightms while retaining early stopping and eval sets functionality provided by `xgboost`.\n\n## Example usage\n\nThe package provides 3 classes:\n\n* `XgbSearch` - the base super class that contains most of the implementation details.\n* `XgbGridSearch` - implementation of grid search (exchaustive search) algorithm for hyper parameter tunning.\n* `XgbRandomSearch` - implementation of random search algorithm for hyper parameter tunning.\n\n```python\nfrom xgbsearch import XgbGridSearch, XgbRandomSearch\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import train_test_split\nimport pandas as pd\nfrom sklearn.metrics import roc_auc_score\n\nX, y = make_classification(random_state=42)\nX = pd.DataFrame(X)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n\n# These parameters will be passed to xgb.fit as is.\nfit_params = {\n \"device\": \"cuda\",\n \"objective\": \"binary:logistic\",\n \"eval_metric\": [\"auc\"],\n}\n\n# The parameters here will be tuned. If the parameter is a single value, it will be passed as is.\n# If the parameter is a list, all possible combinations will be searched using grid search.\ntune_params_grid = {\n \"eta\": [0.01, 0.001],\n \"max_depth\": [5, 11],\n \"min_child_weight\": 3,\n}\n\ngrid_search = XgbGridSearch(tune_params_grid, fit_params)\neval_set = [(X_train, y_train, \"train\"), (X_test, y_test, \"test\")]\ngrid_search.fit(X_train, y_train, eval_set, 10000, 100, verbose_eval=25)\n\n# The parameters here will be tuned. 
If the parameter is a single value, it will be passed as is.\n# If the parameter is a list, during each iteration a single value will be picked from that list.\n# If the parameter is a tuple of two floats, a random value between the two ends will be picked.\n# If the parameter is a tuple of two ints, a random int value between the two ends will be picked.\ntune_params_random = {\n \"eta\": (0.1, 0.005),\n \"max_depth\": (5, 11),\n \"min_child_weight\": [1, 2, 3],\n}\n\nrandom_search = XgbRandomSearch(tune_params_random, fit_params, max_iter_count=3)\neval_set = [(X_train, y_train, \"train\"), (X_test, y_test, \"test\")]\nrandom_search.fit(X_train, y_train, eval_set, 10000, 100, verbose_eval=25)\n\n# You can access the results like this.\nprint(random_search.get_best_model()) # returns the best model object\nprint(\n random_search.get_best_model_results()\n) # returns best model results dict with complete results\nprint(random_search.predict(X_test)) # generates predictions for the BEST model\nprint(\n random_search.score(X_test, y_test, roc_auc_score)\n) # calculates the score using given function; note the function needs to accept X, y as input\n\n```\n\n## Output\n\nAfter fitting, `XgbGridSearch` and `XgbRandomSearch` will populate a `result` list. This will contain one dict per iteration with complete result of the model run.\n\n```python\n{'model': <xgboost.core.Booster at 0x7f3b00bcd070>, # actual model object, if early stopping is enabled this will be the LAST model fitted, not the best one\n 'parameters': {'eta': 0.05,'colsample_bytree': 0.8,'max_depth': 11,'min_child_weight': 3,'device': 'cuda','objective': 'binary:logistic','eval_metric': ['auc']}, # Parameters passed to xgb.fit\n 'num_boost_round': 10000,'verbose_eval': 10,'early_stopping_rounds': 100},\n # Results of the model fitting process by boosting round\n 'model_training_results': {'TRAIN': OrderedDict([('auc',\n [0.8943848819579814,\n ...\n 0.9805122373835824,\n 0.9805555555555555,\n 0.9806530214424951])]),\n 'train': OrderedDict([('auc',\n [0.8943848819579814,\n ...\n 0.9805122373835824,\n 0.9805555555555555,\n 0.9806530214424951])]),\n 'test': OrderedDict([('auc',\n [0.8711075249536788,\n ...\n 0.8837786145478453,\n 0.8839579224194609,\n 0.8836590759667683])])},\n 'best_iteration': 204, # index of the best iteration, this is significant when using early_stopping\n 'best_score': 0.8865877712031558, # value of last eval_metric on last eval_set for the best model\n 'best_model': <xgboost.core.Booster at 0x7f3b01a75a30>} # actual best model object\n ```\n\n ## Implementing your own search\n\n It is very easy to implement your own search using XgbSearch class.\n\n```python\nfrom xgbsearch import XgbSearch\n\nclass MyOwnSearch(XgbSearch):\n\n def __init__(self, tune_params, fit_params, add_value, maximise_score=True):\n super().__init__(tune_params, fit_params, maximise_score)\n self.add_value = add_value\n\n def _generate_params(self):\n # Toy example. 
Will just take the first parameter and add to it a value specified in the constructor.\n # this method needs to return a list dicts of parameters that will be passed into xgb.fit()\n result = []\n for i in range(3):\n first_key = list(self.tune_params.keys())[0]\n loop_result = self.tune_params | self.fit_params\n loop_result[first_key] = loop_result[first_key] + i * self.add_value\n result.append(loop_result)\n\n return result\n\n\n# Run it!\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import train_test_split\nimport pandas as pd\nfrom sklearn.metrics import roc_auc_score\n\nX, y = make_classification(random_state=42)\nX = pd.DataFrame(X)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n\n# These parameters will be passed to xgb.fit as is.\nfit_params = {\n \"device\": \"cuda\",\n \"objective\": \"binary:logistic\",\n \"eval_metric\": [\"auc\"],\n}\n\n# The parameters here will be tuned. If the parameter is a single value, it will be passed as is.\n# If the parameter is a list, all possible combinations will be searched using grid search.\ntune_params_grid = {\n \"eta\": 0.01,\n \"max_depth\": 5,\n \"min_child_weight\": 3,\n}\n\nmy_search = MyOwnSearch(tune_params_grid, fit_params, 0.01)\neval_set = [(X_train, y_train, \"train\"), (X_test, y_test, \"test\")]\nmy_search.fit(X_train, y_train, eval_set, 10000, 100, verbose_eval=25)\n```\n\n## Additional classes\n\nThis package also contains a supplement class `XgbResultDisplay` which makes accessing results of grid search/random search easier. This class uses static methods.\n\nSee [display-demo.ipynb](examples/display-demo.ipynb) for examples.\n",
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2024 Jakub Deka Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
"summary": "An implementation of grid search, random search for XGBoost with eval_set and early stopping support.",
"version": "0.1.6",
"project_urls": {
"Bug Reports": "https://github.com/jakub-deka/xgbsearch/issues",
"Homepage": "https://github.com/jakub-deka/xgbsearch",
"Source": "https://github.com/jakub-deka/xgbsearch"
},
"split_keywords": [
"xgboost"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5d7a8ea990ac34191625d4d4ac91f054541ab72e7bc00e32c530483ec3603913",
"md5": "93fda9c92f2feeae43be070a390c8144",
"sha256": "c23c6eb14181a2622e0986edc19b5907726ae4d11c0d5e7fb8d09225acf6cabe"
},
"downloads": -1,
"filename": "xgbsearch-0.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "93fda9c92f2feeae43be070a390c8144",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 12242,
"upload_time": "2024-11-12T16:30:19",
"upload_time_iso_8601": "2024-11-12T16:30:19.714887Z",
"url": "https://files.pythonhosted.org/packages/5d/7a/8ea990ac34191625d4d4ac91f054541ab72e7bc00e32c530483ec3603913/xgbsearch-0.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e4dbabd07dfa30c598258d1fddf5163e1e2b0b4369fb809e55e7013bd1e567a7",
"md5": "3c3b32109f915646fc2d5e2de08af8b9",
"sha256": "e41273c895c626c677b49fdc7884ac4d78b2526eaf2d518f3b88413d4a6ccae9"
},
"downloads": -1,
"filename": "xgbsearch-0.1.6.tar.gz",
"has_sig": false,
"md5_digest": "3c3b32109f915646fc2d5e2de08af8b9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 13209,
"upload_time": "2024-11-12T16:30:21",
"upload_time_iso_8601": "2024-11-12T16:30:21.485385Z",
"url": "https://files.pythonhosted.org/packages/e4/db/abd07dfa30c598258d1fddf5163e1e2b0b4369fb809e55e7013bd1e567a7/xgbsearch-0.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-12 16:30:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jakub-deka",
"github_project": "xgbsearch",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "xgbsearch"
}