| Field | Value |
| --- | --- |
| Name | clique-ml |
| Version | 0.0.4 |
| home_page | None |
| Summary | A selective ensemble for predictive models that tests new additions to prevent downgrades in performance. |
| upload_time | 2024-11-18 03:55:12 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | selective, ensemble, tensorflow, maching learning, cuda |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# clique-ml
A selective ensemble for predictive time-series models that tests new additions to prevent downgrades in performance.
This code was written and tested against a CUDA 12.2 environment; if you run into compatibility issues, try setting up a `venv` using [`cuda-venv.sh`](./cuda-venv.sh).
## Usage
### Setup
```
pip install -U clique-ml
```
```
import clique
```
### Training
Create a list of models to train. Any class that implements `fit()`, `predict()`, and `get_params()` is supported:
```
import xgboost as xgb
import lightgbm as lgb
import catboost as cat
import tensorflow as tf
models = [
    xgb.XGBRegressor(...),
    lgb.LGBMRegressor(...),
    cat.CatBoostRegressor(...),
    tf.keras.Sequential(...),
]
```
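Scikit-learn-style estimators already expose those three methods, so most regressors can be dropped in directly. For anything custom, here is a minimal sketch of the expected interface (the `LeastSquares` class is hypothetical, written only to illustrate the duck typing):
```
import numpy as np

class LeastSquares:
    """Toy model exposing the interface clique expects: fit(), predict(), get_params()."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
        self.coef_ = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            X = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
        self.coef_, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            X = np.hstack([X, np.ones((len(X), 1))])
        return X @ self.coef_

    def get_params(self, deep=True):
        return {"fit_intercept": self.fit_intercept}
```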
Data is automatically split for training, testing, and validation, so simply pass `models`, inputs (`X`), and targets (`y`) to `train_ensemble()`:
```
X, y = ... # preprocessed data; 20% is set aside for validation, and the rest is trained on using k-folds
ensemble = clique.train_ensemble(models, X, y, folds=5, limit=3) # instance of clique.SelectiveEnsemble
```
`folds` sets `n_splits` for [scikit-learn's `TimeSeriesSplit` class](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html), which is used to implement k-folds here. For a single split, pass `folds=1`.
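For intuition, here is how `TimeSeriesSplit` carves ordered data into expanding training windows with a fixed-size test window (this is plain scikit-learn behaviour, independent of clique):
```
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)
for i, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=3).split(X), 1):
    print(f"Fold {i}: train={train_idx.tolist()} test={test_idx.tolist()}")
# Fold 1: train=[0, 1, 2, 3] test=[4, 5]
# Fold 2: train=[0, 1, 2, 3, 4, 5] test=[6, 7]
# Fold 3: train=[0, 1, 2, 3, 4, 5, 6, 7] test=[8, 9]
```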
`limit` sets a **soft target** for how many models to include in the ensemble. Once that target is exceeded, the ensemble rejects any new model that would raise its mean score.
By default, the ensemble trains using **5 folds** and **no size limit**.
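Conceptually, the acceptance rule behaves like the following sketch (hypothetical pseudologic, not clique's actual implementation; scores are errors, so lower is better):
```
def should_accept(scores, candidate_score, limit):
    """Accept unconditionally below the soft limit; past it, reject any
    candidate whose inclusion would raise the ensemble's mean score."""
    if limit is None or len(scores) < limit:
        return True
    current_mean = sum(scores) / len(scores)
    new_mean = (sum(scores) + candidate_score) / (len(scores) + 1)
    return new_mean <= current_mean
```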
### Evaluation
`train_ensemble()` will output the results of each sub-model's training on every fold:
```
Pre-training setup...Complete (0.0s)
Model 1/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3
Model 2/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3
Model 3/5: Fold 1/5: Accepted with score: 0.03233311 (0.1s) (CatBoostRegressor_1731893049_0)
Model 3/5: Fold 2/5: Accepted with score: 0.02314115 (0.0s) (CatBoostRegressor_1731893050_1)
Model 3/5: Fold 3/5: Accepted with score: 0.01777214 (0.0s) (CatBoostRegressor_1731893050_2)
...
Model 5/5: Fold 2/5: Rejected with score: 0.97019375 (0.3s)
Model 5/5: Fold 3/5: Rejected with score: 0.41385662 (1.4s)
Model 5/5: Fold 4/5: Rejected with score: 0.41153231 (0.8s)
Model 5/5: Fold 5/5: Rejected with score: 0.40335007 (1.6s)
```
Once trained, details of the final ensemble can be reviewed with:
```
print(ensemble) # <SelectiveEnsemble (5 model(s); mean: 0.03389993; best: 0.03321487; limit: 3)>
```
Or:
```
print(len(ensemble)) # 5
print(ensemble.mean_score) # 0.033899934449981864
print(ensemble.best_score) # 0.033214874389494775
```
### Pruning
Since `SelectiveEnsemble` has to accept the first *N* models to establish a mean, frontloading with weaker models may cause oversaturation, even when `limit` is set.
To remedy this, call `SelectiveEnsemble.prune()`:
```
pruned = ensemble.prune()
```
This returns a copy of the ensemble with every sub-model scoring above the mean removed.
If a `limit` is passed in, pruning recurses, repeatedly removing models above the mean, until the ensemble is within that limit:
```
triumvirate = ensemble.prune(3)
print(len(triumvirate)) # 3 (or fewer)
```
This recursion is automatic for instances where `SelectiveEnsemble.limit` is set manually or by `train_ensemble()`.
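In pseudocode, the pruning behaviour described above looks roughly like this (a hypothetical sketch over raw scores, not clique's actual implementation):
```
def prune_scores(scores, limit=None):
    """Drop scores above the mean; with a limit, repeat until it is met."""
    kept = list(scores)
    while kept:
        mean = sum(kept) / len(kept)
        survivors = [s for s in kept if s <= mean]
        if limit is None or len(survivors) <= limit or len(survivors) == len(kept):
            return survivors  # single pass, limit met, or no further progress
        kept = survivors
    return kept
```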
### Deployment
To make predictions, simply call:
```
predictions = ensemble.predict(...) # with a new set of inputs
```
This aggregates across all sub-models to produce each prediction.
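Assuming a simple unweighted average over sub-model outputs (an assumption on my part; check the source for the exact aggregation), that is roughly equivalent to:
```
import numpy as np

def ensemble_predict(models, X):
    # hypothetical equivalent: average each sub-model's predictions element-wise
    return np.mean([m.predict(X) for m in models], axis=0)
```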
If you wish to continue training on an existing ensemble, use:
```
existing = clique.load_ensemble(X_test=X, y_test=y) # test data must be passed in for new model evaluation
updated = clique.train_ensemble(models, X, y, ensemble=existing)
```
Note that if a limit is set on the existing ensemble, it will carry over and be enforced on the updated one.
## Raw data
{
    "_id": null,
    "home_page": null,
    "name": "clique-ml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "selective, ensemble, tensorflow, maching learning, cuda",
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/61/49/85d480ecd996be70819cd19c0ee25e25a6203c51c32091402113cb75e52d/clique_ml-0.0.4.tar.gz",
    "platform": null,
"description": "# clique-ml\nA selective ensemble for predictive time-series models that tests new additions to prevent downgrades in performance.\n\nThis code was written and tested against a CUDA 12.2 environment; if you run into compatability issues, try setting up a `venv` using [`cuda-venv.sh`](./cuda-venv.sh).\n\n## Usage\n### Setup\n```\npip install -U clique-ml\n```\n```\nimport clique\n```\n### Training\n\nCreate a list of models to train. Supports any class that can call `fit()`, `predict()`, and `get_params()`:\n```\nimport xgboost as xgb\nimport lightgbm as lgb\nimport catboost as cat\nimport tensorflow as tf\n\nmodels = [\n xgb.XGBRegressor(...),\n lgb.LGBMRegressor(...),\n cat.CatBoostRegressor(...),\n tf.keras.Sequential(...),\n]\n```\nData is automatically split for training, testing, and validaiton, so simply pass `models`, inputs (`X`) and targets(`y`) to `train_ensemble()`:\n\n```\nX, y = ... # preprocessed data; 20% is set aside for validation, and the rest is trained on using k-folds\n\nensemble = clique.train_ensemble(models, X, y, folds=5, limit=3) # instance of clique.SelectiveEnsemble\n```\n`folds` sets `n_splits` for [scikit-learn's `TimeSeriesSplit` class](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html), which is used to implement k-folds here. For a single split, pass `folds=1`.\n\n`limit` sets a **soft target** for how many models to include in the ensemble. When set, once exceeded, the ensemble will reject new models that raise its mean score.\n\nBy default, the ensemble trains using **5 folds** and **no size limit**.\n\n### Evaluation\n\n`train_ensemble()` will output the results of each sub-model's training on every fold:\n```\nPre-training setup...Complete (0.0s)\nModel 1/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3 \nModel 2/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3 \nModel 3/5: Fold 1/5: Accepted with score: 0.03233311 (0.1s) (CatBoostRegressor_1731893049_0) \nModel 3/5: Fold 2/5: Accepted with score: 0.02314115 (0.0s) (CatBoostRegressor_1731893050_1) \nModel 3/5: Fold 3/5: Accepted with score: 0.01777214 (0.0s) (CatBoostRegressor_1731893050_2)\n... \nModel 5/5: Fold 2/5: Rejected with score: 0.97019375 (0.3s) \nModel 5/5: Fold 3/5: Rejected with score: 0.41385662 (1.4s) \nModel 5/5: Fold 4/5: Rejected with score: 0.41153231 (0.8s) \nModel 5/5: Fold 5/5: Rejected with score: 0.40335007 (1.6s)\n```\nOnce trained, details of the final ensemble can be reviewed with:\n```\nprint(ensemble) # <SelectiveEnsemble (5 model(s); mean: 0.03389993; best: 0.03321487; limit: 3)>\n```\nOr:\n```\nprint(len(ensemble)) # 5\nprint(ensemble.mean_score) # 0.033899934449981864\nprint(ensemble.best_score) # 0.033214874389494775\n```\n### Pruning\nSince `SelectiveEnsemble` has to accept the first *N* models to establish a mean, frontloading with weaker models may cause oversaturation, even when `limit` is set.\n\nTo remedy this, call `SelectiveEnsemble.prune()`:\n```\npruned = ensemble.prune()\n```\nWhich will return a copy of the ensemble with all sub-models scoring above the mean removed. 
\n\nIf a `limit` is passed in, the removal of all models above the mean will recurse until that limit is reached:\n```\ntriumvate = ensemble.prune(3)\nprint(len(ensemble)) # 3 (or less)\n```\nThis recursion is automatic for instances where `SelectiveEnsemble.limit` is set manually or by `train_ensemble()`.\n\n### Deployment\nTo make predictions, simply call:\n```\npredictions = ensemble.predict(...) # with a new set of inputs\n```\nWhich will use the mean score across all sub-models for each prediction.\n\nIf you wish to continue training on an existing ensemble, use:\n```\nexisting = clique.load_ensemble(X_test=X, y_test=y) # test data must be passed in for new model evaluation\nupdated = clique.train_ensemble(models, X, y, ensemble=existing)\n```\nNote that if a limit is set on the existing model, that will be set and enforced on the updated one.\n",
"bugtrack_url": null,
"license": null,
"summary": "A selective ensemble for predictive models that tests new additions to prevent downgrades in performance.",
"version": "0.0.4",
"project_urls": {
"Source": "https://github.com/whitgroves/clique-ml"
},
"split_keywords": [
"selective",
" ensemble",
" tensorflow",
" maching learning",
" cuda"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2369bcc9a7cd74bbde11b1bdfa7c31167f6b246d39ee0657e070cb0042a83ba7",
"md5": "dcd5eb4489db1ee4056c198869dbbf86",
"sha256": "b2347fe07ac5f8641ea7fa69164a5744aa8b1a35b01fe0c54a38818cadc42c0c"
},
"downloads": -1,
"filename": "clique_ml-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dcd5eb4489db1ee4056c198869dbbf86",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 6881,
"upload_time": "2024-11-18T03:55:11",
"upload_time_iso_8601": "2024-11-18T03:55:11.045918Z",
"url": "https://files.pythonhosted.org/packages/23/69/bcc9a7cd74bbde11b1bdfa7c31167f6b246d39ee0657e070cb0042a83ba7/clique_ml-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "614985d480ecd996be70819cd19c0ee25e25a6203c51c32091402113cb75e52d",
"md5": "eea1d606bf4caad0ea459b919a2d2325",
"sha256": "b8effffa395239f12f6aa931cdaccc85b0f33924166b65b91d51ff74294ee76a"
},
"downloads": -1,
"filename": "clique_ml-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "eea1d606bf4caad0ea459b919a2d2325",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 7552,
"upload_time": "2024-11-18T03:55:12",
"upload_time_iso_8601": "2024-11-18T03:55:12.539086Z",
"url": "https://files.pythonhosted.org/packages/61/49/85d480ecd996be70819cd19c0ee25e25a6203c51c32091402113cb75e52d/clique_ml-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-18 03:55:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "whitgroves",
"github_project": "clique-ml",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "clique-ml"
}