clique-ml

Name: clique-ml
Version: 0.0.4
Summary: A selective ensemble for predictive models that tests new additions to prevent downgrades in performance.
Upload time: 2024-11-18 03:55:12
Requires Python: >=3.10
Keywords: selective, ensemble, tensorflow, machine learning, cuda
Source: https://github.com/whitgroves/clique-ml

# clique-ml
A selective ensemble for predictive time-series models that tests new additions to prevent downgrades in performance.

This code was written and tested against a CUDA 12.2 environment; if you run into compatibility issues, try setting up a `venv` using [`cuda-venv.sh`](./cuda-venv.sh).

## Usage
### Setup
```
pip install -U clique-ml
```
```
import clique
```
### Training

Create a list of models to train. Any class that implements `fit()`, `predict()`, and `get_params()` is supported:
```
import xgboost as xgb
import lightgbm as lgb
import catboost as cat
import tensorflow as tf

models = [
    xgb.XGBRegressor(...),
    lgb.LGBMRegressor(...),
    cat.CatBoostRegressor(...),
    tf.keras.Sequential(...),
]
```
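Because only those three methods are required, a hand-rolled model can join the pool as well. A minimal sketch (the `NaiveSlope` class below is illustrative, not part of clique-ml):
```
import numpy as np

class NaiveSlope:
    """Hypothetical regressor: fits a polynomial to the first feature only."""
    def __init__(self, degree=1):
        self.degree = degree
        self.coeffs_ = None

    def fit(self, X, y):
        x0 = np.asarray(X)[:, 0]
        self.coeffs_ = np.polyfit(x0, y, self.degree)  # least-squares fit
        return self

    def predict(self, X):
        return np.polyval(self.coeffs_, np.asarray(X)[:, 0])

    def get_params(self, deep=True):
        return {"degree": self.degree}

models.append(NaiveSlope())
```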
Data is automatically split for training, testing, and validation, so simply pass `models`, inputs (`X`), and targets (`y`) to `train_ensemble()`:

```
X, y = ... # preprocessed data; 20% is set aside for validation, and the rest is trained on using k-folds

ensemble = clique.train_ensemble(models, X, y, folds=5, limit=3) # instance of clique.SelectiveEnsemble
```
`folds` sets `n_splits` for [scikit-learn's `TimeSeriesSplit` class](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html), which is used to implement k-folds here. For a single split, pass `folds=1`.
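For reference, `TimeSeriesSplit` builds expanding-window folds, so every fold trains only on data that precedes its test slice. A quick way to see the boundaries for yourself:
```
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(12).reshape(-1, 1)
for i, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X_demo)):
    print(f"Fold {i + 1}: train={train_idx} test={test_idx}")
# Fold 1: train=[0 1] test=[2 3]; each later fold extends the training window
```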

`limit` sets a **soft target** for how many models to include in the ensemble. Once that target is exceeded, the ensemble rejects any new model that would raise its mean score.

By default, the ensemble trains using **5 folds** and **no size limit**.

### Evaluation

`train_ensemble()` will output the results of each sub-model's training on every fold:
```
Pre-training setup...Complete (0.0s)
Model 1/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3      
Model 2/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3       
Model 3/5: Fold 1/5: Accepted with score: 0.03233311 (0.1s) (CatBoostRegressor_1731893049_0)          
Model 3/5: Fold 2/5: Accepted with score: 0.02314115 (0.0s) (CatBoostRegressor_1731893050_1)          
Model 3/5: Fold 3/5: Accepted with score: 0.01777214 (0.0s) (CatBoostRegressor_1731893050_2)
...      
Model 5/5: Fold 2/5: Rejected with score: 0.97019375 (0.3s)                            
Model 5/5: Fold 3/5: Rejected with score: 0.41385662 (1.4s)                         
Model 5/5: Fold 4/5: Rejected with score: 0.41153231 (0.8s)          
Model 5/5: Fold 5/5: Rejected with score: 0.40335007 (1.6s)
```
Once trained, details of the final ensemble can be reviewed with:
```
print(ensemble) # <SelectiveEnsemble (5 model(s); mean: 0.03389993; best: 0.03321487; limit: 3)>
```
Or:
```
print(len(ensemble)) # 5
print(ensemble.mean_score) # 0.033899934449981864
print(ensemble.best_score) # 0.033214874389494775
```
### Pruning
Since `SelectiveEnsemble` has to accept the first *N* models to establish a mean, frontloading with weaker models may cause oversaturation, even when `limit` is set.

To remedy this, call `SelectiveEnsemble.prune()`:
```
pruned = ensemble.prune()
```
This returns a copy of the ensemble with every sub-model scoring above the mean removed.

If a `limit` is passed in, the removal of all models above the mean will recurse until that limit is reached:
```
triumvirate = ensemble.prune(3)
print(len(triumvirate)) # 3 (or fewer)
```
This recursion is automatic for instances where `SelectiveEnsemble.limit` is set manually or by `train_ensemble()`.
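To make that behavior concrete, here is a rough sketch of the pruning logic, with scores standing in for sub-models and lower meaning better (an illustration of the described behavior, not the library's actual implementation):
```
def prune_sketch(scores, limit=None):
    """Keep scores at or below the current mean; with a limit, repeat."""
    kept = [s for s in scores if s <= sum(scores) / len(scores)]
    while limit is not None and len(kept) > limit:
        mean = sum(kept) / len(kept)
        smaller = [s for s in kept if s <= mean]
        if len(smaller) == len(kept):  # nothing above the mean left to drop
            break
        kept = smaller
    return kept

print(prune_sketch([0.02, 0.03, 0.03, 0.40, 0.97], limit=3))  # [0.02, 0.03, 0.03]
```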

### Deployment
To make predictions, simply call:
```
predictions = ensemble.predict(...) # with a new set of inputs
```
This averages the predictions of all sub-models for each input.
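A sketch of that averaging (the real method may differ in details such as weighting):
```
import numpy as np

def predict_sketch(models, X):
    """Average each sub-model's prediction for every input row."""
    return np.mean([m.predict(X) for m in models], axis=0)
```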

If you wish to continue training on an existing ensemble, use:
```
existing = clique.load_ensemble(X_test=X, y_test=y) # test data must be passed in for new model evaluation
updated = clique.train_ensemble(models, X, y, ensemble=existing)
```
Note that if a limit is set on the existing ensemble, it will be carried over and enforced on the updated one.