Name | clique-ml |
Version | 0.0.11 |
home_page | None |
Summary | A selective ensemble for predictive models that tests new additions to prevent downgrades in performance. |
upload_time | 2025-07-23 05:25:12 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | None |
keywords | selective, ensemble, tensorflow, maching learning, cuda |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# clique-ml
An ensemble for machine learning models that, when provided with test data, validates new additions to prevent downgrades in performance.
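As an illustration of the idea only, here is a minimal sketch of one way a "selective" ensemble could vet a new member. It is not the package's implementation: the helper names (`mean_abs_error`, `ensemble_predict`, `try_add`), the averaging rule, and the choice of mean absolute error are all assumptions made for the example.

```python
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical stand-in for a trained model: maps inputs to predictions.
Predictor = Callable[[Sequence[float]], list[float]]


def mean_abs_error(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Average absolute difference between predictions and targets."""
    return fmean(abs(p - t) for p, t in zip(preds, targets))


def ensemble_predict(models: Sequence[Predictor], inputs: Sequence[float]) -> list[float]:
    """Average the members' predictions row by row (an assumed combination rule)."""
    per_model = [model(inputs) for model in models]
    return [fmean(row) for row in zip(*per_model)]


def try_add(models, candidate, inputs, targets):
    """Return a member list that includes `candidate` only if the error does not rise."""
    if not models:
        return [candidate]  # nothing to compare against yet
    before = mean_abs_error(ensemble_predict(models, inputs), targets)
    after = mean_abs_error(ensemble_predict([*models, candidate], inputs), targets)
    return [*models, candidate] if after <= before else list(models)


inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]

members = try_add([], lambda xs: [2.0 * x + 1.0 for x in xs], inputs, targets)  # biased model
members = try_add(members, lambda xs: [2.0 * x for x in xs], inputs, targets)   # lowers the error
members = try_add(members, lambda xs: [0.0 for _ in xs], inputs, targets)       # raises the error
print(len(members))
```

The third candidate is turned away because adding it would make the averaged predictions worse on the held-out data, which is the "prevent downgrades" behaviour in miniature.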
The container class `Clique` is compatible with any model that has both `fit` and `predict` methods, although it has only been tested against regression models included in `tensorflow`, `lightgbm`, `xgboost`, and `catboost`.
This code was written against a CUDA 12.2 environment; if you run into compatibility issues, [`cuda-venv.sh`](./cuda-venv.sh) can be used to set up a virtualenv on Linux.
All code is unlicensed and freely available for anyone to use. If you run into issues, please contact me on X: [@whitgroves](https://x.com/whitgroves)
## Classes
`clique` makes several classes available to the developer:
- `IModel`: A `Protocol` that acts as an interface for any model matching `scikit-learn`'s estimator API (`fit` and `predict`).
- `ModelProfile`: A wrapper class to bundle any `IModel` with `fit` and `predict` keyword arguments, plus error scores post-evaluation.
- `EvaluationError`: An error class for exceptions specific to model evaluation.
- `EvaluationWarning`: A warning class for states that may raise exceptions during model evaluation.
- `Clique`: A container class that provides the main functionality of the package, detailed below. `Clique` itself also satisfies `IModel`, so it exposes `fit` and `predict`.
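To make the `Protocol` idea concrete, the sketch below defines a structural interface with `typing.Protocol` and checks two toy classes against it. `FitPredict`, `MeanRegressor`, and `NotAModel` are hypothetical names for illustration; the real `IModel` may declare different signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class FitPredict(Protocol):
    """Hypothetical stand-in for `IModel`: anything exposing `fit` and `predict`."""

    def fit(self, X: Any, y: Any, **kwargs: Any) -> Any: ...

    def predict(self, X: Any, **kwargs: Any) -> Any: ...


class MeanRegressor:
    """Toy model that always predicts the mean of the training targets."""

    def fit(self, X, y, **kwargs):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X, **kwargs):
        return [self.mean_ for _ in X]


class NotAModel:
    """Has `fit` but no `predict`, so it does not satisfy the protocol."""

    def fit(self, X, y):
        return self


model = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(isinstance(model, FitPredict))        # structural check: both methods exist
print(isinstance(NotAModel(), FitPredict))  # missing `predict`
print(model.predict([[5], [6]]))
```

Because the check is structural, `MeanRegressor` never has to inherit from `FitPredict`; having the right methods is enough, which is why models from unrelated libraries can share one container.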
## Usage
### Installation
The package can be installed with `pip`:
```
pip install -U clique-ml
```
### Setup
First, prepare your data:
```
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('./training_data.csv')
... # preprocessing
y = df['target_variable']
X = df.drop(['target_variable'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=489)
```
Then, instantiate the models that will form the core of the ensemble:
```
import xgboost as xgb
import lightgbm as lgb
import catboost as cat
import tensorflow as tf
models = [
    xgb.XGBRegressor(...),
    lgb.LGBMRegressor(...),
    cat.CatBoostRegressor(...),
    tf.keras.Sequential(...),
]
```
And load those models into your ensemble:
```
from clique import Clique
ensemble = Clique(models=models)
```
Now you can run your training loop (in this example, for time series data):
```
from sklearn.model_selection import TimeSeriesSplit
for i, (i_train, i_valid) in enumerate(TimeSeriesSplit().split(X_train)):
    val_data = [(X_train.iloc[i_valid, :], y_train.iloc[i_valid])]
    for model in ensemble: # for ModelProfile in Clique
        fit_kw = dict()
        predict_kw = dict()
        match model.model_type:
            case 'Sequential':
                if i == 0: model.compile(optimizer='adam', loss='mae')
                keras_kw = dict(verbose=0, batch_size=256)
                fit_kw.update(keras_kw)
                predict_kw.update(keras_kw)
            case 'LGBMRegressor':
                fit_kw.update(dict(eval_set=val_data, eval_metric='l1'))
            case 'XGBRegressor' | 'CatBoostRegressor':
                fit_kw.update(dict(verbose=0, eval_set=val_data))
        model.fit_kw = fit_kw
        model.predict_kw = predict_kw
    ensemble.fit(X_train.iloc[i_train, :], y_train.iloc[i_train])
```
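If the index pairs produced by the time series split are unfamiliar, the following pure-Python sketch approximates the expanding-window scheme: each fold trains on everything before its validation slice. `expanding_window_splits` is a hypothetical helper written for this example; the real `TimeSeriesSplit` has more options (such as `gap` and `max_train_size`).

```python
def expanding_window_splits(n_samples: int, n_splits: int = 5):
    """Yield (train_indices, valid_indices) pairs where training always precedes validation."""
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_valid = n_samples - n_splits * fold_size
    for start in range(first_valid, n_samples, fold_size):
        # Train on every row before `start`, validate on the next `fold_size` rows.
        yield list(range(start)), list(range(start, start + fold_size))


for i, (i_train, i_valid) in enumerate(expanding_window_splits(12)):
    print(i, f"train rows 0..{i_train[-1]}", f"valid rows {i_valid}")
```

Since the training window only ever grows forward in time, no fold is validated on rows that precede its training data.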
And start to make predictions with the ensemble:
```
predictions = ensemble.predict(X_test)
```
### Evaluation
Once `fit` has been called, `Clique` can be evaluated and pruned to improve performance.
To start, load your test data into the ensemble:
```
ensemble.inputs = X_test
ensemble.targets = y_test
```
Note that once either `inputs` or `targets` is set, the ensemble will not accept assignments to the other attribute if the data length does not match (e.g., if `inputs` has 40 rows, setting `targets` with data for 39 rows will raise a `ValueError`).
Consequently, the safest way to swap out testing data is through `reset_test_data`:
```
ensemble.reset_test_data(inputs=X_test, targets=y_test)
```
Which will first clear existing test data before assigning the new values (or simply leave it cleared, if no parameters are passed).
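The length check and the reset behaviour described above can be pictured with a small sketch. `HeldOutData` is a hypothetical class written for this example, not the code inside `Clique`; it only shows why clearing both attributes first avoids a spurious `ValueError`.

```python
class HeldOutData:
    """Hypothetical holder that keeps `inputs` and `targets` the same length."""

    def __init__(self):
        self._inputs = None
        self._targets = None

    @staticmethod
    def _check(new, other, name):
        # Only compare when both sides are present.
        if new is not None and other is not None and len(new) != len(other):
            raise ValueError(f"{name} has {len(new)} rows, expected {len(other)}")

    @property
    def inputs(self):
        return self._inputs

    @inputs.setter
    def inputs(self, value):
        self._check(value, self._targets, "inputs")
        self._inputs = value

    @property
    def targets(self):
        return self._targets

    @targets.setter
    def targets(self, value):
        self._check(value, self._inputs, "targets")
        self._targets = value

    def reset_test_data(self, inputs=None, targets=None):
        # Clear both sides first so stale data cannot trigger a mismatch.
        self._inputs = None
        self._targets = None
        self.inputs = inputs
        self.targets = targets


data = HeldOutData()
data.inputs = list(range(40))
try:
    data.targets = list(range(39))  # one row short
except ValueError as err:
    print(err)

data.reset_test_data(inputs=list(range(39)), targets=list(range(39)))
print(len(data.inputs), len(data.targets))
```

Assigning the 39-row data attribute by attribute would have failed on the first assignment, because the old 40-row `inputs` were still in place.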
Then, call `evaluate` to score your models:
```
ensemble.evaluate()
```
Which will populate the `mean_score`, `best_score`, and `best_model` properties for the ensemble:
```
ensemble # <Clique (5 model(s); limit: none)>
ensemble.mean_score # 0.31446850398821063
ensemble.best_score # 0.033214874389494775
ensemble.best_model # <ModelProfile (CatBoostRegressor)>
```
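For intuition about what these three properties summarise, here is a small sketch that scores made-up predictions and derives a mean score, a best score, and a best model from them. The use of mean absolute error and the names `model_a`, `model_b`, and `model_c` are assumptions for illustration; the metric `Clique` actually uses is not stated here.

```python
from statistics import fmean


def mean_abs_error(preds, targets):
    """Average absolute difference between predictions and targets."""
    return fmean(abs(p - t) for p, t in zip(preds, targets))


targets = [3.0, 5.0, 7.0]
predictions = {  # made-up predictions keyed by hypothetical model names
    "model_a": [2.5, 5.5, 7.0],
    "model_b": [4.0, 4.0, 9.0],
    "model_c": [3.0, 5.0, 6.5],
}

# One error score per model; lower is better.
scores = {name: mean_abs_error(preds, targets) for name, preds in predictions.items()}

mean_score = fmean(scores.values())
best_model = min(scores, key=scores.get)
best_score = scores[best_model]

print(scores)
print(mean_score, best_score, best_model)
```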
And enable the `prune` function, which returns a copy of the ensemble with all models scoring above the mean (that is, with higher error) removed:
```
triad = ensemble.prune(3) # <Clique (3 model(s); limit: 3)>
triad.mean_score # 0.03373027543351623
```
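The pruning rule can be sketched in a few lines. This is a guess at the logic based on the description above, not the package's code: `prune_above_mean` is a hypothetical helper, the scores are made up, and treating the argument as a cap on the number of survivors is an assumption.

```python
from statistics import fmean


def prune_above_mean(scores, limit=None):
    """Keep names whose error is at or below the mean, best first, optionally capped."""
    mean = fmean(scores.values())
    kept = sorted((name for name, score in scores.items() if score <= mean), key=scores.get)
    return kept if limit is None else kept[:limit]


scores = {"a": 0.9, "b": 0.05, "c": 0.04, "d": 0.03, "e": 0.6}  # made-up error scores

print(prune_above_mean(scores))           # drops the two models above the mean
print(prune_above_mean(scores, limit=2))  # additionally caps the survivors at two
```

Two poor models are enough to pull the mean well above the rest, so removing them moves the mean score of the remaining members much closer to the best score.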
### Deployment
Once trained and evaluated, the ensemble's models can be saved for later:
```
ensemble.save('.models/')
```
Which will save copies of each underlying model to be used elsewhere, or reloaded into another ensemble:
```
Clique().load('.models/') # fresh ensemble with no duplications
ensemble.load('.models/') # will duplicate all models in the collection
triad.load('.models/') # will duplicate any saved models still in the collection
```
Also note that, as with `fit`, calling `load` enables evaluation and pruning, assuming all loaded models were previously trained.
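To see why loading into a populated ensemble duplicates models, consider this sketch, where loading simply appends whatever is on disk to the existing collection. `save_models` and `load_models` are hypothetical helpers, and `pickle` is a stand-in; the model libraries above each have their own serialization formats, and the on-disk layout used by `Clique.save` is not shown here.

```python
import pickle
import tempfile
from pathlib import Path


def save_models(models, directory):
    """Write each model to its own file inside `directory`."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    for index, model in enumerate(models):
        with open(path / f"model_{index}.pkl", "wb") as handle:
            pickle.dump(model, handle)


def load_models(models, directory):
    """Append every saved model found in `directory` to `models`."""
    for file in sorted(Path(directory).glob("*.pkl")):
        with open(file, "rb") as handle:
            models.append(pickle.load(handle))
    return models


with tempfile.TemporaryDirectory() as tmp:
    original = [{"name": "a"}, {"name": "b"}]  # plain dicts standing in for trained models
    save_models(original, tmp)
    fresh = load_models([], tmp)                # empty collection: no duplicates
    doubled = load_models(list(original), tmp)  # populated collection: everything appears twice

print(len(fresh), len(doubled))
```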
Raw data
{
"_id": null,
"home_page": null,
"name": "clique-ml",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "selective, ensemble, tensorflow, maching learning, cuda",
"author": null,
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/45/5b/5e88ce443e7a4357c483ba95948f888f9e3a90e3ad953bb03222ef8099cf/clique_ml-0.0.11.tar.gz",
"platform": null,
"description": "# clique-ml\nAn ensemble for machine learning models that, when provided with test data, validates new additions to prevent downgrades in performance.\n\nThe container class `Clique` is compatible with any model that has both `fit` and `predict` methiods, although it has only been tested against regression models included in `tensorflow.`, `lightgbm`, `xgboost`, and `catboost`.\n\nThis code was written against a CUDA 12.2 environment; if you run into compatability issues, [`cuda-venv.sh`](./cuda-venv.sh) can be used to setup a virtualenv on linux.\n\nAll code is unlicensed and freely available for anyone to use. If you run into issues, please contact me on X: [@whitgroves](https://x.com/whitgroves)\n\n## Classes\n\n`clique` makes several classes available to the developer:\n\n- `IModel`: A `Protocol` that acts as an interface for any model matching `scikit-learn`'s estimator API (`fit` and `predict`).\n\n- `ModelProfile`: A wrapper class to bundle any `IModel` with `fit` and `predict` keyword arguments, plus error scores post-evaluation.\n\n- `EvaluationError`: An error class for exceptions specific to model evaluation.\n\n- `EvaluationWarning`: A warning class for states that may raise exceptions during model evaluation.\n\n- `Clique`: A container class that provides the main functionality of the package, detailed below. Also supports `IModel`.\n\n## Usage\n\n### Installation\n\nThe package can be installed with `pip`:\n\n```\npip install -U clique-ml\n```\n\n### Setup\n\nFirst, prepare your data:\n\n```\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\n\ndf = pd.read_csv('./training_data.csv')\n... 
# preprocessing\ny = df['target_variable']\nX = df.drop(['target_variable'], axis=1)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=489)\n```\n\nThen, instantiate the models that will form the core of the ensemble:\n\n```\nimport xgboost as xgb\nimport lightgbm as lgb\nimport catboost as cat\nimport tensorflow as tf\n\nmodels = [\n xgb.XGBRegressor(...),\n lgb.LGBMRegressor(...),\n cat.CatBoostRegressor(...),\n tf.keras.Sequential(...),\n]\n```\n\nAnd load those models into your ensemble:\n\n```\nfrom clique import Clique\n\nensemble = Clique(models=models)\n```\n\nNow you can run your training loop (in this example, for time series data):\n\n```\nfrom sklearn.model_selection import TimeSeriesSplit\n\nfor i, (i_train, i_valid) in enumerate(TimeSeriesSplit().split(X_train)):\n val_data = [(X_train.iloc[i_valid, :], y_train.iloc[i_valid])]\n for model in ensemble: # for ModelProfile in Clique\n fit_kw = dict()\n predict_kw = dict()\n match model.model_type:\n case 'Sequential':\n if i == 0: model.compile(optimizer='adam', loss='mae')\n keras_kw = dict(verbose=0, batch_size=256)\n fit_kw.update(keras_kw)\n predict_kw.update(keras_kw)\n case 'LGBMRegressor':\n fit_kw.update(dict(eval_set=val_data, eval_metric='l1'))\n case 'XGBRegressor' | 'CatBoostRegressor':\n fit_kw.update(dict(verbose=0, eval_set=val_data))\n model.fit_kw = fit_kw\n model.predict_kw = predict_kw\n ensemble.fit(X_train.iloc[i_train, :], y_train.iloc[i_train])\n```\n\nAnd start to make predictions with the ensemble:\n\n```\npredictions = ensemble.predict(X_test)\n```\n\n### Evaluation\n\nOnce `fit` has been called, `Clique` can be evaluated and pruned to improve performance. 
\n\nTo start, load your test data into the ensemble:\n\n```\nensemble.inputs = X_test\nensemble.targets = y_test\n```\n\nNote that once either `inputs` or `targets` is set, the ensemble will not accept assignments to the other attribute if the data length does not match (e.g., if `inputs` has 40 rows, setting `targets` with data for 39 rows will raise a `ValueError`).\n\nConsequently, the safest way to swap out testing data is through `reset_test_data`:\n\n```\nensemble.reset_test_data(inputs=X_test, targets=y_test)\n```\n\nWhich will first clear existing test data before assigning the new values (or clearing them, if no parameters are passed).\n\nThen, call `evaluate` to score your models:\n\n```\nensemble.evaluate()\n```\n\nWhich will populate the `mean_score`, `best_score`, and `best_model` properties for the ensemble:\n\n```\nensemble. # <Clique (5 model(s); limit: none)>\nensemble.mean_score # 0.31446850398821063\nensemble.best_score # 0.033214874389494775\nensemble.best_model # <ModelProfile (CatBoostRegressor)>\n```\n\nAn enable the `prune` function to return a copy of the ensemble with all models scoring above the mean removed:\n\n```\ntriad = ensemble.prune(3) # <Clique (3 model(s); limit: 3)>\ntriad.mean_score # 0.03373027543351623\n```\n\n### Deployment\n\nOnce trained and evaluated, the ensemble's models can be saved for later:\n\n```\nensemble.save('.models/')\n```\n\nWhich will save copies of each underlying model to be used elsewhere, or reloaded into another ensemble:\n\n```\nClique().load('.models/') # fresh ensemble with no duplications\nensemble.load('.models/') # will duplicate all models in the collection\ntriad.load('.models/') # will duplicate any saved models still in the collection\n```\n\nAlso note that similar to `fit`, calls to `load` will enable evaluation and pruning, assuming all loaded models were previously trained.\n",
"bugtrack_url": null,
"license": null,
"summary": "A selective ensemble for predictive models that tests new additions to prevent downgrades in performance.",
"version": "0.0.11",
"project_urls": {
"Source": "https://github.com/whitgroves/clique-ml"
},
"split_keywords": [
"selective",
" ensemble",
" tensorflow",
" maching learning",
" cuda"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "acbaa7613ac9820c5cbd3c2cdd55b1d0f4a3737e9450a84a406b0cb08bbb4bf1",
"md5": "58beac2cb8bd8b2aa0eb41ec2c838ef0",
"sha256": "db3eeee4031e916523fff2669c7de42cbf737fa97aab0bef3212a7dd2f03e5c8"
},
"downloads": -1,
"filename": "clique_ml-0.0.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "58beac2cb8bd8b2aa0eb41ec2c838ef0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 8998,
"upload_time": "2025-07-23T05:25:10",
"upload_time_iso_8601": "2025-07-23T05:25:10.834941Z",
"url": "https://files.pythonhosted.org/packages/ac/ba/a7613ac9820c5cbd3c2cdd55b1d0f4a3737e9450a84a406b0cb08bbb4bf1/clique_ml-0.0.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "455b5e88ce443e7a4357c483ba95948f888f9e3a90e3ad953bb03222ef8099cf",
"md5": "1728aa0bd3cf2587decd30dcaa2fd799",
"sha256": "33a788571a751d78ef7c79f4b6a37c47c841ca27530e78af620613a29b762639"
},
"downloads": -1,
"filename": "clique_ml-0.0.11.tar.gz",
"has_sig": false,
"md5_digest": "1728aa0bd3cf2587decd30dcaa2fd799",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 11364,
"upload_time": "2025-07-23T05:25:12",
"upload_time_iso_8601": "2025-07-23T05:25:12.209764Z",
"url": "https://files.pythonhosted.org/packages/45/5b/5e88ce443e7a4357c483ba95948f888f9e3a90e3ad953bb03222ef8099cf/clique_ml-0.0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-23 05:25:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "whitgroves",
"github_project": "clique-ml",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "clique-ml"
}