clique-ml


| Field | Value |
| --- | --- |
| Name | clique-ml |
| Version | 0.0.11 |
| home_page | None |
| Summary | A selective ensemble for predictive models that tests new additions to prevent downgrades in performance. |
| upload_time | 2025-07-23 05:25:12 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | selective, ensemble, tensorflow, maching learning, cuda |
| VCS | https://github.com/whitgroves/clique-ml |
| bugtrack_url | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# clique-ml
An ensemble for machine learning models that, when provided with test data, validates new additions to prevent downgrades in performance.
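
To make the idea of "validating new additions" concrete, here is a minimal sketch of a selective ensemble. It is not the package's implementation: the `SelectiveEnsemble` and `ConstantModel` classes, the `try_add` method, the use of mean absolute error, and the averaging of member predictions are all assumptions made for illustration.

```python
from statistics import fmean


class ConstantModel:
    """Toy model with the fit/predict shape described in this README."""

    def __init__(self, value):
        self.value = value

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [self.value for _ in X]


class SelectiveEnsemble:
    """Hypothetical sketch; not the package's Clique class."""

    def __init__(self, inputs, targets):
        self.inputs, self.targets = inputs, targets
        self.models = []

    def _error(self, models):
        # Mean absolute error of the averaged member predictions (assumed metric).
        per_model = [m.predict(self.inputs) for m in models]
        averaged = [fmean(column) for column in zip(*per_model)]
        return fmean(abs(t - p) for t, p in zip(self.targets, averaged))

    def try_add(self, model):
        """Add the model unless it makes the ensemble's test error worse."""
        candidate = self.models + [model]
        if self.models and self._error(candidate) > self._error(self.models):
            return False
        self.models = candidate
        return True


ensemble = SelectiveEnsemble(inputs=[[0], [1]], targets=[1.0, 1.0])
first = ensemble.try_add(ConstantModel(1.0))   # nothing to compare against yet
second = ensemble.try_add(ConstantModel(3.0))  # would raise the error, so rejected
third = ensemble.try_add(ConstantModel(1.0))   # error unchanged, so accepted
```

The point of the sketch is only the gate in `try_add`: a candidate is scored against held-out test data before it is admitted.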

The container class `Clique` is compatible with any model that has both `fit` and `predict` methods, although it has only been tested against regression models included in `tensorflow`, `lightgbm`, `xgboost`, and `catboost`.

This code was written against a CUDA 12.2 environment; if you run into compatibility issues, [`cuda-venv.sh`](./cuda-venv.sh) can be used to set up a virtualenv on Linux.

All code is unlicensed and freely available for anyone to use. If you run into issues, please contact me on X: [@whitgroves](https://x.com/whitgroves)

## Classes

`clique` makes several classes available to the developer:

- `IModel`: A `Protocol` that acts as an interface for any model matching `scikit-learn`'s estimator API (`fit` and `predict`).

- `ModelProfile`: A wrapper class to bundle any `IModel` with `fit` and `predict` keyword arguments, plus error scores post-evaluation.

- `EvaluationError`: An error class for exceptions specific to model evaluation.

- `EvaluationWarning`: A warning class for states that may raise exceptions during model evaluation.

- `Clique`: A container class that provides the main functionality of the package, detailed below. Also supports `IModel`.
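
As a rough illustration of how a `Protocol` can describe the `fit`/`predict` interface, the sketch below uses only the standard library. The names `SupportsFitPredict`, `MeanRegressor`, and `NotAModel` are hypothetical, and the real `IModel` definition may differ in its signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsFitPredict(Protocol):
    """Hypothetical stand-in for an IModel-style interface (not the package's code)."""

    def fit(self, X: Any, y: Any, **kwargs: Any) -> Any: ...

    def predict(self, X: Any, **kwargs: Any) -> Any: ...


class MeanRegressor:
    """Toy model: always predicts the mean of the training targets."""

    def fit(self, X, y, **kwargs):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X, **kwargs):
        return [self.mean_ for _ in X]


class NotAModel:
    """Has fit but no predict, so it does not satisfy the protocol."""

    def fit(self, X, y):
        return self


model = MeanRegressor().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
is_model = isinstance(model, SupportsFitPredict)            # both methods exist
is_not_model = isinstance(NotAModel(), SupportsFitPredict)  # predict is missing
```

Because the check is structural, a model does not need to inherit from anything; it only needs the two methods. Note that `isinstance` against a runtime-checkable protocol verifies that the methods exist, not that their signatures match.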

## Usage

### Installation

The package can be installed with `pip`:

```
pip install -U clique-ml
```

### Setup

First, prepare your data:

```
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('./training_data.csv')
... # preprocessing
y = df['target_variable']
X = df.drop(['target_variable'], axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=489)
```

Then, instantiate the models that will form the core of the ensemble:

```
import xgboost as xgb
import lightgbm as lgb
import catboost as cat
import tensorflow as tf

models = [
    xgb.XGBRegressor(...),
    lgb.LGBMRegressor(...),
    cat.CatBoostRegressor(...),
    tf.keras.Sequential(...),
]
```

And load those models into your ensemble:

```
from clique import Clique

ensemble = Clique(models=models)
```

Now you can run your training loop (in this example, for time series data):

```
from sklearn.model_selection import TimeSeriesSplit

for i, (i_train, i_valid) in enumerate(TimeSeriesSplit().split(X_train)):
    val_data = [(X_train.iloc[i_valid, :], y_train.iloc[i_valid])]
    for model in ensemble: # for ModelProfile in Clique
        fit_kw = dict()
        predict_kw = dict()
        match model.model_type:
            case 'Sequential':
                if i == 0: model.compile(optimizer='adam', loss='mae')
                keras_kw = dict(verbose=0, batch_size=256)
                fit_kw.update(keras_kw)
                predict_kw.update(keras_kw)
            case 'LGBMRegressor':
                fit_kw.update(dict(eval_set=val_data, eval_metric='l1'))
            case 'XGBRegressor' | 'CatBoostRegressor':
                fit_kw.update(dict(verbose=0, eval_set=val_data))
        model.fit_kw = fit_kw
        model.predict_kw = predict_kw
    ensemble.fit(X_train.iloc[i_train, :], y_train.iloc[i_train])
```

And start to make predictions with the ensemble:

```
predictions = ensemble.predict(X_test)
```

### Evaluation

Once `fit` has been called, `Clique` can be evaluated and pruned to improve performance. 

To start, load your test data into the ensemble:

```
ensemble.inputs = X_test
ensemble.targets = y_test
```

Note that once either `inputs` or `targets` is set, the ensemble will not accept assignments to the other attribute if the data length does not match (e.g., if `inputs` has 40 rows, setting `targets` with data for 39 rows will raise a `ValueError`).

Consequently, the safest way to swap out testing data is through `reset_test_data`:

```
ensemble.reset_test_data(inputs=X_test, targets=y_test)
```

Which will first clear existing test data before assigning the new values (or clearing them, if no parameters are passed).
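
The length check and reset behavior described above could look roughly like the following. This is a sketch under assumptions, not the package's code: the `EvalData` class and its private `_check` helper are hypothetical, and only the `inputs`, `targets`, and `reset_test_data` names come from this README.

```python
class EvalData:
    """Hypothetical holder mirroring the described inputs/targets behavior."""

    def __init__(self):
        self._inputs = None
        self._targets = None

    @staticmethod
    def _check(new, other, name):
        # Reject an assignment whose length disagrees with the other attribute.
        if new is not None and other is not None and len(new) != len(other):
            raise ValueError(f"{name} has {len(new)} rows, expected {len(other)}")

    @property
    def inputs(self):
        return self._inputs

    @inputs.setter
    def inputs(self, value):
        self._check(value, self._targets, "inputs")
        self._inputs = value

    @property
    def targets(self):
        return self._targets

    @targets.setter
    def targets(self, value):
        self._check(value, self._inputs, "targets")
        self._targets = value

    def reset_test_data(self, inputs=None, targets=None):
        # Clear both attributes first so stale data cannot trigger a mismatch.
        self._inputs = None
        self._targets = None
        self.inputs = inputs
        self.targets = targets


holder = EvalData()
holder.inputs = list(range(40))
try:
    holder.targets = list(range(39))  # 39 rows against 40 inputs
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# Swapping in data of a different length works once the old data is cleared.
holder.reset_test_data(inputs=list(range(39)), targets=list(range(39)))
```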

Then, call `evaluate` to score your models:

```
ensemble.evaluate()
```

Which will populate the `mean_score`, `best_score`, and `best_model` properties for the ensemble:

```
ensemble            # <Clique (5 model(s); limit: none)>
ensemble.mean_score # 0.31446850398821063
ensemble.best_score # 0.033214874389494775
ensemble.best_model # <ModelProfile (CatBoostRegressor)>
```
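
For intuition about what these three properties summarize, here is a small sketch that scores several models against shared targets. It assumes the score is a mean absolute error per model and that `mean_score` is the average of those per-model errors; neither is confirmed by this README. The `score_models` helper and the model names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def score_models(predictions_by_model, targets):
    """Return per-model errors plus the mean score, best score, and best model name."""
    scores = {
        name: mean_absolute_error(targets, preds)
        for name, preds in predictions_by_model.items()
    }
    mean_score = sum(scores.values()) / len(scores)
    best_model = min(scores, key=scores.get)  # lowest error wins
    return scores, mean_score, scores[best_model], best_model


targets = [1.0, 2.0, 3.0]
predictions = {
    "model_a": [1.0, 2.0, 3.0],  # exact
    "model_b": [2.0, 3.0, 4.0],  # off by 1 everywhere
    "model_c": [3.0, 4.0, 5.0],  # off by 2 everywhere
}
scores, mean_score, best_score, best_model = score_models(predictions, targets)
```

Since the scores are errors, lower is better, which is why the best score in the README's output is well below the mean.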

And enable the `prune` function, which returns a copy of the ensemble with all models scoring above the mean removed:

```
triad = ensemble.prune(3) # <Clique (3 model(s); limit: 3)>
triad.mean_score          # 0.03373027543351623
```
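
The pruning rule can be sketched on plain error scores. This is an illustration under assumptions: the standalone `prune` function below is hypothetical, the model names and scores are made up, and treating the argument as a cap on how many of the best models to keep is a guess at what the limit means.

```python
def prune(scores, limit=None):
    """Keep models whose error is at or below the mean, best first, up to `limit`."""
    mean = sum(scores.values()) / len(scores)
    kept = sorted(
        (name for name, score in scores.items() if score <= mean),
        key=scores.get,
    )
    return kept if limit is None else kept[:limit]


# Made-up error scores for five hypothetical models.
scores = {"xgb": 0.04, "lgbm": 0.05, "cat": 0.03, "keras_a": 0.9, "keras_b": 0.55}
kept = prune(scores, limit=3)
kept_two = prune(scores, limit=2)
```

Here the two weakest models sit above the mean and are dropped, so the mean error of the surviving models falls close to the best score, which matches the shape of the `triad.mean_score` output above.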

### Deployment

Once trained and evaluated, the ensemble's models can be saved for later:

```
ensemble.save('.models/')
```

Which will save copies of each underlying model to be used elsewhere, or reloaded into another ensemble:

```
Clique().load('.models/') # fresh ensemble with no duplications
ensemble.load('.models/') # will duplicate all models in the collection
triad.load('.models/')    # will duplicate any saved models still in the collection
```

Also note that, similar to `fit`, calls to `load` will enable evaluation and pruning, assuming all loaded models were previously trained.

            
