combss


- Name: combss
- Version: 1.0.2
- Summary: A package implementation of COMBSS, a novel continuous optimisation method toward best subset selection
- Upload time: 2024-11-18 11:41:19
- Author: Sarat Moka, Hua Yang Hu
- Keywords: mathematics, optimization, subset selection
# COMBSS
This is the package for COMBSS, a novel continuous optimisation method toward best subset selection, developed from the paper Moka et al. (2024).

For a more detailed overview of COMBSS, refer to https://link.springer.com/article/10.1007/s11222-024-10387-8.

## Dependencies

This package relies on the following libraries:

- `numpy` (version 1.21.0 or later): Numerical computing.
- `scipy` (version 1.7.0 or later): Sparse matrix operations and linear algebra.
- `scikit-learn` (version 1.0.0 or later): Machine learning and evaluation metrics.

These will be installed automatically if you install the package via `pip`. Alternatively, they can also be installed manually.
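
To see which versions of these dependencies are present in the current environment, a small sketch using only the standard library's `importlib.metadata` is shown below. The helper name `installed_version` is illustrative and is not part of COMBSS.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Minimum versions as listed in the dependency list above.
minimums = {"numpy": "1.21.0", "scipy": "1.7.0", "scikit-learn": "1.0.0"}

for name, minimum in minimums.items():
    found = installed_version(name)
    print(f"{name}: installed={found or 'not installed'}, required>={minimum}")
```

The sketch only reports versions; it does not compare them, since a robust comparison would need a version-parsing library.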

# COMBSS Installation and Usage Guide

## Installation

Users can install **COMBSS** using the `pip` command-line tool:

```bash
pip install combss
```

## Usage Guide
For demonstration purposes, we apply COMBSS to a dataset created beforehand, with `X_train`, `y_train`, `X_test` and `y_test` generated from an 80-20 train-test split prior to this example.
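
The dataset itself is not shown in this guide. As a stand-in, the following sketch generates a synthetic linear-regression dataset with NumPy and performs an 80-20 split. The sample size, noise level, seed and true coefficients are assumptions chosen for illustration; only the number of predictors (20) is taken from the length of `coef_` in the illustrative examples further down.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption) for reproducibility
n, p = 500, 20                   # n is an assumption; p matches len(coef_) in the examples

# Hypothetical sparse true model: only the first 10 predictors are active.
beta_true = np.zeros(p)
beta_true[:10] = rng.uniform(0.5, 2.5, size=10)

X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

# 80-20 train-test split on shuffled row indices.
perm = rng.permutation(n)
n_train = int(0.8 * n)
train_idx, test_idx = perm[:n_train], perm[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```

Data generated this way will not reproduce the numbers reported in the examples, which come from the authors' own dataset.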

### Importing COMBSS

To import **COMBSS** after installation, use the following command:

```python
import combss
```

COMBSS is implemented as a class named `model` within the `linear` module. Create an instance of the `model` class to use its methods:

```python
# Create an instance of the model class from the linear module
optimiser = combss.linear.model()
```

### Fitting the Model

To use COMBSS for best subset selection, call the `fit` method of the `model` instance. Here are some commonly used arguments:

- **q**: Maximum subset size. Defaults to min(number of observations, number of predictors).
- **nlam**: Number of λ values in the dynamic grid. Default is 50.
- **scaling**: Boolean to enable feature scaling. Default is `False`.

Example usage 1:

```python
# A sample usage of the commonly used arguments
optimiser.fit(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, q=8, nlam=20, scaling=True)
```
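
The stated default for `q` depends only on the shape of the training matrix. A minimal sketch of that rule follows, using a placeholder array with assumed dimensions rather than real data:

```python
import numpy as np

# Placeholder with 400 observations and 20 predictors (assumed sizes).
X_placeholder = np.zeros((400, 20))

n_obs, n_pred = X_placeholder.shape
q_default = min(n_obs, n_pred)   # documented default: min(observations, predictors)
print(q_default)
```

Passing a smaller value, such as `q=8` in the call above, restricts the search to subsets of at most that size.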

### Additional Fitting Arguments

Other arguments include:

- **t_init**: Initial point for the vector t.
- **tau**: Threshold parameter for subset mapping.
- **delta_frac**: Value of δ/n in the objective function.
- **eta**: Truncation parameter during gradient descent.
- **patience**: Number of iterations before termination.
- **gd_maxiter**: Maximum iterations for gradient descent.
- **gd_tol**: Tolerance for gradient descent.
- **cg_maxiter**: Maximum iterations for the conjugate gradient algorithm.
- **cg_tol**: Tolerance for the conjugate gradient algorithm.

Example usage 2 (modified):

```python
# A modified usage of the fit method
optimiser.fit(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, q=10, nlam=50, scaling=True, tau=0.9, delta_frac=20)
```
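
For intuition about `tau`: a threshold parameter of this kind typically converts a continuous vector `t` with entries in [0, 1] into a discrete subset by keeping the indices whose entries exceed the threshold. The sketch below illustrates that idea with NumPy. It is an assumption about the general mechanism, not the package's internal code, and both the vector `t` and the helper `subset_from_t` are made up for illustration.

```python
import numpy as np


def subset_from_t(t, tau):
    """Return the indices j with t[j] > tau (illustrative thresholding rule)."""
    return np.flatnonzero(np.asarray(t) > tau)


t = np.array([0.98, 0.95, 0.70, 0.10, 0.55, 0.02])   # hypothetical values

print(subset_from_t(t, tau=0.5))   # looser threshold keeps more indices
print(subset_from_t(t, tau=0.9))   # stricter threshold keeps fewer indices
```

Under this rule, a larger `tau` means only entries of `t` very close to 1 are mapped into the subset.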

### Model Attributes

After fitting, the following attributes can be accessed:

- **subset**: Indices of the optimal subset.
- **mse**: Mean squared error on test data.
- **coef_**: Coefficients of the linear model.
- **lambda_**: Optimal λ value.
- **run_time**: Time taken for fitting.
- **lambda_list**: List of λ values explored.
- **subset_list**: Subsets obtained for each λ.

Example:

```python
optimiser.subset
# Output: array([0, 1, 2, 3, 4, 6, 7, 8])

optimiser.mse
# Output: 19.94
```
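
As a sketch of how a test mean squared error of this kind can be computed from a coefficient vector, the function below evaluates the predictions `X_test @ coef` against `y_test`. It assumes a model without an intercept and coefficients on the original feature scale; whether `mse` in COMBSS is computed in exactly this way is not stated in this guide. The helper name `holdout_mse` and the toy arrays are invented for illustration.

```python
import numpy as np


def holdout_mse(X_test, y_test, coef):
    """Mean squared error of the linear predictions X_test @ coef."""
    residuals = y_test - X_test @ coef
    return float(np.mean(residuals ** 2))


# Toy data: predictions are [2, 3, 5], so only the last one is off by 1.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_toy = np.array([2.0, 3.0, 4.0])
coef_toy = np.array([2.0, 3.0])

print(holdout_mse(X_toy, y_toy, coef_toy))
```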

## Illustrative Examples

### Example 1

```python
# A sample usage of the commonly used arguments
optimiser.fit(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, q=8, nlam=20, scaling=True)

optimiser.subset
# array([0, 1, 2, 3, 4, 6, 7, 8])

optimiser.mse
# 19.940929997277212

optimiser.coef_
# array([ 0.85215,  1.50009,  0.39557,  2.3919,  -0.56994,
#         0.     ,  2.6758 ,  0.72726,  1.70696,  0.        ,
#         0.     ,  0.     ,  0.     ,  0.     ,  0.        ,
#         0.     ,  0.     ,  0.     ,  0.     ,  0.        ])

optimiser.lambda_
# 0.6401161339265333

optimiser.run_time
# 2.591932

optimiser.lambda_list
# [65.54789211407702,
# 32.77394605703851,
# 16.386973028519254,
# .
# .
# 0.5120929071412267,
# 0.6401161339265333]

optimiser.subset_list
# [array([], dtype=int64),
# array([], dtype=int64),
# array([], dtype=int64),
# .
# .
# array([0, 1, 2, 3, 4, 6, 7, 8]),
# array([0, 1, 2, 3, 4, 6, 7, 8])]
```
One can observe that a model of size 8 (the maximum allowed by q = 8) was recovered from the training data after approximately 2.59 seconds. The recovered model, whose predictor indices are given in the `optimiser.subset` array, achieved a mean squared error of approximately 19.94 on the test data. Up to nlam = 20 values of λ were explored in the dynamic grid search, starting from a null model at the initial value of λ ≈ 65.548.
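
The link between `coef_` and `subset` in Example 1 can be checked directly: the selected indices are the positions of the nonzero coefficients. The snippet below uses the `coef_` values printed above, copied by hand; the variable names are illustrative.

```python
# coef_ values from Example 1, copied from the output above.
coef_example_1 = [
    0.85215, 1.50009, 0.39557, 2.3919, -0.56994,
    0.0, 2.6758, 0.72726, 1.70696, 0.0,
    0.0, 0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 0.0, 0.0, 0.0,
]

# Positions of the nonzero coefficients.
selected = [j for j, c in enumerate(coef_example_1) if c != 0.0]

print(selected)        # [0, 1, 2, 3, 4, 6, 7, 8]
print(len(selected))   # 8
```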

The output below is obtained after fitting with the modified arguments of usage example 2. In this setting, q is set to 10 and up to 50 values of λ are explored with feature scaling, a more stringent threshold of 𝜏 = 0.9, and δ/n in the objective function set to 20. All other arguments take their default values.

### Example 2

```python
# A sample usage of additional arguments
optimiser.fit(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, q=10, nlam=50, scaling=True, tau=0.9, delta_frac=20)

optimiser.subset
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

optimiser.mse
# 19.50638319557191

optimiser.coef_
# array([ 0.76678,  1.51074,  0.49312,  2.45588,  -0.69150,
#         0.13782,  2.43072,  0.89641,  0.88130,  1.13421 ,  
#         0.     ,  0.     ,  0.     ,  0.     ,  0.        ,
#         0.     ,  0.     ,  0.     ,  0.     ,  0.        ])

optimiser.lambda_
# 0.022003992103724584

optimiser.run_time
# 5.400080000000001

optimiser.lambda_list
# [65.54789211407702,
# 32.77394605703851,
# 16.386973028519254,
# .
# .
# 0.020003629185204166,
# 0.016002903348163334]

optimiser.subset_list
# [array([], dtype=int64),
# array([], dtype=int64),
# array([], dtype=int64),
# .
# .
# array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
# array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])]
```

One can observe that the changes to `tau`, `delta_frac` and `nlam` result in different values of λ being explored and in a different path through the subsets, since increasing the threshold parameter 𝜏 alters the subset mapping and changing `delta_frac` alters the landscape of the objective function. Consequently, an additional predictor from the true model is recovered at the expense of a larger computational cost (about 5.40 seconds versus 2.59 seconds).
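
To see exactly which indices differ between the two fits, the reported `subset` arrays can be compared with plain set operations. The values below are copied by hand from the outputs of Example 1 and Example 2; the variable names are illustrative.

```python
subset_example_1 = [0, 1, 2, 3, 4, 6, 7, 8]
subset_example_2 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Indices selected only in Example 2, and indices selected only in Example 1.
added = sorted(set(subset_example_2) - set(subset_example_1))
dropped = sorted(set(subset_example_1) - set(subset_example_2))

print(added)     # [5, 9]
print(dropped)   # []
```

This comparison shows only which indices changed; it says nothing about which of them belong to the true model.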



            
