trexselector


| Field | Value |
| --- | --- |
| Name | trexselector |
| Version | 0.6.18 |
| Home page | https://github.com/ArnauVilella/TRexSelector-python |
| Summary | T-Rex Selector: High-Dimensional Variable Selection & FDR Control |
| Upload time | 2025-09-08 17:34:44 |
| Author | Python port by Arnau Vilella |
| Requires Python | >=3.8 |
| Requirements | numpy, scipy, scikit-learn, tlars, joblib, matplotlib, pandas |

# TRexSelector-Python (trexselector)

A Python port of the [TRexSelector](https://github.com/jasinmachkour/TRexSelector) R package for high-dimensional variable selection with false discovery rate (FDR) control.

## Overview

TRexSelector performs fast variable selection in high-dimensional settings while controlling the false discovery rate (FDR) at a user-defined target level. The package implements the Terminating-Random Experiments Selector (T-Rex) as described in [Machkour, Muma, and Palomar (2025)](https://doi.org/10.1016/j.sigpro.2025.109894).

This package ports the original R implementation to Python, keeping the same functionality behind a more Pythonic interface. The Python port was created by Arnau Vilella (avp@connect.ust.hk).

## Installation

### Requirements

- Python >= 3.8
- numpy
- scipy >= 1.7.0
- scikit-learn >= 1.0.0
- tlars
- joblib >= 1.0.0
- matplotlib >= 3.4.0
- pandas >= 1.3.0

On manylinux (most Linux distributions), macOS, and Windows, all dependencies, including `tlars`, are installed automatically with the package. On other systems, you may need to build `tlars` from source.

```bash
pip install trexselector==0.6.18
```

## Usage

```python
import numpy as np
from trexselector import trex, generate_gaussian_data

# Generate some example data
X, y, beta = generate_gaussian_data(n=100, p=20, seed=1234)

# Run the T-Rex selector
res = trex(X=X, y=y)

# Get the selected variables
selected_var = res["selected_var"]
print(f"Selected variables: {selected_var}")
```

## Library Reference

### Main Functions

#### trex(X, y, tFDR=0.2, K=20, max_num_dummies=10, max_T_stop=True, method="trex", ...)

The main function for high-dimensional variable selection with FDR control.

- **X**: ndarray - Predictor matrix of shape (n, p).
- **y**: ndarray - Response vector of shape (n,).
- **tFDR**: float - Target FDR level (between 0 and 1).
- **K**: int - Number of random experiments.
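- **max_num_dummies**: int - Integer factor that sets the maximum number of dummies as a multiple of the number of variables p (at most max_num_dummies * p dummies).
- **max_T_stop**: bool - If True, the maximum number of dummies that can be included before stopping is set to ceiling(n/2), where n is the number of observations.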
- **method**: str - Method to use ('trex', 'trex+GVS', 'trex+DA+AR1', 'trex+DA+equi', 'trex+DA+BT', 'trex+DA+NN').
- **Returns**: dict - Contains selected variables and additional information.
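
To make the two dummy-related settings concrete, here is a minimal sketch of the bounds they imply. It assumes that `max_num_dummies` multiplies the number of variables `p`, as documented for the original R package. `dummy_bounds` is a hypothetical helper written for illustration and is not part of `trexselector`.

```python
import math

def dummy_bounds(n, p, max_num_dummies=10, max_T_stop=True):
    """Illustrative only: bounds implied by the parameter descriptions."""
    # Assumption: the dummy budget is a multiple of the number of variables p.
    max_dummies = max_num_dummies * p
    # With max_T_stop=True, at most ceil(n / 2) dummies may be included.
    # The cap used when max_T_stop=False is not covered by this sketch.
    t_stop_cap = math.ceil(n / 2) if max_T_stop else None
    return max_dummies, t_stop_cap

bounds = dummy_bounds(n=100, p=20)
print(bounds)  # (200, 50)
```

With the defaults and the data from the usage example (n=100, p=20), this gives a budget of 200 dummies and a stopping cap of 50 included dummies.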

#### screen_trex(X, y, tFDR=0.2, K=20, max_num_dummies=10, ...)

Screening variant of T-Rex for ultra-high dimensional datasets.

- **X**, **y**, **tFDR**, **K**: Same as trex().
- **q**: int - Number of variables to select in each split.
- **num_splits**: int - Number of splits of the original problem.
- **Returns**: dict - Contains selected variables and screening information.

#### random_experiments(X, y, K=20, T_stop=1, num_dummies=None, ...)

Run K random experiments with the T-Rex selector.

- **X**, **y**, **K**: Same as trex().
- **T_stop**: int - Number of included dummies before stopping.
- **num_dummies**: int - Number of dummies to append.
- **parallel_process**: bool - If True, experiments run in parallel.
- **Returns**: dict - Contains experiment results and statistics.
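
Several functions in this reference work with "relative occurrences" and "voting thresholds". As a rough sketch of the idea, assuming the relative occurrence of a variable is the fraction of the K experiments in which it was active, and that a variable is selected when this fraction exceeds the threshold, the computation looks like the following. The `included` matrix is made-up illustrative data, not output of `random_experiments`.

```python
import numpy as np

# Hypothetical outcome of K = 4 random experiments on p = 5 variables:
# entry [k, j] is 1 if variable j was active when experiment k stopped.
included = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
])

# Relative occurrence: fraction of experiments in which each variable was active.
phi = included.mean(axis=0)

# Apply a single voting threshold v (value chosen arbitrarily for illustration).
v = 0.5
selected = np.flatnonzero(phi > v)

print(phi.tolist())       # [1.0, 0.25, 0.75, 0.25, 0.0]
print(selected.tolist())  # [0, 2]
```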

### Helper Functions

#### add_dummies(X, num_dummies)

Add random dummy variables to the predictor matrix.

- **X**: ndarray - Predictor matrix.
- **num_dummies**: int - Number of dummies to append.
- **Returns**: ndarray - Matrix with appended dummies.
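
The operation described above can be sketched with NumPy alone. This is an illustration under the assumption that dummies are drawn i.i.d. from a standard normal distribution; `append_gaussian_dummies` and its `seed` argument are hypothetical and do not reproduce the package's implementation.

```python
import numpy as np

def append_gaussian_dummies(X, num_dummies, seed=None):
    """Illustrative only: append i.i.d. standard normal columns to X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    dummies = rng.standard_normal((n, num_dummies))
    # Original predictors stay in the first p columns; dummies follow.
    return np.hstack([X, dummies])

X = np.zeros((5, 3))
X_aug = append_gaussian_dummies(X, num_dummies=4, seed=0)
print(X_aug.shape)  # (5, 7)
```

Keeping the original predictors in the leading columns makes it easy to tell real variables from dummies by column index alone.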

#### add_dummies_GVS(X, num_dummies, corr_max=0.5)

Add dummy variables with correlation constraints for group variable selection.

- **X**: ndarray - Predictor matrix.
- **num_dummies**: int - Number of dummies to append.
- **corr_max**: float - Maximum allowed correlation between any two predictors from different clusters.
- **Returns**: dict - Contains matrix with dummies and group information.

#### FDP(beta_hat, beta)

Compute the false discovery proportion.

- **beta_hat**: ndarray - Estimated coefficient vector.
- **beta**: ndarray - True coefficient vector.
- **Returns**: float - False discovery proportion.
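
For reference, the false discovery proportion is the number of selected variables that are truly null, divided by the total number of selected variables (taken as 0 when nothing is selected). The sketch below is an independent NumPy illustration of that definition; the function name and the `eps` tolerance are assumptions, not the package's code.

```python
import numpy as np

def false_discovery_proportion(beta_hat, beta, eps=1e-12):
    """Illustrative only: FDP from estimated and true coefficient vectors."""
    selected = np.abs(np.asarray(beta_hat)) > eps
    true_active = np.abs(np.asarray(beta)) > eps
    false_discoveries = int(np.sum(selected & ~true_active))
    # max(..., 1) avoids division by zero when nothing is selected.
    return false_discoveries / max(int(selected.sum()), 1)

beta = np.array([2.0, 0.0, 1.5, 0.0])
beta_hat = np.array([1.8, 0.3, 0.0, 0.0])
fdp_value = false_discovery_proportion(beta_hat, beta)
print(fdp_value)  # 0.5
```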

#### TPP(beta_hat, beta)

Compute the true positive proportion.

- **beta_hat**: ndarray - Estimated coefficient vector.
- **beta**: ndarray - True coefficient vector.
- **Returns**: float - True positive proportion.
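
The true positive proportion is the number of truly active variables that were selected, divided by the total number of truly active variables. A minimal NumPy sketch of that definition follows; the function name and the `eps` tolerance are assumptions made for illustration.

```python
import numpy as np

def true_positive_proportion(beta_hat, beta, eps=1e-12):
    """Illustrative only: TPP from estimated and true coefficient vectors."""
    selected = np.abs(np.asarray(beta_hat)) > eps
    true_active = np.abs(np.asarray(beta)) > eps
    true_positives = int(np.sum(selected & true_active))
    # max(..., 1) avoids division by zero when there are no true actives.
    return true_positives / max(int(true_active.sum()), 1)

beta = np.array([2.0, 0.0, 1.5, 0.0, 1.0])
beta_hat = np.array([1.8, 0.3, 0.9, 0.0, 0.0])
tpp_value = true_positive_proportion(beta_hat, beta)
print(round(tpp_value, 3))  # 0.667
```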

#### generate_gaussian_data(n=50, p=100, seed=789)

Generate synthetic Gaussian data for testing.

- **n**: int - Number of observations.
- **p**: int - Number of variables.
- **seed**: int - Random seed.
- **Returns**: tuple - (X, y, beta) containing predictor matrix, response, and true coefficients.
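
To show the kind of data such a generator produces, here is a sketch of a sparse Gaussian linear model, y = X beta + noise. The number of active variables, their amplitude, and the unit noise variance are assumptions chosen for illustration; they are not taken from the package, and `simulate_sparse_linear_data` is a hypothetical name.

```python
import numpy as np

def simulate_sparse_linear_data(n=50, p=100, num_actives=3, amplitude=3.0, seed=789):
    """Illustrative only: sparse linear model with standard normal design and noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    # Place the nonzero coefficients at randomly chosen, distinct positions.
    active_idx = rng.choice(p, size=num_actives, replace=False)
    beta[active_idx] = amplitude
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta

X, y, beta = simulate_sparse_linear_data()
print(X.shape, y.shape, int(np.count_nonzero(beta)))  # (50, 100) (50,) 3
```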

#### fdp_hat(V, Phi, Phi_prime)

Compute the estimated FDP for a set of voting thresholds.

- **V**: ndarray - Voting thresholds.
- **Phi**: ndarray - Vector of relative occurrences.
- **Phi_prime**: ndarray - Vector of expected relative occurrences.
- **Returns**: ndarray - Estimated FDP for each voting threshold.

#### Phi_prime_fun(p, T_stop, num_dummies, phi_T_mat, Phi)

Compute the expected relative occurrences for the T-Rex selector.

- **p**: int - Number of variables.
- **T_stop**: int - Number of included dummies before stopping.
- **num_dummies**: int - Number of dummies appended.
- **phi_T_mat**: ndarray - Matrix of relative occurrences.
- **Phi**: ndarray - Vector of relative occurrences.
- **Returns**: ndarray - Vector of expected relative occurrences.

#### select_var_fun(p, tFDR, T_stop, FDP_hat_mat, Phi_mat, V)

Select variables based on estimated FDP and voting thresholds for basic T-Rex variants.

- **p**: int - Number of variables.
- **tFDR**: float - Target FDR level.
- **T_stop**: int - Number of included dummies before stopping.
- **FDP_hat_mat**: ndarray - Matrix of estimated FDP values.
- **Phi_mat**: ndarray - Matrix of relative occurrences.
- **V**: ndarray - Voting thresholds.
- **Returns**: dict - Contains selected variables and selection information.
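
One way to read this step: among all combinations of stopping point and voting threshold whose estimated FDP does not exceed the target, pick the one that selects the most variables. The sketch below implements that rule under stated assumptions: rows of `FDP_hat_mat` and `Phi_mat` correspond to T = 1, ..., T_stop, columns of `FDP_hat_mat` correspond to the entries of `V`, and ties are broken by taking the first maximum. It is an illustration, not the package's implementation, and `pick_threshold_and_select` is a hypothetical name.

```python
import numpy as np

def pick_threshold_and_select(tFDR, FDP_hat_mat, Phi_mat, V):
    """Illustrative only: maximize selections subject to FDP_hat <= tFDR."""
    # R[t, i] = number of variables whose relative occurrence exceeds V[i] in row t.
    R = (Phi_mat[:, None, :] > V[None, :, None]).sum(axis=2)
    feasible = FDP_hat_mat <= tFDR
    if not feasible.any():
        return np.array([], dtype=int), None, None
    # Infeasible combinations get -1 so they can never win the argmax.
    R_feasible = np.where(feasible, R, -1)
    t_idx, v_idx = np.unravel_index(np.argmax(R_feasible), R_feasible.shape)
    selected = np.flatnonzero(Phi_mat[t_idx] > V[v_idx])
    return selected, int(t_idx) + 1, float(V[v_idx])

# Made-up inputs: 2 stopping points, 3 variables, 2 voting thresholds.
Phi_mat = np.array([[0.9, 0.2, 0.6],
                    [1.0, 0.5, 0.8]])
FDP_hat_mat = np.array([[0.30, 0.10],
                        [0.40, 0.15]])
V = np.array([0.5, 0.75])

selected, T_star, v_star = pick_threshold_and_select(0.2, FDP_hat_mat, Phi_mat, V)
print(selected.tolist(), T_star, v_star)  # [0, 2] 2 0.75
```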

#### select_var_fun_DA_BT(p, tFDR, T_stop, FDP_hat_array_BT, Phi_array_BT, V, rho_grid)

Select variables based on estimated FDP and voting thresholds for dependency-aware T-Rex variants.

- **p**: int - Number of variables.
- **tFDR**: float - Target FDR level.
- **T_stop**: int - Number of included dummies before stopping.
- **FDP_hat_array_BT**: ndarray - Array of estimated FDP values.
- **Phi_array_BT**: ndarray - Array of relative occurrences.
- **V**: ndarray - Voting thresholds.
- **rho_grid**: ndarray - Grid of correlation thresholds.
- **Returns**: dict - Contains selected variables and selection information.

### Advanced Features

The package supports several variants of the T-Rex selector:

- **Basic T-Rex**: Standard variable selection with FDR control
- **T-Rex+GVS**: Group variable selection using correlation structure
- **T-Rex+DA variants**: Dependency-aware variants
  - AR1: Using AR(1) correlation structure
  - Equi: Using equicorrelation structure
  - BT: Using binary tree structure
  - NN: Using nearest neighbor structure
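
The AR(1) and equicorrelation names refer to standard correlation models. As a reminder of what those structures look like, the sketch below builds both matrices with NumPy. It only illustrates the models themselves; how the dependency-aware variants use them internally is not described here, and the helper names are hypothetical.

```python
import numpy as np

def ar1_correlation(p, rho):
    """AR(1) model: correlation decays as rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def equicorrelation(p, rho):
    """Equicorrelation model: every pair of distinct variables has correlation rho."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

ar1 = ar1_correlation(3, 0.5)
equi = equicorrelation(3, 0.5)
print(ar1)
print(equi)
```

In the AR(1) case, neighboring variables are the most correlated and the dependence fades with distance, whereas under equicorrelation all pairs are equally correlated.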

## References

- Machkour, J., Muma, M., & Palomar, D. P. (2025). The Terminating-Random Experiments Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control. Signal Processing, 231, 109894.

## License

This package is licensed under the GNU General Public License v3.0 (GPL-3.0).

## Acknowledgments

The original R package [TRexSelector](https://github.com/jasinmachkour/TRexSelector) was created by Jasin Machkour, Simon Tien, Daniel P. Palomar, and Michael Muma. This Python port was developed by Arnau Vilella (avp@connect.ust.hk).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ArnauVilella/TRexSelector-python",
    "name": "trexselector",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "Python port by Arnau Vilella",
    "author_email": "avp@connect.ust.hk",
    "download_url": "https://files.pythonhosted.org/packages/81/cc/7c68a072c13a15e4452c0693e11349adcb0ae72bba78670a094b39c84e2d/trexselector-0.6.18.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "T-Rex Selector: High-Dimensional Variable Selection & FDR Control",
    "version": "0.6.18",
    "project_urls": {
        "Homepage": "https://github.com/ArnauVilella/TRexSelector-python"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9f377154434e99198be34e2f5636247808cb7f3aa1f75bcb815926ea640efff5",
                "md5": "ae822339c25008baa8bfefa4850eab24",
                "sha256": "d4ba78b654f678ad2967794ad91263a894d3a3de1cd755c3d811ba2c9627eef0"
            },
            "downloads": -1,
            "filename": "trexselector-0.6.18-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ae822339c25008baa8bfefa4850eab24",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 45540,
            "upload_time": "2025-09-08T17:34:43",
            "upload_time_iso_8601": "2025-09-08T17:34:43.628837Z",
            "url": "https://files.pythonhosted.org/packages/9f/37/7154434e99198be34e2f5636247808cb7f3aa1f75bcb815926ea640efff5/trexselector-0.6.18-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "81cc7c68a072c13a15e4452c0693e11349adcb0ae72bba78670a094b39c84e2d",
                "md5": "356021f17a9f76ca434cd092887f0b0e",
                "sha256": "6771cb23e1dc5474cb03d16106e551e9a77407896f4eaaeab6f59db11a5a231b"
            },
            "downloads": -1,
            "filename": "trexselector-0.6.18.tar.gz",
            "has_sig": false,
            "md5_digest": "356021f17a9f76ca434cd092887f0b0e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 35972,
            "upload_time": "2025-09-08T17:34:44",
            "upload_time_iso_8601": "2025-09-08T17:34:44.516625Z",
            "url": "https://files.pythonhosted.org/packages/81/cc/7c68a072c13a15e4452c0693e11349adcb0ae72bba78670a094b39c84e2d/trexselector-0.6.18.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-08 17:34:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ArnauVilella",
    "github_project": "TRexSelector-python",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "tlars",
            "specs": []
        },
        {
            "name": "joblib",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        }
    ],
    "lcname": "trexselector"
}
        