# TRexSelector-Python (trexselector)
A Python port of the [TRexSelector](https://github.com/jasinmachkour/TRexSelector) R package for high-dimensional variable selection with false discovery rate (FDR) control.
## Overview
TRexSelector performs fast variable selection in high-dimensional settings while controlling the false discovery rate (FDR) at a user-defined target level. The package implements the Terminating-Random Experiments Selector (T-Rex) as described in [Machkour, Muma, and Palomar (2025)](https://doi.org/10.1016/j.sigpro.2025.109894).
This package ports the original R implementation to Python, keeping the same functionality behind a more Pythonic interface. The Python port was created by Arnau Vilella (avp@connect.ust.hk).
## Installation
### Requirements
- Python >= 3.8
- numpy
- scipy >= 1.7.0
- scikit-learn >= 1.0.0
- tlars
- joblib >= 1.0.0
- matplotlib >= 3.4.0
- pandas >= 1.3.0
For manylinux (most Linux distributions), macOS, and Windows platforms, all dependencies including `tlars` will be installed automatically when installing the package. For other systems, you might need to build the `tlars` package from source.
```bash
pip install trexselector==0.6.18
```
## Usage
```python
import numpy as np
from trexselector import trex, generate_gaussian_data
# Generate some example data
X, y, beta = generate_gaussian_data(n=100, p=20, seed=1234)
# Run the T-Rex selector
res = trex(X=X, y=y)
# Get the selected variables
selected_var = res["selected_var"]
print(f"Selected variables: {selected_var}")
```
## Library Reference
### Main Functions
#### trex(X, y, tFDR=0.2, K=20, max_num_dummies=10, max_T_stop=True, method="trex", ...)
The main function for high-dimensional variable selection with FDR control.
- **X**: ndarray - Predictor matrix of shape (n, p).
- **y**: ndarray - Response vector of shape (n,).
- **tFDR**: float - Target FDR level (between 0 and 1).
- **K**: int - Number of random experiments.
- **max_num_dummies**: int - Factor determining maximum number of dummies.
- **max_T_stop**: bool - If True, maximum number of included dummies is set to ceiling(n/2).
- **method**: str - Method to use ('trex', 'trex+GVS', 'trex+DA+AR1', 'trex+DA+equi', 'trex+DA+BT', 'trex+DA+NN').
- **Returns**: dict - Contains selected variables and additional information.
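The two dummy-related limits can be worked out by hand. The sketch below assumes, as a reading of the parameter descriptions rather than something this README states, that `max_num_dummies` multiplies the number of predictors `p`; the helper name `dummy_limits` is hypothetical and not part of the package.

```python
import math

def dummy_limits(n, p, max_num_dummies=10, max_T_stop=True):
    """Hypothetical helper: upper limits implied by the trex() parameters."""
    # Assumption: the factor scales with the number of predictors p.
    max_dummies = max_num_dummies * p
    # Documented rule: with max_T_stop=True the cap is ceiling(n / 2).
    # None here only means "this rule imposes no cap".
    t_stop_cap = math.ceil(n / 2) if max_T_stop else None
    return max_dummies, t_stop_cap

print(dummy_limits(n=100, p=20))  # (200, 50)
```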
#### screen_trex(X, y, tFDR=0.2, K=20, max_num_dummies=10, ...)
Screening variant of T-Rex for ultra-high dimensional datasets.
- **X**, **y**, **tFDR**, **K**: Same as trex().
- **q**: int - Number of variables to select in each split.
- **num_splits**: int - Number of splits of the original problem.
- **Returns**: dict - Contains selected variables and screening information.
#### random_experiments(X, y, K=20, T_stop=1, num_dummies=None, ...)
Run K random experiments with the T-Rex selector.
- **X**, **y**, **K**: Same as trex().
- **T_stop**: int - Number of included dummies before stopping.
- **num_dummies**: int - Number of dummies to append.
- **parallel_process**: bool - If True, experiments run in parallel.
- **Returns**: dict - Contains experiment results and statistics.
### Helper Functions
#### add_dummies(X, num_dummies)
Add random dummy variables to the predictor matrix.
- **X**: ndarray - Predictor matrix.
- **num_dummies**: int - Number of dummies to append.
- **Returns**: ndarray - Matrix with appended dummies.
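As a rough illustration of what appending dummies means, the sketch below adds columns of independent standard normal draws to a small matrix. It uses plain Python lists to stay dependency-free; the function name `append_dummies`, the `seed` argument, and the choice of a standard normal distribution are assumptions for illustration, not the package's implementation.

```python
import random

def append_dummies(X, num_dummies, seed=None):
    """Hypothetical sketch: append num_dummies random columns to X."""
    rng = random.Random(seed)
    # Assumption: each dummy entry is an independent N(0, 1) draw.
    return [
        list(row) + [rng.gauss(0.0, 1.0) for _ in range(num_dummies)]
        for row in X
    ]

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X_dummy = append_dummies(X, num_dummies=3, seed=0)
print(len(X_dummy), len(X_dummy[0]))  # 3 5
```

The original predictors stay in the leading columns, so the dummies can be told apart by column index alone.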
#### add_dummies_GVS(X, num_dummies, corr_max=0.5)
Add dummy variables with correlation constraints for group variable selection.
- **X**: ndarray - Predictor matrix.
- **num_dummies**: int - Number of dummies to append.
- **corr_max**: float - Maximum allowed correlation between predictors.
- **Returns**: dict - Contains matrix with dummies and group information.
#### FDP(beta_hat, beta)
Compute the false discovery proportion.
- **beta_hat**: ndarray - Estimated coefficient vector.
- **beta**: ndarray - True coefficient vector.
- **Returns**: float - False discovery proportion.
#### TPP(beta_hat, beta)
Compute the true positive proportion.
- **beta_hat**: ndarray - Estimated coefficient vector.
- **beta**: ndarray - True coefficient vector.
- **Returns**: float - True positive proportion.
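Both metrics follow from their standard definitions: FDP is the share of selected variables that are truly inactive, and TPP is the share of truly active variables that were selected. A minimal sketch follows, with hypothetical names `fdp_sketch` and `tpp_sketch` and the common convention (assumed here) that an empty denominator is replaced by 1.

```python
def fdp_sketch(beta_hat, beta):
    """False discoveries divided by the number of selected variables."""
    selected = [j for j, b in enumerate(beta_hat) if b != 0]
    false_pos = sum(1 for j in selected if beta[j] == 0)
    return false_pos / max(1, len(selected))

def tpp_sketch(beta_hat, beta):
    """True discoveries divided by the number of truly active variables."""
    true_active = [j for j, b in enumerate(beta) if b != 0]
    true_pos = sum(1 for j in true_active if beta_hat[j] != 0)
    return true_pos / max(1, len(true_active))

beta = [1.5, 0.0, -2.0, 0.0, 0.0]      # variables 0 and 2 are truly active
beta_hat = [1.0, 1.0, 0.0, 0.0, 0.0]   # variables 0 and 1 were selected
print(fdp_sketch(beta_hat, beta))  # 0.5
print(tpp_sketch(beta_hat, beta))  # 0.5
```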
#### generate_gaussian_data(n=50, p=100, seed=789)
Generate synthetic Gaussian data for testing.
- **n**: int - Number of observations.
- **p**: int - Number of variables.
- **seed**: int - Random seed.
- **Returns**: tuple - (X, y, beta) containing predictor matrix, response, and true coefficients.
#### fdp_hat(V, Phi, Phi_prime)
Compute the estimated FDP for a set of voting thresholds.
- **V**: ndarray - Voting thresholds.
- **Phi**: ndarray - Vector of relative occurrences.
- **Phi_prime**: ndarray - Vector of expected relative occurrences.
- **Returns**: ndarray - Estimated FDP for each voting threshold.
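The terms "relative occurrences" and "voting thresholds" recur in the helpers here. As a hedged illustration of the idea only, the relative occurrence of a variable can be read as the fraction of the K random experiments in which it was selected, and a voting threshold `v` keeps the variables whose fraction exceeds `v`. The function name and the input layout (one set of selected indices per experiment) are assumptions for this sketch.

```python
def relative_occurrences(selections, p):
    """Fraction of experiments in which each of the p variables was selected."""
    K = len(selections)
    counts = [0] * p
    for sel in selections:
        for j in set(sel):
            counts[j] += 1
    return [c / K for c in counts]

# K = 4 hypothetical experiments over p = 4 variables
selections = [{0, 2}, {0, 1, 2}, {0}, {0, 2}]
phi = relative_occurrences(selections, p=4)
print(phi)  # [1.0, 0.25, 0.75, 0.0]

v = 0.5  # voting threshold
print([j for j, f in enumerate(phi) if f > v])  # [0, 2]
```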
#### Phi_prime_fun(p, T_stop, num_dummies, phi_T_mat, Phi)
Compute the expected relative occurrences for the T-Rex selector.
- **p**: int - Number of variables.
- **T_stop**: int - Number of included dummies before stopping.
- **num_dummies**: int - Number of dummies appended.
- **phi_T_mat**: ndarray - Matrix of relative occurrences.
- **Phi**: ndarray - Vector of relative occurrences.
- **Returns**: ndarray - Vector of expected relative occurrences.
#### select_var_fun(p, tFDR, T_stop, FDP_hat_mat, Phi_mat, V)
Select variables based on estimated FDP and voting thresholds for basic T-Rex variants.
- **p**: int - Number of variables.
- **tFDR**: float - Target FDR level.
- **T_stop**: int - Number of included dummies before stopping.
- **FDP_hat_mat**: ndarray - Matrix of estimated FDP values.
- **Phi_mat**: ndarray - Matrix of relative occurrences.
- **V**: ndarray - Voting thresholds.
- **Returns**: dict - Contains selected variables and selection information.
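One way to picture this selection step: among all combinations of stopping point and voting threshold whose estimated FDP does not exceed `tFDR`, keep the one that selects the most variables. The sketch below is a simplification under stated assumptions, not the package's code: `FDP_hat_mat` is taken to have one row per stopping point and one column per threshold, `Phi_mat` one row per stopping point and one column per variable, and ties go to the first combination found. The name `select_threshold` is hypothetical.

```python
def select_threshold(tFDR, FDP_hat_mat, Phi_mat, V):
    """Pick the feasible (stopping point, threshold) pair selecting the most variables."""
    best = None  # (num_selected, row index, threshold index)
    for t, fdp_row in enumerate(FDP_hat_mat):
        for i, v in enumerate(V):
            if fdp_row[i] > tFDR:
                continue  # estimated FDP too high: not feasible
            num_selected = sum(1 for phi in Phi_mat[t] if phi > v)
            if best is None or num_selected > best[0]:
                best = (num_selected, t, i)
    if best is None:
        return []  # no feasible pair: select nothing
    _, t, i = best
    return [j for j, phi in enumerate(Phi_mat[t]) if phi > V[i]]

V = [0.5, 0.75]
Phi_mat = [[1.0, 0.6, 0.2], [1.0, 0.9, 0.8]]
FDP_hat_mat = [[0.1, 0.0], [0.4, 0.15]]
print(select_threshold(0.2, FDP_hat_mat, Phi_mat, V))  # [0, 1, 2]
```

In this toy input the second stopping point with threshold 0.75 is feasible and selects all three variables, so it wins over the first row, which selects at most two.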
#### select_var_fun_DA_BT(p, tFDR, T_stop, FDP_hat_array_BT, Phi_array_BT, V, rho_grid)
Select variables based on estimated FDP and voting thresholds for dependency-aware T-Rex variants.
- **p**: int - Number of variables.
- **tFDR**: float - Target FDR level.
- **T_stop**: int - Number of included dummies before stopping.
- **FDP_hat_array_BT**: ndarray - Array of estimated FDP values.
- **Phi_array_BT**: ndarray - Array of relative occurrences.
- **V**: ndarray - Voting thresholds.
- **rho_grid**: ndarray - Grid of correlation thresholds.
- **Returns**: dict - Contains selected variables and selection information.
### Advanced Features
The package supports several variants of the T-Rex selector:
- **Basic T-Rex**: Standard variable selection with FDR control
- **T-Rex+GVS**: Group variable selection using correlation structure
- **T-Rex+DA variants**: Dependency-aware variants
  - AR1: Using AR(1) correlation structure
  - Equi: Using equicorrelation structure
  - BT: Using binary tree structure
  - NN: Using nearest neighbor structure
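For readers unfamiliar with the first two structures named above, the sketch below builds small AR(1) and equicorrelation matrices from their textbook definitions. It only illustrates what those correlation patterns look like; how the package uses them internally is not described here, and the function names are hypothetical.

```python
def ar1_corr(p, rho):
    """AR(1) structure: correlation decays as rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

def equi_corr(p, rho):
    """Equicorrelation: every distinct pair shares the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]

for row in ar1_corr(3, 0.5):
    print(row)
# [1.0, 0.5, 0.25]
# [0.5, 1.0, 0.5]
# [0.25, 0.5, 1.0]

for row in equi_corr(3, 0.5):
    print(row)
# [1.0, 0.5, 0.5]
# [0.5, 1.0, 0.5]
# [0.5, 0.5, 1.0]
```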
## References
- Machkour, J., Muma, M., & Palomar, D. P. (2025). The Terminating-Random Experiments Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control. Signal Processing, 231, 109894.
## License
This package is licensed under the GNU General Public License v3.0 (GPL-3.0).
## Acknowledgments
The original R package [TRexSelector](https://github.com/jasinmachkour/TRexSelector) was created by Jasin Machkour, Simon Tien, Daniel P. Palomar, and Michael Muma. This Python port was developed by Arnau Vilella (avp@connect.ust.hk).