# **Fast-Select: Accelerated Feature Selection for Modern Datasets**
<!-- start-include -->
A high-performance Python library powered by **Numba** and **CUDA**, offering accelerated algorithms for feature selection. Initially built to optimize the complete Relief family of algorithms, `fast-select` aims to expand and accelerate a wide range of feature selection methods to empower machine learning on large-scale datasets.
---
## **Key Features**
- **Fast Performance:** Leverages **Numba** for JIT compilation, **Joblib** for multi-core parallelism, and **Numba CUDA** for GPU acceleration, delivering performance that scales with modern hardware.
- **ML Pipeline Integration:** Fully compatible with **Scikit-Learn**, making it easy to fit into any machine learning pipeline with a familiar `.fit()`, `.transform()`, `.fit_transform()` interface.
- **Flexible Backends:** Offers dual execution modes for both CPU (`Joblib`) and GPU (`CUDA`). Automatically detects hardware with an easy-to-use `backend` parameter.
- **Feature-Rich Implementation:** Provides highly optimized implementations of ReliefF, SURF, SURF*, MultiSURF, MultiSURF*, and TuRF, with plans to support additional feature selection algorithms in future releases.
- **Lightweight & Simple:** Avoids heavy dependencies like TensorFlow or PyTorch while delivering significant speedups for feature selection workflows.
<!-- end-include -->
---
## **Table of Contents**
1. [Installation](#installation)
2. [Quickstart](#quickstart)
3. [Backend Selection](#backend-selection-cpu-vs-gpu)
4. [Benchmarking Highlights](#benchmarking-highlights)
5. [Algorithm Implementations](#algorithm-implementations)
6. [Contributing](#contributing)
7. [License](#license)
8. [Citing `fast-select`](#citing-fast-select)
9. [Acknowledgments](#acknowledgments)
---
## **Installation**
<!-- start-installation-section -->
Install `fast-select` directly from PyPI:
```bash
pip install fast-select
```
For development versions (with testing and documentation dependencies):
```bash
git clone https://github.com/GavinLynch04/FastSelect.git
cd FastSelect
pip install -e ".[dev]"
```
<!-- end-installation-section -->
---
## **Quickstart**
<!-- start-quickstart-section -->
Using `fast-select` will feel familiar to anyone who has worked with Scikit-Learn.
```python
from fast_select import MultiSURF
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# 1. Generate a synthetic dataset
X, y = make_classification(
    n_samples=500,
    n_features=1000,
    n_informative=20,
    n_redundant=100,
    random_state=42
)
# 2. Use the MultiSURF estimator to select the top 15 features
selector = MultiSURF(n_features_to_select=15)
X_selected = selector.fit_transform(X, y)
print(f"Original feature count: {X.shape[1]}")
print(f"Selected feature count: {X_selected.shape[1]}")
print(f"Top 15 feature indices: {selector.top_features_}")
# 3. Integrate into a Scikit-Learn Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_selector', MultiSURF(n_features_to_select=10, backend='cpu')),
    ('classifier', LogisticRegression())
])
# Fit the pipeline
# pipeline.fit(X, y)
```
<!-- end-quickstart-section -->
---
## **Backend Selection (CPU vs. GPU)**
You can control the computational backend with the `backend` parameter during initialization:
- **`backend='auto'`**: Automatically detects if an NVIDIA GPU is available. Falls back to CPU if a GPU is not available.
- **`backend='gpu'`**: Explicitly runs on GPU. Will raise a `RuntimeError` if no compatible GPU is found.
- **`backend='cpu'`**: Forces CPU computations, even if a GPU is available.
Example usage:
```python
# Force CPU usage
cpu_selector = MultiSURF(n_features_to_select=10, backend='cpu')
# Force GPU usage
gpu_selector = MultiSURF(n_features_to_select=10, backend='gpu')
```
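If the same script must run on machines with and without a GPU, one pattern is to try the GPU backend first and fall back to the CPU on `RuntimeError`. The `make_selector` helper below is a hypothetical sketch, not part of the library; it is written generically over a selector class and assumes the `RuntimeError` surfaces when the estimator is constructed:

```python
def make_selector(selector_cls, n_features, backend_order=("gpu", "cpu")):
    """Try each backend in order; fall back when one is unavailable.

    `selector_cls` is any estimator whose constructor accepts
    `n_features_to_select` and `backend`, and raises RuntimeError
    for an unusable backend (as fast-select's `backend='gpu'` does
    when no compatible GPU is found).
    """
    last_err = None
    for backend in backend_order:
        try:
            return selector_cls(n_features_to_select=n_features, backend=backend)
        except RuntimeError as err:
            last_err = err
    raise last_err

# With fast-select installed:
# from fast_select import MultiSURF
# selector = make_selector(MultiSURF, 10)
```

Because the helper takes the class as an argument, the same fallback logic works for any of the estimators listed below.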
---
## **Benchmarking Highlights**
Fast-Select delivers substantial improvements in runtime and memory efficiency. In our benchmarks it ran **50-100x faster** than `scikit-rebate` and R's `CORElearn` library, particularly on large datasets exceeding 10,000 samples and features. [Benchmarking scripts](./benchmarking) are available in the repository for further testing.
#### Runtime vs. Number of Samples (n >> p)
<p align="center">
<img alt="Runtime Benchmark N-Dominant" width="700" src="https://raw.githubusercontent.com/GavinLynch04/FastSelect/main/benchmarking/benchmark_n_dominant_runtime.png">
</p>
#### Runtime vs. Number of Features (p >> n)
<p align="center">
<img alt="Memory Benchmark P-Dominant" width="700" src="https://raw.githubusercontent.com/GavinLynch04/FastSelect/main/benchmarking/benchmark_p_dominant_runtime.png">
</p>
---
## **Algorithm Implementations**
Currently supported:
- **Relief-Family Algorithms:**
  - ReliefF
  - SURF
  - SURF*
  - MultiSURF
  - MultiSURF*
  - TuRF
Future plans include additional feature selection algorithms, such as wrappers, embedded methods, and more filter-based approaches.
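As background for what the Relief family computes, here is a deliberately naive, pure-Python sketch of the core Relief scoring idea: a feature is rewarded when it differs between a sample and its nearest *miss* (nearest neighbor of a different class) and penalized when it differs from its nearest *hit* (nearest neighbor of the same class). This is illustrative only and bears no relation to the library's optimized Numba/CUDA implementations:

```python
import math

def relief_scores(X, y):
    """Naive Relief-style feature weights (for intuition, not speed)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p

    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        nh = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest hit
        nm = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest miss
        for f in range(p):
            # Reward separation from the miss, penalize separation from the hit.
            w[f] += (abs(X[i][f] - X[nm][f]) - abs(X[i][f] - X[nh][f])) / n
    return w

# Feature 0 separates the two classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.4], [0.9, 0.8]]
y = [0, 0, 1, 1]
scores = relief_scores(X, y)  # scores[0] comes out positive, scores[1] negative
```

The SURF/MultiSURF variants differ mainly in how neighbors are chosen (distance thresholds instead of a fixed nearest neighbor), which is where the library's vectorized and GPU kernels pay off.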
---
## **Contributing**
Contributions are highly encouraged. Whether you're fixing bugs, improving performance, or proposing new algorithms, your work is invaluable. Please ensure your submissions include relevant test cases and documentation updates.
---
## **License**
This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for full details.
---
## **Citing `fast-select`**
If you use `fast-select` in your research or work, please cite it using the following DOI. This helps to track the impact of the work and ensures its continued development.
> Gavin Lynch. (2025). GavinLynch04/FastSelect: v0.1.5 (0.1.5). Zenodo. [https://doi.org/10.5281/zenodo.16285073](https://doi.org/10.5281/zenodo.16285073)
You can use the following BibTeX entry:
```bibtex
@software{gavin_lynch_2025,
  author    = {Gavin Lynch},
  title     = {{GavinLynch04/FastSelect: v0.1.5}},
  month     = jul,
  year      = 2025,
  publisher = {Zenodo},
  version   = {0.1.5},
  doi       = {10.5281/zenodo.16285073},
  url       = {https://doi.org/10.5281/zenodo.16285073}
}
```
---
## **Acknowledgments**
This library builds on the exceptional work of the following:
- The **Numba** team for enabling JIT compilation and GPU acceleration.
- The **scikit-rebate** authors for their inspiring Relief-based library.
- The original researchers behind the Relief algorithms for their foundational contributions to feature selection.