fast-select

Name: fast-select
Version: 0.1.5
Summary: A scikit-learn compatible, Numba- and CUDA-accelerated implementation of various feature selection algorithms.
Upload time: 2025-07-21 21:44:09
Requires Python: >=3.9
Keywords: bioinformatics, feature selection, relief, ReliefF, machine learning, numba, cuda, gpu, SURF, MultiSURF, TuRF, SURF*, MultiSURF*, mRMR, chi2
Requirements: numpy, numba, scikit-learn, scipy, pytest, pytest-cov, skrebate
# **Fast-Select: Accelerated Feature Selection for Modern Datasets**
[![PyPI version](https://img.shields.io/pypi/v/fast-select?color=blue)](https://pypi.org/project/fast-select/)
[![Build Status](https://img.shields.io/github/actions/workflow/status/GavinLynch04/FastSelect/python-tests.yml?branch=main)](https://github.com/GavinLynch04/FastSelect/actions)
[![Python Versions](https://img.shields.io/pypi/pyversions/fast-select.svg)](https://pypi.org/project/fast-select/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/GavinLynch04/FastSelect/blob/main/LICENSE)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![DOI](https://zenodo.org/badge/1018195486.svg)](https://doi.org/10.5281/zenodo.16285073)
<!-- start-include -->
A high-performance Python library powered by **Numba** and **CUDA**, offering accelerated algorithms for feature selection. Initially built to optimize the complete Relief family of algorithms, `fast-select` aims to expand and accelerate a wide range of feature selection methods to empower machine learning on large-scale datasets.

---

## **Key Features**

- **Fast Performance:** Leverages **Numba** for JIT compilation, **Joblib** for multi-core parallelism, and **Numba CUDA** for GPU acceleration, providing significant speedups that scale with modern hardware.
  
- **ML Pipeline Integration:** Fully compatible with **Scikit-Learn**, making it easy to fit into any machine learning pipeline with a familiar `.fit()`, `.transform()`, `.fit_transform()` interface (a short sketch follows this feature list).
  
- **Flexible Backends:** Offers dual execution modes for both CPU (`Joblib`) and GPU (`CUDA`). Automatically detects hardware with an easy-to-use `backend` parameter.
  
- **Feature-Rich Implementation:** Provides highly optimized implementations of ReliefF, SURF, SURF*, MultiSURF, MultiSURF*, and TuRF, with plans to support additional feature selection algorithms in future releases.
  
- **Lightweight & Simple:** Avoids heavy dependencies like TensorFlow or PyTorch while delivering significant speedups for feature selection workflows.
  
<!-- end-include -->
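As a quick illustration of the estimator interface described above, here is a minimal sketch that fits a selector on training data and applies the same selection to held-out data. It assumes only the `MultiSURF` constructor and methods shown in the Quickstart below.

```python
from fast_select import MultiSURF
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Small synthetic dataset for illustration
X, y = make_classification(n_samples=200, n_features=300, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit on the training split only, then apply the learned selection to both splits
selector = MultiSURF(n_features_to_select=10)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

print(X_train_sel.shape, X_test_sel.shape)  # (150, 10) (50, 10)
```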

---

## **Table of Contents**

1. [Installation](#installation)
2. [Quickstart](#quickstart)
3. [Backend Selection](#backend-selection-cpu-vs-gpu)
4. [Benchmarking Highlights](#benchmarking-highlights)
5. [Algorithm Implementations](#algorithm-implementations)
6. [Contributing](#contributing)
7. [License](#license)
8. [Citing fast-select](#citing-fast-select)
9. [Acknowledgments](#acknowledgments)

---

## **Installation**
<!-- start-installation-section -->

Install `fast-select` directly from PyPI:

```bash
pip install fast-select
```

For development versions (with testing and documentation dependencies):

```bash
git clone https://github.com/GavinLynch04/FastSelect.git
cd FastSelect
pip install -e ".[dev]"
```
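GPU execution relies on Numba's CUDA support, which in turn requires an NVIDIA driver and CUDA toolkit on the machine. As a quick sanity check (plain Numba, nothing specific to `fast-select`), you can confirm that Numba can see a CUDA device:

```python
from numba import cuda

# True if Numba can detect and initialize a CUDA-capable GPU
print(cuda.is_available())
```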

<!-- end-installation-section -->

---

## **Quickstart**
<!-- start-quickstart-section -->

Using `fast-select` will feel familiar to anyone who has worked with Scikit-Learn estimators.

```python
from fast_select import MultiSURF
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# 1. Generate a synthetic dataset
X, y = make_classification(
    n_samples=500, 
    n_features=1000, 
    n_informative=20, 
    n_redundant=100, 
    random_state=42
)

# 2. Use the MultiSURF estimator to select the top 15 features
selector = MultiSURF(n_features_to_select=15)
X_selected = selector.fit_transform(X, y)
print(f"Original feature count: {X.shape[1]}")
print(f"Selected feature count: {X_selected.shape[1]}")
print(f"Top 15 feature indices: {selector.top_features_}")

# 3. Integrate into a Scikit-Learn Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_selector', MultiSURF(n_features_to_select=10, backend='cpu')),
    ('classifier', LogisticRegression())
])

# Fit the pipeline
# pipeline.fit(X, y)
```
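Continuing the example, the assembled pipeline behaves like any other Scikit-Learn estimator, so fitting and scoring it requires no `fast-select`-specific code:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Feature selection happens inside the pipeline, using only the training split
pipeline.fit(X_train, y_train)
print(f"Held-out accuracy: {pipeline.score(X_test, y_test):.3f}")
```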
<!-- end-quickstart-section -->

---

## **Backend Selection (CPU vs. GPU)**

You can control the computational backend with the `backend` parameter during initialization:

- **`backend='auto'`**: Uses an NVIDIA GPU automatically if one is detected; otherwise falls back to the CPU.
  
- **`backend='gpu'`**: Explicitly runs on GPU. Will raise a `RuntimeError` if no compatible GPU is found.
  
- **`backend='cpu'`**: Forces CPU computations, even if a GPU is available.

Example usage:

```python
from fast_select import MultiSURF

# Force CPU usage
cpu_selector = MultiSURF(n_features_to_select=10, backend='cpu')

# Force GPU usage
gpu_selector = MultiSURF(n_features_to_select=10, backend='gpu')
```
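Because `backend='gpu'` raises a `RuntimeError` when no compatible device is found, a script that should prefer the GPU but still run anywhere can catch that error and fall back explicitly. A small sketch follows (`backend='auto'` achieves the same effect with less code):

```python
from fast_select import MultiSURF
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=1000, random_state=42)

try:
    # Construction and fitting both happen inside the try block, since the exact
    # point at which the missing-GPU RuntimeError is raised is not specified here
    selector = MultiSURF(n_features_to_select=10, backend='gpu')
    X_selected = selector.fit_transform(X, y)
except RuntimeError:
    # No compatible GPU found: fall back to the CPU backend
    selector = MultiSURF(n_features_to_select=10, backend='cpu')
    X_selected = selector.fit_transform(X, y)
```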

---

## **Benchmarking Highlights**

Fast-Select delivers substantial improvements in runtime and memory efficiency. Benchmarks show **50-100x speed-ups** compared to `scikit-rebate` and R's `CORElearn` library, particularly on large datasets exceeding 10,000 samples and features. [Benchmarking scripts](./benchmarking) are available in the repository for further testing.
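The linked scripts are the authoritative benchmarks; for a rough, self-contained comparison on your own hardware, a sketch along these lines works, using `scikit-rebate`'s `MultiSURF` as the baseline and a deliberately modest dataset so it finishes quickly:

```python
import time

from sklearn.datasets import make_classification
from fast_select import MultiSURF as FastMultiSURF
from skrebate import MultiSURF as RebateMultiSURF

X, y = make_classification(n_samples=1000, n_features=500, n_informative=20, random_state=0)

# Time a single fit for each implementation on the same data
for name, selector in [
    ("fast-select", FastMultiSURF(n_features_to_select=10)),
    ("scikit-rebate", RebateMultiSURF(n_features_to_select=10)),
]:
    start = time.perf_counter()
    selector.fit(X, y)
    print(f"{name}: {time.perf_counter() - start:.2f} s")
```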

#### Runtime vs. Number of Samples (n >> p)

<p align="center">
  <img alt="Runtime Benchmark N-Dominant" width="700" src="https://raw.githubusercontent.com/GavinLynch04/FastSelect/main/benchmarking/benchmark_n_dominant_runtime.png">
</p>

#### Runtime vs. Number of Features (p >> n)

<p align="center">
  <img alt="Runtime Benchmark P-Dominant" width="700" src="https://raw.githubusercontent.com/GavinLynch04/FastSelect/main/benchmarking/benchmark_p_dominant_runtime.png">
</p>

---

## **Algorithm Implementations**

Currently supported:

- **Relief-Family Algorithms:**
  - ReliefF
  - SURF
  - SURF*
  - MultiSURF
  - MultiSURF*
  - TuRF

Future plans include additional feature selection algorithms, such as wrappers, embedded methods, and more filter-based approaches.
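All of the estimators listed above expose the same Scikit-Learn style interface, so switching algorithms is a one-line change. A minimal sketch, assuming these classes are importable from `fast_select` like `MultiSURF` and accept `n_features_to_select` as in the Quickstart (other constructor parameters, such as neighbor counts, are left at their defaults):

```python
from fast_select import ReliefF, SURF, MultiSURF
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=200, n_informative=10, random_state=0)

# Run each selector with identical settings and compare the selected shapes
for Selector in (ReliefF, SURF, MultiSURF):
    X_sel = Selector(n_features_to_select=10).fit_transform(X, y)
    print(Selector.__name__, X_sel.shape)
```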

---

## **Contributing**

Contributions are highly encouraged, whether you're fixing bugs, improving performance, or proposing new algorithms. Please include relevant test cases and documentation updates with your submissions.

---

## **License**

This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for full details.

---

## **Citing `fast-select`**

If you use `fast-select` in your research or work, please cite it using the following DOI. This helps to track the impact of the work and ensures its continued development.

> Gavin Lynch. (2025). GavinLynch04/FastSelect: v0.1.5 (0.1.5). Zenodo. [https://doi.org/10.5281/zenodo.16285073](https://doi.org/10.5281/zenodo.16285073)

You can use the following BibTeX entry:

```bibtex
@software{gavin_lynch_2025,
  author       = {Gavin Lynch},
  title        = {{GavinLynch04/FastSelect: v0.1.5}},
  month        = jul,
  year         = 2025,
  publisher    = {Zenodo},
  version      = {0.1.5},
  doi          = {10.5281/zenodo.16285073},
  url          = {https://doi.org/10.5281/zenodo.16285073}
}
```

---

## **Acknowledgments**

This library builds on the exceptional work of the following:

- The **Numba** team for enabling JIT compilation and GPU acceleration.
- The **scikit-rebate** authors for their inspiring Relief-based library.
- The original researchers behind the Relief algorithms for their foundational contributions to feature selection.

            
