felimination

- Name: felimination
- Version: 0.3.0
- Summary: This library contains some useful scikit-learn compatible classes for feature selection.
- Author: Claudio Salvatore Arcidiacono
- Requires Python: >=3.7
- Keywords: feature selection, scikit-learn, machine learning
- Upload time: 2024-07-31 08:05:25
- Homepage: https://github.com/ClaudioSalvatoreArcidiacono/felimination
- Documentation: https://claudiosalvatorearcidiacono.github.io/felimination/

# Felimination

This library contains some useful scikit-learn compatible classes for feature selection.

## Features

- [Recursive Feature Elimination with Cross Validation using Permutation Importance](reference/RFE.md#felimination.rfe.PermutationImportanceRFECV)
- [Hybrid Genetic Algorithms x Feature Importance selection](/reference/genetic_algorithms/#felimination.ga.HybridImportanceGACVFeatureSelector)

## Requirements

- Python 3.7+
- NumPy
- Scikit-learn
- Pandas

## Installation

In a terminal shell, run the following command:
```
pip install felimination
```

## Usage

### Recursive Feature Elimination
This section illustrates how to use the `PermutationImportanceRFECV` class.

```python
from felimination.rfe import PermutationImportanceRFECV
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import numpy as np


X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=6,
    n_redundant=10,
    n_clusters_per_class=1,
    random_state=42,
)

selector = PermutationImportanceRFECV(LogisticRegression(), step=0.3)

selector.fit(X, y)

selector.support_
# array([False, False, False, False, False, False, False, False, False,
#        False, False,  True, False, False, False, False, False, False,
#        False, False])

selector.ranking_
# array([9, 3, 8, 9, 7, 8, 5, 6, 9, 6, 8, 1, 9, 7, 8, 9, 9, 2, 4, 7])
selector.plot()
```
![RFECV fit plot](./docs/assets/rfecv_fit_plot.png)
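At each elimination step, `PermutationImportanceRFECV` ranks features by permutation importance: shuffle one feature's column and measure how much the score drops. As background, here is a minimal sketch of that metric using plain scikit-learn; this is illustrative only, not felimination's internal implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           random_state=0)
model = LogisticRegression().fit(X, y)
baseline = model.score(X, y)

rng = np.random.default_rng(0)
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    rng.shuffle(X_perm[:, j])  # break the link between feature j and y
    importances.append(baseline - model.score(X_perm, y))

# Informative features show the largest score drop when shuffled.
```

Features whose shuffling barely changes the score are good candidates for elimination.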

It looks like `5` is a good number of features. We can set the number of features to select to 5 without needing to retrain:

```python
selector.set_n_features_to_select(5)
selector.support_
# array([False,  True, False, False, False, False,  True, False, False,
#        False, False,  True, False, False, False, False, False,  True,
#         True, False])
```
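Since the selector is scikit-learn compatible, the boolean `support_` mask can be used to subset the feature matrix, which is what `transform` typically does for scikit-learn style selectors. A small sketch with a made-up mask (the values below are hypothetical, not output of the selector):

```python
import numpy as np

# Hypothetical boolean mask, shaped like selector.support_
support = np.array([False, True, False, True, True])
X = np.arange(20).reshape(4, 5)

# Keep only the selected columns, as a SelectorMixin-style
# transform would do
X_selected = X[:, support]
print(X_selected.shape)  # (4, 3)
```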

### Genetic Algorithms
This section illustrates how to use the `HybridImportanceGACVFeatureSelector` class.

```python
from felimination.ga import HybridImportanceGACVFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import numpy as np

# Create dummy dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=6,
    n_redundant=10,
    n_clusters_per_class=1,
    random_state=42,
)

# Initialize selector
selector = HybridImportanceGACVFeatureSelector(
    LogisticRegression(random_state=42),
    random_state=42,
    pool_size=5,
    patience=5
)

# Run optimisation
selector.fit(X, y)

# Show selected features
selector.support_
#array([False,  True, False,  True,  True, False, False, False,  True,
#       False, False, False,  True,  True,  True,  True, False,  True,
#        True, False])

# Show best solution
selector.best_solution_
# {'features': [1, 12, 13, 8, 17, 15, 18, 4, 3, 14],
#  'train_scores_per_fold': [0.88625, 0.89, 0.8825, 0.8925, 0.88625],
#  'test_scores_per_fold': [0.895, 0.885, 0.885, 0.89, 0.89],
#  'cv_importances': [array([[ 1.09135972,  1.13502636,  1.12100231,  0.38285736,  0.28944072,
#            0.04688614,  0.44259813,  0.09832365,  0.10190421, -0.48101593]]),
#   array([[ 1.17345812,  1.29375208,  1.2065342 ,  0.40418709,  0.41839714,
#            0.00447802,  0.466717  ,  0.21733829, -0.00842075, -0.50078996]]),
#   array([[ 1.15416104,  1.18458564,  1.18083266,  0.37071253,  0.22842685,
#            0.1087814 ,  0.44446793,  0.12740545,  0.00621562, -0.54064287]]),
#   array([[ 1.26011643,  1.36996058,  1.30481424,  0.48183549,  0.40589887,
#           -0.01849671,  0.45606913,  0.18330816,  0.03667055, -0.50869557]]),
#   array([[ 1.18227123,  1.28988253,  1.2496398 ,  0.50754295,  0.38942303,
#           -0.01725074,  0.4481891 ,  0.19472963,  0.10034316, -0.50131192]])],
#  'mean_train_score': 0.8875,
#  'mean_test_score': 0.889,
#  'mean_cv_importances': array([ 1.17227331,  1.25464144,  1.21256464,  0.42942709,  0.34631732,
#          0.02487962,  0.45160826,  0.16422104,  0.04734256, -0.50649125])}

# Show progress as a plot
selector.plot()
```
![GA fit plot](./docs/assets/ga_fit_plot.png)

It looks like the optimisation process converged after 2 steps: since the best score did not improve for 5 (= `patience`) consecutive steps, the optimisation stopped early.
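The early stopping seen above follows the usual genetic-algorithm pattern: keep a pool of candidate feature subsets, score them, and stop once the best score has not improved for `patience` consecutive generations. A toy, self-contained sketch of that loop follows; this is not felimination's actual implementation, and the fitness function and mutation scheme are made up for illustration:

```python
import random

random.seed(0)

# Toy fitness: reward selecting features 0 and 2, penalize subset size.
def fitness(mask):
    score = sum(1 for i in (0, 2) if mask[i])
    return score - 0.1 * sum(mask)

def mutate(mask, rate=0.2):
    # Flip each bit with probability `rate`
    return [bit ^ (random.random() < rate) for bit in mask]

n_features, pool_size, patience = 6, 5, 5
pool = [[random.random() < 0.5 for _ in range(n_features)]
        for _ in range(pool_size)]

best, best_score, stale = None, float("-inf"), 0
while stale < patience:
    pool.sort(key=fitness, reverse=True)
    if fitness(pool[0]) > best_score:
        best, best_score, stale = pool[0], fitness(pool[0]), 0
    else:
        stale += 1
    # Next generation: keep the best mask, refill the pool with mutants
    pool = [best] + [mutate(best) for _ in range(pool_size - 1)]
```

In felimination the fitness of a candidate subset is instead a cross-validated model score, which is why the selector exposes per-fold train and test scores in `best_solution_`.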

## License

This project is licensed under the BSD 3-Clause License. See the LICENSE.md file for details.

## Acknowledgments

- [scikit-learn](https://scikit-learn.org/)


            
