FSRLearning


- Name: FSRLearning
- Version: 1.0.7
- Home page: https://github.com/blefo/FSRLearning
- Summary: The first feature selection method based on reinforcement learning - Python library available on pip for a fast deployment.
- Upload time: 2024-06-17 22:25:48
- Author: Baptiste Lefort
- License: MIT
- Keywords: feature, selection, reinforcement learning, large dataset, ai
- Requirements: none recorded
# FSRLearning - Python Library

[![Downloads](https://static.pepy.tech/badge/FSRLearning)](https://pepy.tech/project/FSRLearning)
[![Downloads](https://static.pepy.tech/badge/FSRLearning/month)](https://pepy.tech/project/FSRLearning)

FSRLearning is a Python library for feature selection using reinforcement learning. It is designed to be easy to use and efficient, particularly for selecting the most relevant features from a very large set.

## Installation

Install FSRLearning using pip:

```bash
pip install FSRLearning
```

## Example usage

### Data Pre-processing

#### The Dataset

In this example, we use the Australian credit approval dataset. It has 14 intentionally anonymized features and a binary label (0 or 1) to predict. The dataset only serves to demonstrate how to use the library; the method is not specific to it and can be applied to other datasets. You can find more details about the dataset [here](https://archive.ics.uci.edu/dataset/143/statlog+australian+credit+approval).

#### The process

The first step is to pre-process the data. The feature selection method takes two pandas objects as input: X, a DataFrame containing all the features to evaluate, and y, the labels to predict (a Series in the example below). **It is highly recommended to map the features to a list of numbers**, so that each feature is identified by an integer. Here is an example of the pre-processing step on this dataset, which has 15 columns: 14 features and 1 label (column 14).
```python
import pandas as pd

# Get the pandas DataFrame
australian_data = pd.read_csv('australian_data.csv', header=None)

# Get the dataset with the features
X = australian_data.drop(14, axis=1)

# Get the dataset with the label values
y = australian_data[14]
```
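If your data has named columns instead of the integer labels produced by `header=None`, you can follow the mapping recommendation above in two steps. First, rename the feature columns to consecutive integers. Second, keep a dictionary that translates indices back to names. The sketch below assumes a pandas DataFrame with string column names. The column names, `feature_map` and the `selected` indices are hypothetical and not part of FSRLearning:

```python
import pandas as pd

# Hypothetical dataset with named columns and a "label" target column
df = pd.DataFrame({
    "age": [25, 40, 31, 58],
    "income": [3.2, 5.1, 4.0, 6.3],
    "debt": [0.5, 1.2, 0.0, 2.1],
    "label": [0, 1, 0, 1],
})

X = df.drop(columns="label")
y = df["label"]

# Map each integer index to its original column name
feature_map = dict(enumerate(X.columns))

# Rename the feature columns to 0, 1, ..., n-1
X = X.set_axis(range(X.shape[1]), axis=1)

print(feature_map)      # {0: 'age', 1: 'income', 2: 'debt'}
print(list(X.columns))  # [0, 1, 2]

# Translate (hypothetical) selected indices back to column names
selected = [2, 0]
print([feature_map[i] for i in selected])  # ['debt', 'age']
```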

After this step we can simply run a feature selection and ranking process that maximises a metric. 

```python
from FSRLearning import FeatureSelectorRL

# Create the object of feature selection with RL
fsrl_obj = FeatureSelectorRL(14, nb_iter=200)

# Returns the results of the selection and the ranking
results = fsrl_obj.fit_predict(X, y)
results
```

`FeatureSelectorRL` has several parameters that can be tuned. Here they are, with the values they can take:

- `feature_number` (integer): number of features in the DataFrame X
- `feature_structure` (dictionary, optional): dictionary for the graph implementation
- `eps` (float in [0, 1], optional): probability of choosing a random next state; 0 gives a purely greedy algorithm and 1 a purely random one
- `alpha` (float in [0, 1], optional): controls the rate of updates; 0 means the state values are never updated and 1 means each update fully replaces the previous value
- `gamma` (float in [0, 1], optional): discount factor applied to the value of the next state; 0 is shortsighted (only the immediate reward counts) and 1 is farsighted
- `nb_iter` (int, optional): number of sequences run through the graph
- `starting_state` ("empty" or "random", optional): if "empty", the algorithm starts from the empty state; if "random", it starts from a random state in the graph
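The sketch below makes the roles of `eps`, `alpha` and `gamma` concrete. It shows epsilon-greedy exploration and a temporal-difference value update over a graph whose states are feature subsets, where each edge adds one feature. It follows the general idea of the referenced papers, under these assumptions. It is not FSRLearning's internal implementation, and the helpers `next_states`, `choose_next_state` and `td_update` are hypothetical:

```python
import random

def next_states(state, n_features):
    """States reachable by adding one not-yet-selected feature to `state` (a frozenset)."""
    return [state | {f} for f in range(n_features) if f not in state]

def choose_next_state(state, n_features, values, eps, rng):
    """Epsilon-greedy choice: random neighbour with probability eps, else the best-valued one.

    Assumes `state` is not the full feature set (so it has at least one neighbour).
    """
    candidates = next_states(state, n_features)
    if rng.random() < eps:
        return rng.choice(candidates)
    # Unvisited states default to a value of 0.0; ties go to the first candidate
    return max(candidates, key=lambda s: values.get(s, 0.0))

def td_update(values, state, next_state, reward, alpha, gamma):
    """Move V(state) toward reward + gamma * V(next_state) at rate alpha."""
    old = values.get(state, 0.0)
    target = reward + gamma * values.get(next_state, 0.0)
    values[state] = old + alpha * (target - old)
    return values[state]

rng = random.Random(0)
values = {}
start = frozenset()  # the "empty" starting state
nxt = choose_next_state(start, 3, values, eps=0.0, rng=rng)  # purely greedy
print(sorted(nxt))  # [0]
print(td_update(values, start, nxt, reward=1.0, alpha=0.5, gamma=0.9))  # 0.5
```

With `eps=0.0` the choice is purely greedy, and with `alpha=0.0` the value never moves. A small `gamma` makes the update depend almost only on the immediate reward.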

The selection process returns a 5-tuple containing:

- the sorted feature indices
- the number of times each feature was chosen
- the mean reward brought by each feature
- the ranking of the features, from least to most important
- the number of states visited
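Because the ranking goes from the least to the most important feature, the most important features sit at its end. The sketch below keeps the top `k` features. It makes three assumptions:

- The ranking is a sequence of integer column indices of X, as in the pre-processing above.
- The ranking is the fourth element of `results` (e.g. `results[3]`), if the returned order matches the list above.
- The ranking values below are made up for illustration; they are not real FSRLearning output.

`top_k_features` is a hypothetical helper:

```python
import pandas as pd

def top_k_features(ranking, k):
    """Return the k most important features, most important first.

    Assumes `ranking` lists feature indices from least to most important.
    """
    return list(ranking)[::-1][:k]

# Made-up ranking for illustration (least -> most important)
ranking = [5, 2, 7, 0, 3]
top = top_k_features(ranking, 2)
print(top)  # [3, 0]

# Keep only those columns of a DataFrame whose columns are integer indices
X = pd.DataFrame([[i * 10 + j for j in range(8)] for i in range(3)])
X_reduced = X[top]
print(list(X_reduced.columns))  # [3, 0]
```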


## Available methods

- Compare the performance of FSRLearning with RFE (Recursive Feature Elimination) from scikit-learn:

```python
fsrl_obj.compare_with_benchmark(X, y, results)
```
Returns comparison results and plots the metric for each selected set of features. This is useful for parameter tuning.
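For reference, you can also build an RFE baseline of this kind directly with scikit-learn. This is a standalone sketch on synthetic data, not what `compare_with_benchmark` does internally. The estimator (logistic regression), the 5-fold cross-validation and the accuracy metric are assumptions chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic binary problem with 14 features, 4 of them informative
X, y = make_classification(n_samples=200, n_features=14, n_informative=4,
                           random_state=0)

scores = {}
for k in range(1, X.shape[1] + 1):
    # RFE sits inside the pipeline, so features are selected on each training fold only
    model = make_pipeline(
        RFE(LogisticRegression(max_iter=1000), n_features_to_select=k),
        LogisticRegression(max_iter=1000),
    )
    scores[k] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

best_k = max(scores, key=scores.get)
print(f"best number of features: {best_k}, accuracy: {scores[best_k]:.3f}")
```

Wrapping RFE in a pipeline avoids selecting features on data that is later used for scoring.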

- Plot the evolution of the number of states visited for the first time and of already visited states:

```python
fsrl_obj.get_plot_ratio_exploration()
```
Returns a plot. It gives an overview of how the graph is explored and is useful for tuning the `eps` (exploration) parameter.
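The quantity behind this plot can be illustrated with a small helper. For a sequence of visited states, it keeps cumulative counts of visits that reached a state for the first time and visits that returned to an earlier one. This hypothetical sketch assumes states are frozensets of feature indices. It does not reproduce `get_plot_ratio_exploration`:

```python
def exploration_counts(visited):
    """Cumulative counts of first-time visits and revisits along a sequence of states."""
    seen = set()
    new_counts, repeat_counts = [], []
    new = repeat = 0
    for state in visited:
        if state in seen:
            repeat += 1
        else:
            seen.add(state)
            new += 1
        new_counts.append(new)
        repeat_counts.append(repeat)
    return new_counts, repeat_counts

# Illustrative sequence of visited feature subsets
visits = [frozenset(), frozenset({1}), frozenset(), frozenset({1, 3}), frozenset({1})]
new_counts, repeat_counts = exploration_counts(visits)
print(new_counts)     # [1, 2, 2, 3, 3]
print(repeat_counts)  # [0, 0, 1, 1, 2]
```

A higher `eps` typically keeps the first-time curve rising for longer. A low `eps` tends to revisit the same greedy path, so the revisit curve grows faster.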

- Get an overview of the relative impact of each feature on the model:

```python
fsrl_obj.get_feature_strengh(results)
```

Returns a bar plot.

- Get an overview of the effect of the stopping conditions:

```python
fsrl_obj.get_depth_of_visited_states()
```

Returns a plot. It is useful to see how deep the Markov Decision Process goes in the graph.
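The depth distribution of visited states can be summarised with a `Counter`. This assumes each state in the graph is a subset of features, and that its depth is the number of features it contains (the empty state has depth 0). This is a hypothetical sketch, not the library's implementation:

```python
from collections import Counter

def depth_distribution(visited):
    """Number of visits at each depth, where depth = size of the feature subset."""
    return dict(sorted(Counter(len(state) for state in visited).items()))

visits = [frozenset(), frozenset({1}), frozenset({1, 3}), frozenset({2}), frozenset({1, 3, 4})]
print(depth_distribution(visits))   # {0: 1, 1: 2, 2: 1, 3: 1}
print(max(len(s) for s in visits))  # 3 (deepest state reached)
```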

## Contributions are welcome!

- Automate the data processing step and generalize the accepted input data formats and types
- Distribute the computation of rewards to make the algorithm faster
- Add more visualization and feedback methods

## References

This library was implemented with the help of these two articles:
- Sali Rasoul, Sodiq Adewole and Alphonse Akakpo, *Feature Selection Using Reinforcement Learning* (2021)
- Seyed Mehdi Hazrati Fard, Ali Hamzeh and Sattar Hashemi, *Using Reinforcement Learning to Find an Optimal Set of Features* (2013)


            
