mafese


* Name: mafese
* Version: 0.1.9
* Home page: https://github.com/thieu1995/mafese
* Summary: MAFESE: Metaheuristic Algorithm for Feature Selection - An Open Source Python Library
* Upload time: 2023-08-07 11:14:05
* Author: Thieu
* Requires Python: >=3.7
* License: GPLv3
* Keywords: engineering optimization problems, mathematical optimization, feature selection, classification problem, feature selector, dimensionality reduction, subset selection, wrapper methods, embedded methods, mutual information, correlation-based feature selection, recursive feature selection, principal component analysis, pca, lasso regularization, ridge regularization, genetic algorithm (ga), particle swarm optimization (pso), ant colony optimization (aco), differential evolution (de), simulated annealing, grey wolf optimizer (gwo), whale optimization algorithm (woa), confusion matrix, recall, precision, accuracy, k-nearest neighbors, random forest, support vector machine, pearson correlation coefficient (pcc), spearman correlation coefficient (scc), relief, relief-f, multi-objectives optimization problems, stochastic optimization, global optimization, convergence analysis, search space exploration, local search, computational intelligence, robust optimization, performance analysis, intelligent optimization, simulations
            
<p align="center">
<img style="max-width:100%;" 
src="https://thieu1995.github.io/post/2023-08/mafese-02.png" 
alt="MAFESE"/>
</p>


---

[![GitHub release](https://img.shields.io/badge/release-0.1.9-yellow.svg)](https://github.com/thieu1995/mafese/releases)
[![Wheel](https://img.shields.io/pypi/wheel/mafese.svg)](https://pypi.python.org/pypi/mafese)
[![PyPI version](https://badge.fury.io/py/mafese.svg)](https://badge.fury.io/py/mafese)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mafese.svg)
![PyPI - Status](https://img.shields.io/pypi/status/mafese.svg)
![PyPI - Downloads](https://img.shields.io/pypi/dm/mafese.svg)
[![Downloads](https://pepy.tech/badge/mafese)](https://pepy.tech/project/mafese)
[![Tests & Publishes to PyPI](https://github.com/thieu1995/mafese/actions/workflows/publish-package.yaml/badge.svg)](https://github.com/thieu1995/mafese/actions/workflows/publish-package.yaml)
![GitHub Release Date](https://img.shields.io/github/release-date/thieu1995/mafese.svg)
[![Documentation Status](https://readthedocs.org/projects/mafese/badge/?version=latest)](https://mafese.readthedocs.io/en/latest/?badge=latest)
[![Chat](https://img.shields.io/badge/Chat-on%20Telegram-blue)](https://t.me/+fRVCJGuGJg1mNDg1)
![GitHub contributors](https://img.shields.io/github/contributors/thieu1995/mafese.svg)
[![GitTutorial](https://img.shields.io/badge/PR-Welcome-%23FF8300.svg?)](https://git-scm.com/book/en/v2/GitHub-Contributing-to-a-Project)
[![DOI](https://zenodo.org/badge/545209353.svg)](https://doi.org/10.5281/zenodo.7969042)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)


MAFESE (Metaheuristic Algorithms for FEature SElection) is the largest Python library for solving feature selection (FS) 
problems with metaheuristic algorithms.

* **Free software:** GNU General Public License (GPL) V3 license
* **Total Wrapper-based (Metaheuristic Algorithms)**: > 200 methods
* **Total Filter-based (Statistical-based)**: > 15 methods
* **Total Embedded-based (Tree and Lasso)**: > 10 methods
* **Total Unsupervised-based**: >= 4 methods
* **Total datasets**: >= 30 (47 classification and 7 regression datasets)
* **Total performance metrics**: >= 61 (45 regression and 16 classification metrics)
* **Total objective functions (as fitness functions)**: >= 61 (45 regression and 16 classification)
* **Documentation:** https://mafese.readthedocs.io/en/latest/
* **Python versions:** >= 3.7.x
* **Dependencies:** numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido


# Installation

* Install the [current PyPI release](https://pypi.python.org/pypi/mafese):
```sh 
$ pip install mafese==0.1.9
```

* Install directly from source code
```sh 
$ git clone https://github.com/thieu1995/mafese.git
$ cd mafese
$ python setup.py install
```

* To install the development version from GitHub:
```sh 
$ pip install git+https://github.com/thieu1995/mafese 
```

After installation, you can import MAFESE like any other Python module:

```sh
$ python
>>> import mafese
>>> mafese.__version__
```
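
If you prefer to check the installation from a script rather than the interactive prompt, the following is a minimal sketch that uses only the standard library. The helper name `installed_version` is hypothetical and not part of MAFESE, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Reports the installed MAFESE version, or None when the package is not installed.
print(installed_version("mafese"))
```

Returning `None` instead of raising makes the check easy to use in setup scripts that need to decide whether to run `pip install`.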


### Library structure

```code 
docs
examples
mafese
    data/
        cls/
            aggregation.csv
            Arrhythmia.csv
            ...
        reg/
            boston-housing.csv
            diabetes.csv
            ...
    wrapper/
        mha.py
        recursive.py
        sequential.py
    embedded/
        lasso.py
        tree.py
    filter.py
    unsupervised.py
    utils/
        correlation.py
        data_loader.py
        encoder.py
        estimator.py
        mealpy_util.py
        transfer.py
        validator.py
    __init__.py
    selector.py
README.md
setup.py
```

### Examples

Let's go through some examples.

#### 1. First, load a dataset. You can use the datasets bundled with MAFESE:

```python 
# Load available dataset from MAFESE
from mafese import get_dataset

# Try an unknown dataset name
get_dataset("unknown")
# Enter: 1      -> This will list all available datasets

data = get_dataset("Arrhythmia")
```

* Or you can load your own dataset:

```python
import pandas as pd
from mafese import Data

# load X and y
# NOTE mafese accepts numpy arrays only, hence the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)
```
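
The slicing above assumes a CSV layout in which the first column is a row index and the last column is the target. The sketch below reproduces that split with the standard library only, so the expected layout is easy to see. The CSV content and the helper `split_features_target` are made up for illustration and are not part of MAFESE.

```python
import csv
import io

# Hypothetical CSV content: first column is a row index, last column is the target.
CSV_TEXT = """id,f1,f2,f3,label
0,5.1,3.5,1.4,0
1,4.9,3.0,1.4,0
2,6.3,3.3,6.0,1
"""


def split_features_target(csv_text):
    """Parse CSV text, drop the header and index column, and split into X and y."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Skip the header row and the leading index column, then convert to floats.
    body = [[float(value) for value in row[1:]] for row in rows[1:]]
    X = [row[:-1] for row in body]  # every remaining column except the last
    y = [row[-1] for row in body]   # the last column is the target
    return X, y


X, y = split_features_target(CSV_TEXT)
print(len(X), len(X[0]), y)
```

If your file keeps the target in a different column, adjust the slicing before passing `X` and `y` to `Data`.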

#### 2. Next, split the dataset into train and test sets

```python 
data.split_train_test(test_size=0.2, inplace=True)
print(data.X_train[:2].shape)
print(data.y_train[:2].shape)
```

**Make sure your dataset is scaled or normalized when the problem or estimator requires it, for example a neural network.**
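
As a rough illustration of what scaling means here, the sketch below standardizes features with statistics computed on the training rows only and then reuses them for the test rows. It is plain Python, not a MAFESE API, and the function names are hypothetical. In practice a library scaler such as scikit-learn's `StandardScaler` does the same job.

```python
from statistics import mean, pstdev


def fit_standardizer(X_train):
    """Compute per-column mean and standard deviation from the training rows only."""
    columns = list(zip(*X_train))
    means = [mean(col) for col in columns]
    # Replace a zero deviation with 1.0 so constant columns do not divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_standardizer(X, means, stds):
    """Scale rows with statistics that were fitted on the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


X_train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
X_test = [[3.0, 12.0]]

means, stds = fit_standardizer(X_train)
print(apply_standardizer(X_train, means, stds))
print(apply_standardizer(X_test, means, stds))
```

Fitting on the training rows only keeps information from the test set out of the feature selection step.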


#### 3. Next, choose the Selector you want to use and import it:

```python 
## First way (recommended)
from mafese import UnsupervisedSelector, FilterSelector, LassoSelector, TreeSelector
from mafese import SequentialSelector, RecursiveSelector, MhaSelector, MultiMhaSelector

## Second way
from mafese.unsupervised import UnsupervisedSelector
from mafese.filter import FilterSelector
from mafese.embedded.lasso import LassoSelector
from mafese.embedded.tree import TreeSelector
from mafese.wrapper.sequential import SequentialSelector
from mafese.wrapper.recursive import RecursiveSelector
from mafese.wrapper.mha import MhaSelector, MultiMhaSelector
```

#### 4. Next, create an instance of the Selector class you want to use:

```python 
feat_selector = UnsupervisedSelector(problem='classification', method='DR', n_features=5)

feat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)

feat_selector = LassoSelector(problem="classification", estimator="lasso", estimator_paras={"alpha": 0.1})

feat_selector = TreeSelector(problem="classification", estimator="tree")

feat_selector = SequentialSelector(problem="classification", estimator="knn", n_features=3, direction="forward")

feat_selector = RecursiveSelector(problem="classification", estimator="rf", n_features=5)

feat_selector = MhaSelector(problem="classification", estimator="knn",
                            optimizer="BaseGA", optimizer_paras=None,
                            transfer_func="vstf_01", obj_name="AS")

list_optimizers = ("OriginalWOA", "OriginalGWO", "OriginalTLO", "OriginalGSKA")
list_paras = [{"epoch": 10, "pop_size": 30}, ]*4
feat_selector = MultiMhaSelector(problem="classification", estimator="knn",
                            list_optimizers=list_optimizers, list_optimizer_paras=list_paras,
                            transfer_func="vstf_01", obj_name="AS")
```
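
The `transfer_func` argument above refers to a transfer function. In metaheuristic feature selection, such a function commonly maps a continuous optimizer position to a 0/1 feature mask. The sketch below shows one widely used V-shaped form, `|erf(sqrt(pi)/2 * x)|`. Whether this matches `"vstf_01"` exactly is an assumption, and the binarization rule is a simplified illustration rather than MAFESE's implementation.

```python
import math
import random


def v_shaped_transfer(x):
    """Map a real value to [0, 1] with a V-shaped curve: |erf(sqrt(pi)/2 * x)|."""
    return abs(math.erf(math.sqrt(math.pi) / 2.0 * x))


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.

    Each feature is selected with probability equal to its transfer value
    (a simplified rule chosen for illustration).
    """
    return [1 if rng.random() < v_shaped_transfer(x) else 0 for x in position]


rng = random.Random(42)
position = [0.0, 0.3, -1.5, 4.0]  # hypothetical optimizer output
print([round(v_shaped_transfer(x), 3) for x in position])
print(binarize(position, rng))
```

A V-shaped curve is symmetric around zero, so values far from zero in either direction are the ones most likely to switch a feature on.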

#### 5. Fit the model to X_train and y_train

```python 
feat_selector.fit(data.X_train, data.y_train)
```

#### 6. Get the information

```python 
# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)

# check the index of selected features
print(feat_selector.selected_feature_indexes)
```

#### 7. Call transform() on the X you want to reduce to the selected features

```python 
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
```
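
Conceptually, the mask, the index list, and the transformed matrix all describe the same selection. The sketch below shows that relationship in plain Python. It is an illustration with hypothetical helper names, not MAFESE's own code, and it uses nested lists instead of numpy arrays.

```python
def mask_to_indexes(mask):
    """Return the positions of the selected (truthy) entries in a feature mask."""
    return [i for i, keep in enumerate(mask) if keep]


def select_columns(X, mask):
    """Keep only the columns of X whose mask entry is truthy."""
    indexes = mask_to_indexes(mask)
    return [[row[i] for i in indexes] for row in X]


mask = [True, False, True, False]  # hypothetical selection result
X = [[1, 2, 3, 4], [5, 6, 7, 8]]

print(mask_to_indexes(mask))
print(select_columns(X, mask))
```

The same mask must be applied to both the train and test matrices so that their columns stay aligned.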

#### 8. You can build your own evaluation method or use ours.

**If you use our method, don't transform the data.**

i) You can use a different estimator from the one used in the feature selection process:
```python 
feat_selector.evaluate(estimator="svm", data=data, metrics=["AS", "PS", "RS"])

## The data object loaded above contains both the train and test sets, so the results will look like this:
## {'AS_train': 0.77176, 'PS_train': 0.54177, 'RS_train': 0.6205, 'AS_test': 0.72636, 'PS_test': 0.34628, 'RS_test': 0.52747}
```

ii) You can use the same estimator as in the feature selection process:
```python 
X_test, y_test = data.X_test, data.y_test
feat_selector.evaluate(estimator=None, data=data, metrics=["AS", "PS", "RS"])
```

1) Where do I find the supported metrics, such as ["AS", "PS", "RS"] above, and what do they mean?
You can find them at https://github.com/thieu1995/permetrics, or print them:
```python 
from mafese import MhaSelector 

print(MhaSelector.SUPPORTED_REGRESSION_METRICS)
print(MhaSelector.SUPPORTED_CLASSIFICATION_METRICS)
```
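
To give a feel for the three metrics used above, the sketch below computes accuracy, precision, and recall by hand. Reading "AS", "PS", and "RS" as accuracy, precision, and recall scores is an assumption based on the names, and the macro averaging shown here may differ from the averaging that permetrics applies by default.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_recall(y_true, y_pred):
    """Average per-class precision and recall with equal weight per class."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        # A class that is never predicted (or never present) contributes 0.0.
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


y_true = [0, 0, 1, 1, 2, 2]  # hypothetical labels
y_pred = [0, 1, 1, 1, 2, 0]  # hypothetical predictions

print(round(accuracy(y_true, y_pred), 5))
print(tuple(round(v, 5) for v in macro_precision_recall(y_true, y_pred)))
```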

2) How do I know which estimators and methods my Selector supports?
```python 
print(feat_selector.SUPPORT) 
```
Or, better, read the documentation at https://mafese.readthedocs.io/en/latest/

3) I got this type of error:
```python 
raise ValueError("Existed at least one new label in y_pred.")
ValueError: Existed at least one new label in y_pred.
``` 
How do I solve this?

+ This occurs only when you are working on a classification problem with a small dataset that has many classes. For 
  instance, the "Zoo" dataset contains only 101 samples, but it has 7 classes. If you split the dataset into 
  training and testing sets with a ratio of around 80% - 20%, there is a chance that one or more classes appear 
  in the testing set but not in the training set. As a result, when you calculate the performance metrics, you may 
  encounter this error. A model cannot assign samples to a label it never saw during training. 
  There are several solutions to this problem.

+ 1st: Use the SMOTE method to address imbalanced data and ensure that all classes have the same number of samples.

```python 
from imblearn.over_sampling import SMOTE
import pandas as pd
from mafese import Data

dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]

X_new, y_new = SMOTE().fit_resample(X, y)
data = Data(X_new, y_new)
```

+ 2nd: Try different random_state values in the split_train_test() function.
```python
import pandas as pd 
from mafese import Data 

dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)
data.split_train_test(test_size=0.2, random_state=10)   # Try different random_state value 
```
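
Rather than trying seeds by hand, you can check each split for labels that appear only in the test part. The sketch below shows that check with a plain-Python shuffle. The helper names are hypothetical, and its seeds do not correspond to the `random_state` values of `split_train_test()`, so treat it as an illustration of the check itself. With MAFESE you would apply the same comparison to `data.y_train` and `data.y_test` after each split.

```python
import random


def split_indices(n_samples, test_size, seed):
    """Shuffle sample indices with a seed and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


def unseen_test_labels(y, train_idx, test_idx):
    """Return labels that occur in the test part but never in the training part."""
    return {y[i] for i in test_idx} - {y[i] for i in train_idx}


def find_safe_seed(y, test_size=0.2, max_seed=100):
    """Return the first seed whose split leaves no unseen label in the test part."""
    for seed in range(max_seed):
        train_idx, test_idx = split_indices(len(y), test_size, seed)
        if not unseen_test_labels(y, train_idx, test_idx):
            return seed
    return None


# Hypothetical labels: class 2 has a single sample, so some splits put it only in the test part.
y = [0] * 6 + [1] * 5 + [2]
seed = find_safe_seed(y)
print(seed)
```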


For more usage examples, see the [examples](/examples) folder.


# Support (questions, problems)

### Official Links 

* Official source code repo: https://github.com/thieu1995/mafese
* Official documentation: https://mafese.readthedocs.io/
* Download releases: https://pypi.org/project/mafese/
* Issue tracker: https://github.com/thieu1995/mafese/issues
* Notable changes log: https://github.com/thieu1995/mafese/blob/master/ChangeLog.md
* Examples with different mealpy versions: https://github.com/thieu1995/mafese/blob/master/examples.md
* Official chat group: https://t.me/+fRVCJGuGJg1mNDg1

* This project is also related to our other "optimization" and "machine learning" projects. Check them out here:
    * https://github.com/thieu1995/mealpy
    * https://github.com/thieu1995/metaheuristics
    * https://github.com/thieu1995/opfunu
    * https://github.com/thieu1995/enoppy
    * https://github.com/thieu1995/permetrics
    * https://github.com/thieu1995/MetaCluster
    * https://github.com/thieu1995/pfevaluator
    * https://github.com/aiir-team

### Citation Request 

Please include these citations if you plan to use this library:

```code 
@software{nguyen_van_thieu_2023_7969043,
  author       = {Nguyen Van Thieu and Ngoc Hung Nguyen and Ali Asghar Heidari},
  title        = {Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python},
  month        = may,
  year         = 2023,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.7969042},
  url          = {https://github.com/thieu1995/mafese}
}

@article{van2023mealpy,
  title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
  author={Van Thieu, Nguyen and Mirjalili, Seyedali},
  journal={Journal of Systems Architecture},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/j.sysarc.2023.102871}
}
```



### Related Documents

1. https://neptune.ai/blog/feature-selection-methods
2. https://www.blog.trainindata.com/feature-selection-machine-learning-with-python/
3. https://github.com/LBBSoft/FeatureSelect
4. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2754-0
5. https://github.com/scikit-learn-contrib/boruta_py
6. https://elki-project.github.io/
7. https://sci2s.ugr.es/keel/index.php
8. https://archive.ics.uci.edu/datasets
9. https://python-charts.com/distribution/box-plot-plotly/
10. https://plotly.com/python/box-plots/#box-plot-styling-mean--standard-deviation

            

        