ensemble-binclass

Name	ensemble-binclass JSON
Version	1.0.10 JSON
	download
home_page	https://github.com/biocsuwb/ensemble-binclass
Summary	Feature selection and ensemble classification
upload_time	2025-02-05 22:45:46
maintainer	None
docs_url	None
author	Szymon Kolodziejski
requires_python	>=3.9
license	None
keywords	feature selection ensemble classification machine learning omnic data
VCS
bugtrack_url
requirements	category-encoders colorama contourpy coverage cycler fonttools gprofiler-official iniconfig jinja2 joblib kiwisolver markupsafe matplotlib mrmr-selection numpy packaging pandas patsy pillow pluggy polars pypandoc pyparsing pytest pytest-cov python-dateutil pytz relieff scikit-learn scipy seaborn six statsmodels threadpoolctl tqdm tzdata xgboost
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # The ensemble-binclass: an Python package for a feature selection and ensemble classification of molecular omic data
## Project developed by:
Szymon Kołodziejski and Polewko-Klim Aneta
## Description
ensemble-binclass is an python package for feature selection (FS) and ensemble classification of omic data (numeric data formats).
This tool is based on four feature filters (**U-test**, the minimum redundancy maximum relevance (**MRMR**) 
([Ding 2005](https://pubmed.ncbi.nlm.nih.gov/15852500/), the **Relief** ([Kira 1992](https://dl.acm.org/doi/abs/10.5555/1867135.1867155)), the least absolute shrinkage and selection operator (**LASSO**) ([Tibshirani 1996](https://www.jstor.org/stable/2346178))) and eight binary classifiers
(**Adaboost**, **gradient boosting**, **random forest** [Breiman 2001](https://link.springer.com/article/10.1023/A:1010933404324), k-nearest neighbor (**k-NN**), **decision tree**, **extra trees**, support vector machine (**SVM**) and extreme gradient boost (**XGBoost**)).
The ensemble-binclass uses the machine learning algorithms (ML) to evaluate the quality of feature sets (**k-fold stratified crossvalidation** or **resampling**). 
It can be applied to two-class problems.

**Ensemble-binclass is a Python package that allows the user to:**
* filter the most informative features by using up to four FS methods from numeric omics data generated from high-throughput molecular biology experiments;
* remove redundant features by building the Spearman correlation matrix that identifies highly correlated features;
* perform binary classification by using one of the eight classifiers:
* build predictive model using one of ensemble learning, namely, **stacking, boosting, and bagging** (see Figure 1);
* evaluate predictive model by using several metrics (the area under receiver operator curve  (AUC), accuracy (ACC), and the Matthews correlation coefficient(MCC), F1, ....);
* compare the predictive performance of models and the stability of selected feature sets for selected FS algorithms, individual classifiers, and combined classifiers;  
* establish the selected parameters for predictive models, such as the number of top N informative features;
* product plots to visualize the model results;
  
![Fig.1](https://github.com/biocsuwb/Images/blob/main/EnsembleLearning.jpg?raw=true) 

Fig.1 The ensemble learning strategy in ensemble-binclass.

## General workflow of the ensemble learning

![Fig.2](https://github.com/biocsuwb/Images/blob/main/WorkflowEnsembleBin.jpg?raw=true) 

Fig.2 Flowchart of ensemble learning.

## The package consists of the following components:

| Component          | Description                                                                        |
|--------------------|------------------------------------------------------------------------------------|   
| preprocessing      | load, prepare, normalization and remove collinear features                         |  
| featureSelection   | top k features thought FS methods: LASSO, ReliefF, MRMR and U-test                 |
| modelEvaluation    | cross-validation algorithms: hold out, k-fold, stratified k-fold and leave one out |
| classifier         | individual classifiers: ADA, GB, RF, kNN, DT, ET, SVM and XGB                      |
| ensemble           | ensemble classifiers: voting, stacking and boosting                                |
| performanceMetrics | model performance metrics: ACC, ROC AUC, F1-score, MCC and MSE                     | 

### Install the development version from GitHub:
To install this package, clone the repository and install with pip:
```shell
git clone https://github.com/biocsuwb/ensemble-binclass.git
```
### Required packages
pandas 2.2.0, scikit-learn 1.5.2, xgboost 2.0.3, numpy 1.26.3, ReliefF 0.1.2, scipy 1.12.0, mrmr-selection 0.2.8, matplotlib 3.8.2, pytest 8.1.1, pytest-cov 4.1.0,
seaborn 0.13.2, and grofiler-official 0.3.5

### Install required packages
To install required packages, run the following command:
```shell
pip install -r ./requirements.txt
```

### Install the development version from PyPi repository:

```python
pip install ensemble-binclass
```

## Import module

```python
from ensbinclass import preprocessing as pre
from ensbinclass import featureSelection as fs
from ensbinclass import classifier as cl
from ensbinclass import ensemble as ens
import pandas as pd
import numpy as np
```
## Example data sets
The RNA-sequencing data of subtypes low-grade glioblastoma (LGG) patients from The Cancer Genome Atlas database ([TCGA](https://www.cancer.gov/tcga)) was used. The preprocessing of data involved standard steps for RNA-Seq data. The log2 transformation was performed. Features with zero and near-zero (1%) variance across patients were removed. After preprocessing, the primary dataset contains xxx tumor samples (wymienic podtypy) described with xxx differentially expressed genes (DEGs). This dataset includes highly correlated features, and the number of cancer samples is roughly ten times more than normal samples. For testing purposes, the number of molecular markers was limited to 2000 DEGs ranked by the highest difference in the gene expression level between subtypes of tumor ([exampleData_TCGA_LUAD_2000.csv](https://github.com/biocsuwb/EnsembleFS-package/tree/main/data)). 
The first column ("class") includes the subtype of patients with LGG. 

## Challenges in analysing molecular data

![Fig.3](https://github.com/biocsuwb/Images/blob/main/Problem_dataomics.jpg?raw=true)

Fig.3 Challenges in analysing molecular data.

## Detailed step-by-step instruction on how to conduct the analysis:
1. Identify relevant variables (candidate diagnostic biomarkers) by using FS methods
2. Remove redundant variables (correlated biomarkers)
3. Select top-N relevant variables 
4. Select a method to validate the ML model
5. Construct predictive model using individual classifier or ensemble classifier 
6. Evaluate biomarker discovery methods
7. Perform hyperparameter optimization for classifier algorithm
8. Construct diagnosis system using the best classifier 

## Example 1 - Construct the predictive model with molecular data by using one of eight individual classifiers
### Load and check correctness of example data

#### Load data parameters
- path, path to the data file (str) ***path='test_data/exampleData_TCGA_LUAD_2000.csv'***;
- index_col, is data contains index column (bool) ***index_col=False***;
- encode, is data should be encoded (bool) ***encode=False***;
- sep, separator in the data file (str) ***sep=','***;

```python
pr = pre.DataPreprocessing()
pr.load_data('test_data/exampleData_TCGA_LUAD_2000.csv')
pr.show_data()
```
### Prepare omic data for machine learning

#### Choose target column parameters
- target, target column name (str) ***target='class'*** ;

```python
X, y = pr.set_target('class')
```

#### Remove collinear features throught the Spearman correlation matrix parameters
- threshold, threshold for the Spearman correlation matrix (float) ***threshold=0.75*** ;

```python
pr.remove_collinear_features(threshold=0.75)
```

#### MinMaxScaler features normalization
```python
X = pr.normalization()
```

### Feature selection using the FS methods

#### Feature selection parameters
- X, variables (pd.DataFrame) ***X=X***;
- y, target variable (pd.Series) ***y=y***;
- method_, FS method name (str) ***method_='lasso'***;
- size, top n variables (int) ***size=100***;

#### LASSO hyperparameters 
- params, feature selection method hyperparameters (dict) ***params={'alpha': 0.00001, 'fit_intercept: True, 'precompute': False, 'max_iter': 10000, 'tol': 0.0001, 'selection': 'cyclic', 'random_state': None}***;

#### Run LASSO 
```python
lasso_features = fs.FeatureSelection(X, y, method_='lasso', size=100, params={'alpha': 0.00001, 'fit_intercept': True},)
```

#### RELIEFF hyperparameters
- params, feature selection method hyperparameters (dict) ***params={'n_neighbors': 100}***;

#### Run RELIEFF 
```python
relieff_features = fs.FeatureSelection(X, y, method_='relieff', size=100, params={'n_neighbors': 100},)
```

#### MRMR hyperparameters
- params, feature selection method hyperparameters (dict) ***params={'relevance': 'f', 'redundancy': 'c', 'denominator': 'mean', 'cat_features': None, 'only_same_domain': False, 'return_scores': False, 'n_jobs': -1, 'show_progress': True}***;

#### Run MRMR
```python
mrmr_features = fs.FeatureSelection(X, y, method_='mrmr', size=100, params={'relevance': 'f', 'redundancy': 'c'},)
```

#### U-TEST hyperparameters
- params, feature selection method hyperparameters (dict) ***params={'use_continuity': True, 'alternative': 'two-sided', 'axis': 0, 'method': 'auto'}***;

#### Run U-TEST 
```python
utest_features = fs.FeatureSelection(X, y, method_='uTest', size=100, params={'use_continuity': True, 'alternative': 'two-sided'},)
```

### g:Profiler - interoperable web service for functional enrichment analysis and gene identifier mapping

#### get_profiler configurations parameters
- return_dataframe, result type (bool) if True, query results are presented as pandas DataFrames, otherwise as a list of dicts;
- organism, Organism id for profiling (str) For full list see https://biit.cs.ut.ee/gprofiler/page/organism-list
- query, list of genes to profile (pd.Series or  list)

```python
fs.get_profiler(
  return_dataframe=True,
  organism='hsapiens',
  query=lasso_features[:5],
)
```

|    | source   | native     | name                                             |   p_value | significant   | description                                                                                                                                                                                                                                                                                                                                                                                                                     |   term_size |   query_size |   intersection_size |   effective_domain_size |   precision |    recall | query   | parents                                                  |
|---:|:---------|:-----------|:-------------------------------------------------|----------:|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------:|-------------:|--------------------:|------------------------:|------------:|----------:|:--------|:---------------------------------------------------------|
|  0 | GO:BP    | GO:0050805 | negative regulation of synaptic transmission     | 0.0267705 | True          | "Any process that stops, prevents, or reduces the frequency, rate or extent of synaptic transmission, the process of communication from a neuron to a target (neuron, muscle, or secretory cell) across a synapse." [GOC:ai]                                                                                                                                                                                                    |          55 |            4 |                   2 |                   21031 |         0.5 | 0.0363636 | query_1 | ['GO:0007268', 'GO:0010648', 'GO:0023057', 'GO:0050804'] |
|  1 | GO:MF    | GO:0019811 | cocaine binding                                  | 0.0498746 | True          | "Binding to cocaine (2-beta-carbomethoxy-3-beta-benzoxytropane), an alkaloid obtained from dried leaves of the South American shrub Erythroxylon coca or by chemical synthesis." [GOC:jl, ISBN:0198506732]                                                                                                                                                                                                                      |           1 |            5 |                   1 |                   20212 |         0.2 | 1         | query_1 | ['GO:0043169', 'GO:0097159', 'GO:1901363']               |
|  2 | GO:MF    | GO:0050785 | advanced glycation end-product receptor activity | 0.0498746 | True          | "Combining with advanced glycation end-products and transmitting the signal to initiate a change in cell activity. Advanced glycation end-products (AGEs) form from a series of chemical reactions after an initial glycation event (a non-enzymatic reaction between reducing sugars and free amino groups of proteins)." [GOC:signaling, PMID:12453678, PMID:12707408, PMID:7592757, PMID:9224812, Wikipedia:RAGE_(receptor)] |           1 |            5 |                   1 |                   20212 |         0.2 | 1         | query_1 | ['GO:0038023']                                           |
 
Table 1. The g:Profiler results for the top 5 features selected by the LASSO method.

## Example 2 - Construct the predictive model with molecular data by using individual classifiers

### Required parameters
- X, variables (pd.DataFrame) ***X=X***;
- y, target (pd.Series) ***y=y***;
- features, list of feature sets (list) ***features=[relieff_features.features, lasso_features.features,]***;
- classifiers, list of classifiers (list) ***classifiers=['adaboost', 'random_forest', 'svm',]***;
- cv, cross-validation method (str) ***cv='stratified_k_fold'***;
- repetitions, number of repetitions (int) ***repetitions=10***;

### Optional parameters
- classifier_params, classifiers hyperparameters (list of dicts) ***classifier_params=[{'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}}, {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}}, {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},]***;
- cv_params, cross-validation method hyperparameters (dict) ***cv_params={'n_splits': 10}***;

### Construct the predictive model with molecular data by using individual classifiers

```python
clf = cl.Classifier(
    X,
    y,
    features=[relieff_features.features, lasso_features.features,],
    classifiers=['svm', 'adaboost', 'random_forest',],
    classifier_params=[
        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},
        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9}},
        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None}},
    ],
    cv='stratified_k_fold',
    cv_params={'n_splits': 10},
    repetitions=10,
) 
```

#### Get model metrics
```python
clf.all_metrics()
```

```console
["ACC: {'RELIEFF_SVM': [0.987, 0.015], 'RELIEFF_ADABOOST': [0.994, 0.011], 'RELIEFF_RANDOM_FOREST': [0.99, 0.013], 'LASSO_SVM': [0.995, 0.009], 'LASSO_ADABOOST': [0.992, 0.012], 'LASSO_RANDOM_FOREST': [0.99, 0.014]}",
 "Roc Auc: {'RELIEFF_SVM': [0.978, 0.038], 'RELIEFF_ADABOOST': [0.991, 0.023], 'RELIEFF_RANDOM_FOREST': [0.971, 0.048], 'LASSO_SVM': [0.99, 0.026], 'LASSO_ADABOOST': [0.982, 0.033], 'LASSO_RANDOM_FOREST': [0.971, 0.049]}",
 "F1 score: {'RELIEFF_SVM': [0.993, 0.008], 'RELIEFF_ADABOOST': [0.996, 0.006], 'RELIEFF_RANDOM_FOREST': [0.994, 0.007], 'LASSO_SVM': [0.997, 0.005], 'LASSO_ADABOOST': [0.996, 0.007], 'LASSO_RANDOM_FOREST': [0.994, 0.008]}",
 "MCC: {'RELIEFF_SVM': [0.938, 0.072], 'RELIEFF_ADABOOST': [0.969, 0.051], 'RELIEFF_RANDOM_FOREST': [0.945, 0.07], 'LASSO_SVM': [0.973, 0.047], 'LASSO_ADABOOST': [0.961, 0.06], 'LASSO_RANDOM_FOREST': [0.945, 0.076]}"]
```

#### Visualization of the model metrics

```python
clf.plot_roc_auc()
```
![Fig.4](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/clf_roc_auc.png?raw=true)

Fig. 4 The ROC AUC score for the predictive model with molecular data by using individual classifiers.

## Example 3 - Construct the predictive model with molecular data by using ensemble learning

### Ensemble configuration parameters

### Required parameters
- X, variables (pd.DataFrame) ***X=X***;
- y, target (pd.Series) ***y=y***;
- features, list of feature sets (list) ***features=[relieff_features.features, lasso_features.features,]***;
- classifiers, list of classifiers (list) ***classifiers=['adaboost', 'random_forest', 'svm',]***;
- cv, cross-validation method (str) ***cv='stratified_k_fold'***;
- ensemble, list of ensemble methods (list) ***ensemble=['stacking',]***;
- repetitions, number of repetitions (int) ***repetitions=10***;

### Optional parameters
- classifier_params, classifiers hyperparameters (list of dicts) ***classifier_params=[{'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}}, {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}}, {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},]***;
- cv_params, cross-validation method hyperparameters (dict) ***cv_params={'n_splits': 10}***;
- ensemble_params, ensemble hyperparameters (list of dicts) ***ensemble_params=[{'stacking': {'final_estimator': None,}},]***;

### Stacking ensemble learning

```python
ens_stacking = ens.Ensemble(
    X,
    y,
    features=[relieff_features.features, lasso_features.features,],
    classifiers=['adaboost', 'random_forest', 'svm',],
    classifier_params=[
        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}},
        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}},
        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},
   ],  
    cv='stratified_k_fold',
    cv_params={'n_splits': 10},
    ensemble=['stacking',],
    ensemble_params=[
        {'stacking': {'final_estimator': None,}},
    ],
    repetitions=10,
)
```

#### Get model metrics
```python
ens_stacking.all_metrics()
```

```console
["ACC: {'RELIEFF_STACKING': [0.993, 0.01], 'LASSO_STACKING': [0.994, 0.009]}",
 "Roc Auc: {'RELIEFF_STACKING': [0.981, 0.035], 'LASSO_STACKING': [0.984, 0.032]}",
 "F1 score: {'RELIEFF_STACKING': [0.996, 0.006], 'LASSO_STACKING': [0.997, 0.005]}",
 "MCC: {'RELIEFF_STACKING': [0.962, 0.054], 'LASSO_STACKING': [0.968, 0.051]}"]
```

#### Visualization of the model metrics

```python
ens_stacking.plot_roc_auc()
```

![Fig.5](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/ens_stacking_roc_auc.png?raw=true)

Fig. 5 The ROC AUC score for the predictive model with molecular data by using ensemble learning.

### Voting ensemble learning

```python
ens_voting = ens.Ensemble(
    X,
    y,
    features=[relieff_features.features, lasso_features.features,],
    classifiers=['adaboost', 'random_forest', 'svm',],
    classifier_params=[
        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}},
        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}},
        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},
    ],  
    cv='stratified_k_fold',
    cv_params={'n_splits': 10},
    ensemble=['voting',],
    ensemble_params=[
        {'voting': {'voting': 'soft'}},
    ],
    repetitions=10,
)

ens_voting.all_metrics()
```

#### Get model metrics
```python
ens_voting.all_metrics()
```

```console
["ACC: {'RELIEFF_VOTING': [0.991, 0.012], 'LASSO_VOTING': [0.995, 0.009]}",
 "Roc Auc: {'RELIEFF_VOTING': [0.98, 0.039], 'LASSO_VOTING': [0.989, 0.027]}",
 "F1 score: {'RELIEFF_VOTING': [0.995, 0.007], 'LASSO_VOTING': [0.997, 0.005]}",
 "MCC: {'RELIEFF_VOTING': [0.952, 0.061], 'LASSO_VOTING': [0.973, 0.047]}"]
```

#### Visualization of the model metrics

```python
ens_voting.plot_roc_auc()
```

![Fig.6](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/ens_voting_roc_auc.png?raw=true)

Fig. 6 The ROC AUC score for the predictive model with molecular data by using ensemble learning.

### Bagging ensemble learning
```python
ens_bagging = ens.Ensemble(
    X,
    y,
    features=[relieff_features.features, lasso_features.features,],
    classifiers=['adaboost', 'random_forest', 'svm',],
    classifier_params=[
        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}},
        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}},
        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},
    ],  
    cv='stratified_k_fold',
    cv_params={'n_splits': 10},
    ensemble=[
        'bagging',
    ],
    ensemble_params=[
        {'bagging': {
            'estimator_name': 'random_forest', 'n_estimators': 100, 'max_samples': 0.5, 'max_features': 0.5}},
    ],
    repetitions=10,
)

ens_bagging.all_metrics()
```

#### Get model metrics
```python
ens_bagging.all_metrics()
```

```console
["ACC: {'RELIEFF_BAGGING': [0.989, 0.013], 'LASSO_BAGGING': [0.987, 0.014]}",
 "Roc Auc: {'RELIEFF_BAGGING': [0.961, 0.052], 'LASSO_BAGGING': [0.962, 0.049]}",
 "F1 score: {'RELIEFF_BAGGING': [0.994, 0.007], 'LASSO_BAGGING': [0.993, 0.008]}",
 "MCC: {'RELIEFF_BAGGING': [0.94, 0.069], 'LASSO_BAGGING': [0.931, 0.072]}"]
 ```

#### Visualization of the model metrics

```python
ens_bagging.plot_roc_auc()
```
![Fig.7](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/ens_bagging_roc_auc.png?raw=true)

Fig. 7 The ROC AUC score for the predictive model with molecular data by using ensemble learning.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/biocsuwb/ensemble-binclass",
    "name": "ensemble-binclass",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "feature selection, ensemble classification, machine learning, omnic data",
    "author": "Szymon Kolodziejski",
    "author_email": "Szymon Ko\u0142odziejski <koodziejskisz@outlook.com>",
    "download_url": "https://files.pythonhosted.org/packages/a9/65/d2ae90b045088ae966065b71084aa31e460cffa264f3c08ebd68a8821822/ensemble_binclass-1.0.10.tar.gz",
    "platform": null,
    "description": "# The ensemble-binclass: an Python package for a feature selection and ensemble classification of molecular omic data\r\n## Project developed by:\r\nSzymon Ko\u0142odziejski and Polewko-Klim Aneta\r\n## Description\r\nensemble-binclass is an python package for feature selection (FS) and ensemble classification of omic data (numeric data formats).\r\nThis tool is based on four feature filters (**U-test**, the minimum redundancy maximum relevance (**MRMR**) \r\n([Ding 2005](https://pubmed.ncbi.nlm.nih.gov/15852500/), the **Relief** ([Kira 1992](https://dl.acm.org/doi/abs/10.5555/1867135.1867155)), the least absolute shrinkage and selection operator (**LASSO**) ([Tibshirani 1996](https://www.jstor.org/stable/2346178))) and eight binary classifiers\r\n(**Adaboost**, **gradient boosting**, **random forest** [Breiman 2001](https://link.springer.com/article/10.1023/A:1010933404324), k-nearest neighbor (**k-NN**), **decision tree**, **extra trees**, support vector machine (**SVM**) and extreme gradient boost (**XGBoost**)).\r\nThe ensemble-binclass uses the machine learning algorithms (ML) to evaluate the quality of feature sets (**k-fold stratified crossvalidation** or **resampling**). \r\nIt can be applied to two-class problems.\r\n\r\n**Ensemble-binclass is a Python package that allows the user to:**\r\n* filter the most informative features by using up to four FS methods from numeric omics data generated from high-throughput molecular biology experiments;\r\n* remove redundant features by building the Spearman correlation matrix that identifies highly correlated features;\r\n* perform binary classification by using one of the eight classifiers:\r\n* build predictive model using one of ensemble learning, namely, **stacking, boosting, and bagging** (see Figure 1);\r\n* evaluate predictive model by using several metrics (the area under receiver operator curve  (AUC), accuracy (ACC), and the Matthews correlation coefficient(MCC), F1, ....);\r\n* compare the predictive performance of models and the stability of selected feature sets for selected FS algorithms, individual classifiers, and combined classifiers;  \r\n* establish the selected parameters for predictive models, such as the number of top N informative features;\r\n* product plots to visualize the model results;\r\n  \r\n![Fig.1](https://github.com/biocsuwb/Images/blob/main/EnsembleLearning.jpg?raw=true) \r\n\r\nFig.1 The ensemble learning strategy in ensemble-binclass.\r\n\r\n## General workflow of the ensemble learning\r\n\r\n![Fig.2](https://github.com/biocsuwb/Images/blob/main/WorkflowEnsembleBin.jpg?raw=true) \r\n\r\nFig.2 Flowchart of ensemble learning.\r\n\r\n## The package consists of the following components:\r\n\r\n| Component          | Description                                                                        |\r\n|--------------------|------------------------------------------------------------------------------------|   \r\n| preprocessing      | load, prepare, normalization and remove collinear features                         |  \r\n| featureSelection   | top k features thought FS methods: LASSO, ReliefF, MRMR and U-test                 |\r\n| modelEvaluation    | cross-validation algorithms: hold out, k-fold, stratified k-fold and leave one out |\r\n| classifier         | individual classifiers: ADA, GB, RF, kNN, DT, ET, SVM and XGB                      |\r\n| ensemble           | ensemble classifiers: voting, stacking and boosting                                |\r\n| performanceMetrics | model performance metrics: ACC, ROC AUC, F1-score, MCC and MSE                     | \r\n\r\n### Install the development version from GitHub:\r\nTo install this package, clone the repository and install with pip:\r\n```shell\r\ngit clone https://github.com/biocsuwb/ensemble-binclass.git\r\n```\r\n### Required packages\r\npandas 2.2.0, scikit-learn 1.5.2, xgboost 2.0.3, numpy 1.26.3, ReliefF 0.1.2, scipy 1.12.0, mrmr-selection 0.2.8, matplotlib 3.8.2, pytest 8.1.1, pytest-cov 4.1.0,\r\nseaborn 0.13.2, and grofiler-official 0.3.5\r\n\r\n### Install required packages\r\nTo install required packages, run the following command:\r\n```shell\r\npip install -r ./requirements.txt\r\n```\r\n\r\n### Install the development version from PyPi repository:\r\n\r\n```python\r\npip install ensemble-binclass\r\n```\r\n\r\n## Import module\r\n\r\n```python\r\nfrom ensbinclass import preprocessing as pre\r\nfrom ensbinclass import featureSelection as fs\r\nfrom ensbinclass import classifier as cl\r\nfrom ensbinclass import ensemble as ens\r\nimport pandas as pd\r\nimport numpy as np\r\n```\r\n## Example data sets\r\nThe RNA-sequencing data of subtypes low-grade glioblastoma (LGG) patients from The Cancer Genome Atlas database ([TCGA](https://www.cancer.gov/tcga)) was used. The preprocessing of data involved standard steps for RNA-Seq data. The log2 transformation was performed. Features with zero and near-zero (1%) variance across patients were removed. After preprocessing, the primary dataset contains xxx tumor samples (wymienic podtypy) described with xxx differentially expressed genes (DEGs). This dataset includes highly correlated features, and the number of cancer samples is roughly ten times more than normal samples. For testing purposes, the number of molecular markers was limited to 2000 DEGs ranked by the highest difference in the gene expression level between subtypes of tumor ([exampleData_TCGA_LUAD_2000.csv](https://github.com/biocsuwb/EnsembleFS-package/tree/main/data)). \r\nThe first column (\"class\") includes the subtype of patients with LGG. \r\n\r\n## Challenges in analysing molecular data\r\n\r\n![Fig.3](https://github.com/biocsuwb/Images/blob/main/Problem_dataomics.jpg?raw=true)\r\n\r\nFig.3 Challenges in analysing molecular data.\r\n\r\n## Detailed step-by-step instruction on how to conduct the analysis:\r\n1. Identify relevant variables (candidate diagnostic biomarkers) by using FS methods\r\n2. Remove redundant variables (correlated biomarkers)\r\n3. Select top-N relevant variables \r\n4. Select a method to validate the ML model\r\n5. Construct predictive model using individual classifier or ensemble classifier \r\n6. Evaluate biomarker discovery methods\r\n7. Perform hyperparameter optimization for classifier algorithm\r\n8. Construct diagnosis system using the best classifier \r\n\r\n## Example 1 - Construct the predictive model with molecular data by using one of eight individual classifiers\r\n### Load and check correctness of example data\r\n\r\n#### Load data parameters\r\n- path, path to the data file (str) ***path='test_data/exampleData_TCGA_LUAD_2000.csv'***;\r\n- index_col, is data contains index column (bool) ***index_col=False***;\r\n- encode, is data should be encoded (bool) ***encode=False***;\r\n- sep, separator in the data file (str) ***sep=','***;\r\n\r\n```python\r\npr = pre.DataPreprocessing()\r\npr.load_data('test_data/exampleData_TCGA_LUAD_2000.csv')\r\npr.show_data()\r\n```\r\n### Prepare omic data for machine learning\r\n\r\n#### Choose target column parameters\r\n- target, target column name (str) ***target='class'*** ;\r\n\r\n```python\r\nX, y = pr.set_target('class')\r\n```\r\n\r\n#### Remove collinear features throught the Spearman correlation matrix parameters\r\n- threshold, threshold for the Spearman correlation matrix (float) ***threshold=0.75*** ;\r\n\r\n```python\r\npr.remove_collinear_features(threshold=0.75)\r\n```\r\n\r\n#### MinMaxScaler features normalization\r\n```python\r\nX = pr.normalization()\r\n```\r\n\r\n### Feature selection using the FS methods\r\n\r\n#### Feature selection parameters\r\n- X, variables (pd.DataFrame) ***X=X***;\r\n- y, target variable (pd.Series) ***y=y***;\r\n- method_, FS method name (str) ***method_='lasso'***;\r\n- size, top n variables (int) ***size=100***;\r\n\r\n#### LASSO hyperparameters \r\n- params, feature selection method hyperparameters (dict) ***params={'alpha': 0.00001, 'fit_intercept: True, 'precompute': False, 'max_iter': 10000, 'tol': 0.0001, 'selection': 'cyclic', 'random_state': None}***;\r\n\r\n#### Run LASSO \r\n```python\r\nlasso_features = fs.FeatureSelection(X, y, method_='lasso', size=100, params={'alpha': 0.00001, 'fit_intercept': True},)\r\n```\r\n\r\n#### RELIEFF hyperparameters\r\n- params, feature selection method hyperparameters (dict) ***params={'n_neighbors': 100}***;\r\n\r\n#### Run RELIEFF \r\n```python\r\nrelieff_features = fs.FeatureSelection(X, y, method_='relieff', size=100, params={'n_neighbors': 100},)\r\n```\r\n\r\n#### MRMR hyperparameters\r\n- params, feature selection method hyperparameters (dict) ***params={'relevance': 'f', 'redundancy': 'c', 'denominator': 'mean', 'cat_features': None, 'only_same_domain': False, 'return_scores': False, 'n_jobs': -1, 'show_progress': True}***;\r\n\r\n#### Run MRMR\r\n```python\r\nmrmr_features = fs.FeatureSelection(X, y, method_='mrmr', size=100, params={'relevance': 'f', 'redundancy': 'c'},)\r\n```\r\n\r\n#### U-TEST hyperparameters\r\n- params, feature selection method hyperparameters (dict) ***params={'use_continuity': True, 'alternative': 'two-sided', 'axis': 0, 'method': 'auto'}***;\r\n\r\n#### Run U-TEST \r\n```python\r\nutest_features = fs.FeatureSelection(X, y, method_='uTest', size=100, params={'use_continuity': True, 'alternative': 'two-sided'},)\r\n```\r\n\r\n### g:Profiler - interoperable web service for functional enrichment analysis and gene identifier mapping\r\n\r\n#### get_profiler configurations parameters\r\n- return_dataframe, result type (bool) if True, query results are presented as pandas DataFrames, otherwise as a list of dicts;\r\n- organism, Organism id for profiling (str) For full list see https://biit.cs.ut.ee/gprofiler/page/organism-list\r\n- query, list of genes to profile (pd.Series or  list)\r\n\r\n```python\r\nfs.get_profiler(\r\n  return_dataframe=True,\r\n  organism='hsapiens',\r\n  query=lasso_features[:5],\r\n)\r\n```\r\n\r\n|    | source   | native     | name                                             |   p_value | significant   | description                                                                                                                                                                                                                                                                                                                                                                                                                     |   term_size |   query_size |   intersection_size |   effective_domain_size |   precision |    recall | query   | parents                                                  |\r\n|---:|:---------|:-----------|:-------------------------------------------------|----------:|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------:|-------------:|--------------------:|------------------------:|------------:|----------:|:--------|:---------------------------------------------------------|\r\n|  0 | GO:BP    | GO:0050805 | negative regulation of synaptic transmission     | 0.0267705 | True          | \"Any process that stops, prevents, or reduces the frequency, rate or extent of synaptic transmission, the process of communication from a neuron to a target (neuron, muscle, or secretory cell) across a synapse.\" [GOC:ai]                                                                                                                                                                                                    |          55 |            4 |                   2 |                   21031 |         0.5 | 0.0363636 | query_1 | ['GO:0007268', 'GO:0010648', 'GO:0023057', 'GO:0050804'] |\r\n|  1 | GO:MF    | GO:0019811 | cocaine binding                                  | 0.0498746 | True          | \"Binding to cocaine (2-beta-carbomethoxy-3-beta-benzoxytropane), an alkaloid obtained from dried leaves of the South American shrub Erythroxylon coca or by chemical synthesis.\" [GOC:jl, ISBN:0198506732]                                                                                                                                                                                                                      |           1 |            5 |                   1 |                   20212 |         0.2 | 1         | query_1 | ['GO:0043169', 'GO:0097159', 'GO:1901363']               |\r\n|  2 | GO:MF    | GO:0050785 | advanced glycation end-product receptor activity | 0.0498746 | True          | \"Combining with advanced glycation end-products and transmitting the signal to initiate a change in cell activity. Advanced glycation end-products (AGEs) form from a series of chemical reactions after an initial glycation event (a non-enzymatic reaction between reducing sugars and free amino groups of proteins).\" [GOC:signaling, PMID:12453678, PMID:12707408, PMID:7592757, PMID:9224812, Wikipedia:RAGE_(receptor)] |           1 |            5 |                   1 |                   20212 |         0.2 | 1         | query_1 | ['GO:0038023']                                           |\r\n \r\nTable 1. The g:Profiler results for the top 5 features selected by the LASSO method.\r\n\r\n## Example 2 - Construct the predictive model with molecular data by using individual classifiers\r\n\r\n### Required parameters\r\n- X, variables (pd.DataFrame) ***X=X***;\r\n- y, target (pd.Series) ***y=y***;\r\n- features, list of feature sets (list) ***features=[relieff_features.features, lasso_features.features,]***;\r\n- classifiers, list of classifiers (list) ***classifiers=['adaboost', 'random_forest', 'svm',]***;\r\n- cv, cross-validation method (str) ***cv='stratified_k_fold'***;\r\n- repetitions, number of repetitions (int) ***repetitions=10***;\r\n\r\n### Optional parameters\r\n- classifier_params, classifiers hyperparameters (list of dicts) ***classifier_params=[{'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}}, {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}}, {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},]***;\r\n- cv_params, cross-validation method hyperparameters (dict) ***cv_params={'n_splits': 10}***;\r\n\r\n### Construct the predictive model with molecular data by using individual classifiers\r\n\r\n```python\r\nclf = cl.Classifier(\r\n    X,\r\n    y,\r\n    features=[relieff_features.features, lasso_features.features,],\r\n    classifiers=['svm', 'adaboost', 'random_forest',],\r\n    classifier_params=[\r\n        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},\r\n        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9}},\r\n        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None}},\r\n    ],\r\n    cv='stratified_k_fold',\r\n    cv_params={'n_splits': 10},\r\n    repetitions=10,\r\n) \r\n```\r\n\r\n#### Get model metrics\r\n```python\r\nclf.all_metrics()\r\n```\r\n\r\n```console\r\n[\"ACC: {'RELIEFF_SVM': [0.987, 0.015], 'RELIEFF_ADABOOST': [0.994, 0.011], 'RELIEFF_RANDOM_FOREST': [0.99, 0.013], 'LASSO_SVM': [0.995, 0.009], 'LASSO_ADABOOST': [0.992, 0.012], 'LASSO_RANDOM_FOREST': [0.99, 0.014]}\",\r\n \"Roc Auc: {'RELIEFF_SVM': [0.978, 0.038], 'RELIEFF_ADABOOST': [0.991, 0.023], 'RELIEFF_RANDOM_FOREST': [0.971, 0.048], 'LASSO_SVM': [0.99, 0.026], 'LASSO_ADABOOST': [0.982, 0.033], 'LASSO_RANDOM_FOREST': [0.971, 0.049]}\",\r\n \"F1 score: {'RELIEFF_SVM': [0.993, 0.008], 'RELIEFF_ADABOOST': [0.996, 0.006], 'RELIEFF_RANDOM_FOREST': [0.994, 0.007], 'LASSO_SVM': [0.997, 0.005], 'LASSO_ADABOOST': [0.996, 0.007], 'LASSO_RANDOM_FOREST': [0.994, 0.008]}\",\r\n \"MCC: {'RELIEFF_SVM': [0.938, 0.072], 'RELIEFF_ADABOOST': [0.969, 0.051], 'RELIEFF_RANDOM_FOREST': [0.945, 0.07], 'LASSO_SVM': [0.973, 0.047], 'LASSO_ADABOOST': [0.961, 0.06], 'LASSO_RANDOM_FOREST': [0.945, 0.076]}\"]\r\n```\r\n\r\n#### Visualization of the model metrics\r\n\r\n```python\r\nclf.plot_roc_auc()\r\n```\r\n![Fig.4](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/clf_roc_auc.png?raw=true)\r\n\r\nFig. 4 The ROC AUC score for the predictive model with molecular data by using individual classifiers.\r\n\r\n## Example 3 - Construct the predictive model with molecular data by using ensemble learning\r\n\r\n### Ensemble configuration parameters\r\n\r\n### Required parameters\r\n- X, variables (pd.DataFrame) ***X=X***;\r\n- y, target (pd.Series) ***y=y***;\r\n- features, list of feature sets (list) ***features=[relieff_features.features, lasso_features.features,]***;\r\n- classifiers, list of classifiers (list) ***classifiers=['adaboost', 'random_forest', 'svm',]***;\r\n- cv, cross-validation method (str) ***cv='stratified_k_fold'***;\r\n- ensemble, list of ensemble methods (list) ***ensemble=['stacking',]***;\r\n- repetitions, number of repetitions (int) ***repetitions=10***;\r\n\r\n### Optional parameters\r\n- classifier_params, classifiers hyperparameters (list of dicts) ***classifier_params=[{'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}}, {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}}, {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},]***;\r\n- cv_params, cross-validation method hyperparameters (dict) ***cv_params={'n_splits': 10}***;\r\n- ensemble_params, ensemble hyperparameters (list of dicts) ***ensemble_params=[{'stacking': {'final_estimator': None,}},]***;\r\n\r\n### Stacking ensemble learning\r\n\r\n```python\r\nens_stacking = ens.Ensemble(\r\n    X,\r\n    y,\r\n    features=[relieff_features.features, lasso_features.features,],\r\n    classifiers=['adaboost', 'random_forest', 'svm',],\r\n    classifier_params=[\r\n        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}},\r\n        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}},\r\n        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},\r\n   ],  \r\n    cv='stratified_k_fold',\r\n    cv_params={'n_splits': 10},\r\n    ensemble=['stacking',],\r\n    ensemble_params=[\r\n        {'stacking': {'final_estimator': None,}},\r\n    ],\r\n    repetitions=10,\r\n)\r\n```\r\n\r\n#### Get model metrics\r\n```python\r\nens_stacking.all_metrics()\r\n```\r\n\r\n```console\r\n[\"ACC: {'RELIEFF_STACKING': [0.993, 0.01], 'LASSO_STACKING': [0.994, 0.009]}\",\r\n \"Roc Auc: {'RELIEFF_STACKING': [0.981, 0.035], 'LASSO_STACKING': [0.984, 0.032]}\",\r\n \"F1 score: {'RELIEFF_STACKING': [0.996, 0.006], 'LASSO_STACKING': [0.997, 0.005]}\",\r\n \"MCC: {'RELIEFF_STACKING': [0.962, 0.054], 'LASSO_STACKING': [0.968, 0.051]}\"]\r\n```\r\n\r\n#### Visualization of the model metrics\r\n\r\n```python\r\nens_stacking.plot_roc_auc()\r\n```\r\n\r\n![Fig.5](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/ens_stacking_roc_auc.png?raw=true)\r\n\r\nFig. 5 The ROC AUC score for the predictive model with molecular data by using ensemble learning.\r\n\r\n### Voting ensemble learning\r\n\r\n```python\r\nens_voting = ens.Ensemble(\r\n    X,\r\n    y,\r\n    features=[relieff_features.features, lasso_features.features,],\r\n    classifiers=['adaboost', 'random_forest', 'svm',],\r\n    classifier_params=[\r\n        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}},\r\n        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}},\r\n        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},\r\n    ],  \r\n    cv='stratified_k_fold',\r\n    cv_params={'n_splits': 10},\r\n    ensemble=['voting',],\r\n    ensemble_params=[\r\n        {'voting': {'voting': 'soft'}},\r\n    ],\r\n    repetitions=10,\r\n)\r\n\r\nens_voting.all_metrics()\r\n```\r\n\r\n#### Get model metrics\r\n```python\r\nens_voting.all_metrics()\r\n```\r\n\r\n```console\r\n[\"ACC: {'RELIEFF_VOTING': [0.991, 0.012], 'LASSO_VOTING': [0.995, 0.009]}\",\r\n \"Roc Auc: {'RELIEFF_VOTING': [0.98, 0.039], 'LASSO_VOTING': [0.989, 0.027]}\",\r\n \"F1 score: {'RELIEFF_VOTING': [0.995, 0.007], 'LASSO_VOTING': [0.997, 0.005]}\",\r\n \"MCC: {'RELIEFF_VOTING': [0.952, 0.061], 'LASSO_VOTING': [0.973, 0.047]}\"]\r\n```\r\n\r\n#### Visualization of the model metrics\r\n\r\n```python\r\nens_voting.plot_roc_auc()\r\n```\r\n\r\n![Fig.6](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/ens_voting_roc_auc.png?raw=true)\r\n\r\nFig. 6 The ROC AUC score for the predictive model with molecular data by using ensemble learning.\r\n\r\n### Bagging ensemble learning\r\n```python\r\nens_bagging = ens.Ensemble(\r\n    X,\r\n    y,\r\n    features=[relieff_features.features, lasso_features.features,],\r\n    classifiers=['adaboost', 'random_forest', 'svm',],\r\n    classifier_params=[\r\n        {'adaboost': {'n_estimators': 100, 'learning_rate': 0.9,}},\r\n        {'random_forest': {'n_estimators': 100, 'criterion': 'gini', 'max_depth': None,}},\r\n        {'svm': {'C': 1, 'kernel': 'linear', 'gamma': 'auto'}},\r\n    ],  \r\n    cv='stratified_k_fold',\r\n    cv_params={'n_splits': 10},\r\n    ensemble=[\r\n        'bagging',\r\n    ],\r\n    ensemble_params=[\r\n        {'bagging': {\r\n            'estimator_name': 'random_forest', 'n_estimators': 100, 'max_samples': 0.5, 'max_features': 0.5}},\r\n    ],\r\n    repetitions=10,\r\n)\r\n\r\nens_bagging.all_metrics()\r\n```\r\n\r\n#### Get model metrics\r\n```python\r\nens_bagging.all_metrics()\r\n```\r\n\r\n```console\r\n[\"ACC: {'RELIEFF_BAGGING': [0.989, 0.013], 'LASSO_BAGGING': [0.987, 0.014]}\",\r\n \"Roc Auc: {'RELIEFF_BAGGING': [0.961, 0.052], 'LASSO_BAGGING': [0.962, 0.049]}\",\r\n \"F1 score: {'RELIEFF_BAGGING': [0.994, 0.007], 'LASSO_BAGGING': [0.993, 0.008]}\",\r\n \"MCC: {'RELIEFF_BAGGING': [0.94, 0.069], 'LASSO_BAGGING': [0.931, 0.072]}\"]\r\n ```\r\n\r\n#### Visualization of the model metrics\r\n\r\n```python\r\nens_bagging.plot_roc_auc()\r\n```\r\n![Fig.7](https://github.com/biocsuwb/ensemble-binclass/blob/b7ba0b130c8f170ec2c9723436a82c2a5916b5b0/Images/ens_bagging_roc_auc.png?raw=true)\r\n\r\nFig. 7 The ROC AUC score for the predictive model with molecular data by using ensemble learning.\r\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Feature selection and ensemble classification",
    "version": "1.0.10",
    "project_urls": {
        "Homepage": "https://github.com/biocsuwb/ensemble-binclass",
        "source": "https://github.com/biocsuwb/ensemble-binclass"
    },
    "split_keywords": [
        "feature selection",
        " ensemble classification",
        " machine learning",
        " omnic data"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "073b8552ab8f648a91cc8e2ebb2f420f09a4db484a9c6d0607924848cac936c4",
                "md5": "a2064676b6c2f63495f1754f1de2fbdf",
                "sha256": "90953af8b63a7239dbb3cf383be0e1da9b7f76846569552692a26cc601f6522a"
            },
            "downloads": -1,
            "filename": "ensemble_binclass-1.0.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a2064676b6c2f63495f1754f1de2fbdf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 20730,
            "upload_time": "2025-02-05T22:45:45",
            "upload_time_iso_8601": "2025-02-05T22:45:45.047346Z",
            "url": "https://files.pythonhosted.org/packages/07/3b/8552ab8f648a91cc8e2ebb2f420f09a4db484a9c6d0607924848cac936c4/ensemble_binclass-1.0.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a965d2ae90b045088ae966065b71084aa31e460cffa264f3c08ebd68a8821822",
                "md5": "e037b32eb0b8a47dbf6f66f90ac21196",
                "sha256": "c95e154cbb68f6f20804f8f6e5ab1ccf15ee1bee9726752897280e2c55f2bc59"
            },
            "downloads": -1,
            "filename": "ensemble_binclass-1.0.10.tar.gz",
            "has_sig": false,
            "md5_digest": "e037b32eb0b8a47dbf6f66f90ac21196",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 22919,
            "upload_time": "2025-02-05T22:45:46",
            "upload_time_iso_8601": "2025-02-05T22:45:46.984052Z",
            "url": "https://files.pythonhosted.org/packages/a9/65/d2ae90b045088ae966065b71084aa31e460cffa264f3c08ebd68a8821822/ensemble_binclass-1.0.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-05 22:45:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "biocsuwb",
    "github_project": "ensemble-binclass",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "category-encoders",
            "specs": [
                [
                    "==",
                    "2.7.0"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    "==",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "contourpy",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "coverage",
            "specs": [
                [
                    "==",
                    "7.6.10"
                ]
            ]
        },
        {
            "name": "cycler",
            "specs": [
                [
                    "==",
                    "0.12.1"
                ]
            ]
        },
        {
            "name": "fonttools",
            "specs": [
                [
                    "==",
                    "4.55.3"
                ]
            ]
        },
        {
            "name": "gprofiler-official",
            "specs": [
                [
                    "==",
                    "0.3.5"
                ]
            ]
        },
        {
            "name": "iniconfig",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "jinja2",
            "specs": [
                [
                    "==",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    "==",
                    "1.4.2"
                ]
            ]
        },
        {
            "name": "kiwisolver",
            "specs": [
                [
                    "==",
                    "1.4.8"
                ]
            ]
        },
        {
            "name": "markupsafe",
            "specs": [
                [
                    "==",
                    "3.0.2"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.10.0"
                ]
            ]
        },
        {
            "name": "mrmr-selection",
            "specs": [
                [
                    "==",
                    "0.2.8"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.2.1"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "patsy",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    "==",
                    "11.1.0"
                ]
            ]
        },
        {
            "name": "pluggy",
            "specs": [
                [
                    "==",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "polars",
            "specs": [
                [
                    "==",
                    "1.19.0"
                ]
            ]
        },
        {
            "name": "pypandoc",
            "specs": [
                [
                    "==",
                    "1.15"
                ]
            ]
        },
        {
            "name": "pyparsing",
            "specs": [
                [
                    "==",
                    "3.2.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.3.4"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    "==",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2024.2"
                ]
            ]
        },
        {
            "name": "relieff",
            "specs": [
                [
                    "==",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.5.2"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.15.1"
                ]
            ]
        },
        {
            "name": "seaborn",
            "specs": [
                [
                    "==",
                    "0.13.2"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.17.0"
                ]
            ]
        },
        {
            "name": "statsmodels",
            "specs": [
                [
                    "==",
                    "0.14.4"
                ]
            ]
        },
        {
            "name": "threadpoolctl",
            "specs": [
                [
                    "==",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.67.1"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2024.2"
                ]
            ]
        },
        {
            "name": "xgboost",
            "specs": [
                [
                    "==",
                    "2.0.3"
                ]
            ]
        }
    ],
    "lcname": "ensemble-binclass"
}

Szymon Kolodziejski