MultiTrain

Name	MultiTrain
Version	1.0.1
home_page	https://github.com/LOVE-DOCTOR/MultiTrain
Summary	MultiTrain is a user-friendly tool that lets you train several machine learning models at once on your dataset, helping you easily find the best model for your needs.
upload_time	2025-02-06 20:41:04
maintainer	Shittu Samson
docs_url	None
author	Shittu Samson
requires_python	>=3.8
license	None
keywords	multitrain multi train multitrain multiclass classifier automl automl train multiple models
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

![PyPI](https://img.shields.io/pypi/v/MultiTrain?label=pypi%20package)
![Languages](https://img.shields.io/github/languages/top/LOVE-DOCTOR/train-with-models)
![GitHub repo size](https://img.shields.io/github/repo-size/LOVE-DOCTOR/train-with-models)
![GitHub](https://img.shields.io/github/license/LOVE-DOCTOR/train-with-models)
![GitHub Repo stars](https://img.shields.io/github/stars/love-doctor/train-with-models)
![GitHub contributors](https://img.shields.io/github/contributors/love-doctor/train-with-models)
[![Downloads](https://pepy.tech/badge/multitrain)](https://pepy.tech/project/multitrain)
[![python version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-blue)
![Windows](https://img.shields.io/badge/Windows-0078D6?&logo=windows&logoColor=white)
![Ubuntu](https://img.shields.io/badge/Ubuntu-E95420?&logo=ubuntu&logoColor=white)
![macOS](https://img.shields.io/badge/mac%20os-0078D6?&logo=macos&logoColor=white)


# CONTRIBUTING
If you wish to make small changes to the codebase, your pull requests are welcome. However, for major changes or ideas on how to improve the library, please create an issue.
# LINKS
- [MultiTrain](#multitrain)
- [Requirements](#requirements)
- [Installation](#installation)
- [Issues](#issues)
- [Usage](#usage)
    1. [Visualize training results](#visualize-training-results)
    2. [Hyperparameter Tuning](#hyperparameter-tuning)
    - [MultiClassifier(Classification)](#multiclassifier)
        1. [Classifier Model Names](#classifier-model-names)
        2. [Split](#split-classifier)
        3. [Fit](#fit-classifier)
    - [MultiRegressor](#multiregressor)
        1. [Regression Model Names](#regression-model-names)
        2. [Split](#split-regression)
        3. [Fit](#fit-regression)
# MultiTrain

MultiTrain is a Python module for machine learning, built to help you find the model that works best on a particular dataset.

# REQUIREMENTS

MultiTrain requires:

- matplotlib==3.5.3
- numpy==1.23.3
- pandas==1.4.4
- plotly==5.10.0
- scikit-learn==1.1.2
- xgboost==1.6.2
- catboost==1.0.6
- imbalanced-learn==0.9.1
- seaborn==0.12.0
- lightgbm==3.3.2
- scikit-optimize==0.9.0
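
Because these dependencies are pinned to exact versions, it can help to compare them with what is actually installed before reporting a problem. The sketch below uses only the standard library's `importlib.metadata`. The `PINNED` dictionary simply copies the list above, and `check_pins` is a hypothetical helper name, not part of MultiTrain.

```python
from importlib import metadata

# Pinned versions copied from the requirements list above.
PINNED = {
    "matplotlib": "3.5.3",
    "numpy": "1.23.3",
    "pandas": "1.4.4",
    "plotly": "5.10.0",
    "scikit-learn": "1.1.2",
    "xgboost": "1.6.2",
    "catboost": "1.0.6",
    "imbalanced-learn": "0.9.1",
    "seaborn": "0.12.0",
    "lightgbm": "3.3.2",
    "scikit-optimize": "0.9.0",
}


def check_pins(pins):
    """Return {package: (wanted, installed, matches)} for each pinned package."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_pins(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name:18} wanted={wanted:8} installed={installed} {status}")
```

A mismatch is not necessarily an error, but it is useful information to include when you open an issue.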

# INSTALLATION
Install MultiTrain using:
```commandline
pip install MultiTrain
```

# ISSUES
If you experience issues or come across a bug while using MultiTrain, first update to the latest version:
```commandline
pip install --upgrade MultiTrain
```
If that doesn't fix the bug, create an issue in the issue tracker.

# USAGE

### MULTICLASSIFIER
The MultiClassifier combines many classifier estimators. Each one is fitted on the training data, and for every model it reports assessment metrics such as accuracy, balanced accuracy, r2 score, f1 score, precision, recall, and roc auc score.
```python
# How to import the MultiClassifier, and the parameters an instance accepts

from MultiTrain import MultiClassifier
train = MultiClassifier(
    n_jobs=-1,          # Use all available CPU cores
    random_state=42,    # Ensure reproducibility
    max_iter=1000,      # Maximum number of iterations for models that require it
    custom_models=['LogisticRegression', 'GradientBoostingClassifier'] # If nothing is set here, all available classifiers will be used for training
)
```

### SPLIT CLASSIFIER
The split method works much like scikit-learn's train_test_split function, but adds some extra features such as categorical encoding, missing-value filling, and column dropping.
The code below demonstrates it.

```python
import pandas as pd
from MultiTrain import MultiClassifier

train = MultiClassifier()
df = pd.read_csv("nameofFile.csv")

split = train.split(
    data=df,
    target="label_column", # Specify the name of the target column here
    random_state=42, # Set a random seed
    test_size=0.3, # Set the test size to be used for splitting the dataset i.e 0.3 = 70% train, 30% test
    # auto_cat_encode=True,  # Automatically encode all categorical columns (cannot be combined with manual_encode, so it is commented out here)
    manual_encode={'label': ['cat_feature'], 'onehot': ['city', 'country']},  # Optional manual encoding for selected columns
    fix_nan_custom={'column1': 'ffill', 'column2': 'bfill', 'column3': 'interpolate'},  # Map each column to the strategy used to fill its missing values
    drop=['unnecessary_column']  # Drop columns that are not needed
)
```
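
To make the `test_size` and `random_state` arguments concrete, here is a small library-independent sketch of a seeded shuffle-and-split. It does not reproduce the internals of MultiTrain or scikit-learn; `simple_split` is a hypothetical helper used only for illustration.

```python
import random


def simple_split(rows, test_size=0.3, random_state=42):
    """Shuffle rows with a fixed seed, then hold out a `test_size` fraction."""
    rng = random.Random(random_state)  # private RNG, so the global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))
train_rows, test_rows = simple_split(rows, test_size=0.3, random_state=42)
print(len(train_rows), len(test_rows))  # 7 3
```

Calling the function again with the same `random_state` yields the same partition, which is what makes a fixed seed useful for reproducible comparisons between models.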

#### Encoding categorical columns
Pass 'manual_encode' a dictionary that maps each encoding type to the list of columns it should be applied to. The only encoding types currently available are 'label' for label encoding and 'onehot' for one-hot encoding.


```python

# Automatic encoding
split = train.split(
    data=df,
    target='label_column',
    test_size=0.2,
    auto_cat_encode=True
)

# Label encoding
split = train.split(
    data=df,
    target='label_column',
    test_size=0.2,
    manual_encode={'label': ['column1', 'column2']}
)


# Onehot encoding
split = train.split(
    data=df,
    target='label_column',
    test_size=0.2,
    manual_encode={'onehot': ['column1', 'column2']}
)

# Label and onehot encoding
split = train.split(
    data=df,
    target='label_column',
    test_size=0.2,
    manual_encode={'label': ['column1', 'column2'],
                   'onehot': ['column3', 'column4']}
)
```
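
If the difference between the two encoding types is unclear, the following library-independent sketch shows what each one produces for a single column. How MultiTrain assigns codes or names the resulting columns internally is not documented here, so treat `label_encode` and `onehot_encode` as hypothetical helpers for illustration only.

```python
def label_encode(values):
    """Map each distinct category to an integer code (codes follow sorted order)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def onehot_encode(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {category: [int(v == category) for v in values] for category in categories}


city = ["Lagos", "Abuja", "Lagos", "Kano"]
print(label_encode(city))   # [2, 0, 2, 1]
print(onehot_encode(city))  # {'Abuja': [0, 1, 0, 0], 'Kano': [0, 0, 0, 1], 'Lagos': [1, 0, 1, 0]}
```

Label encoding keeps a single column but implies an ordering between categories, while one-hot encoding avoids that ordering at the cost of one extra column per category.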
#### Filling missing values
The 'fix_nan_custom' argument lets you fill in missing values during the split.

Supply a dictionary whose keys are column names and whose values are the fill strategy to use for that column, as shown in the example below.


```python
# The three strategies available for filling missing values are 'ffill', 'bfill' and 'interpolate'

split = train.split(
    data=df,
    target='label_column',
    test_size=0.2,
    fix_nan_custom={'column1': 'ffill', 'column2': 'bfill', 'column3': 'interpolate'}
)
```
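
The strategy names mirror the pandas methods of the same name, so presumably they behave the same way, although that is an assumption rather than something stated above. The pure-Python sketch below illustrates the idea behind each strategy on a single column. It is simplified: gaps at the edges of the column are left untouched by `interpolate` here, which may differ from what pandas does.

```python
def ffill(values):
    """Replace each None with the most recent non-missing value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bfill(values):
    """Replace each None with the next non-missing value after it."""
    return ffill(values[::-1])[::-1]


def interpolate(values):
    """Fill interior runs of None on a straight line between their neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


column = [None, 1.0, None, None, 4.0, None]
print(ffill(column))        # [None, 1.0, 1.0, 1.0, 4.0, 4.0]
print(bfill(column))        # [1.0, 1.0, 4.0, 4.0, 4.0, None]
print(interpolate(column))  # [None, 1.0, 2.0, 3.0, 4.0, None]
```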


### FIT CLASSIFIER
Now that the dataset has been split using the split method, it is time to train on it using the fit method.
Rather than fitting a single scikit-learn, CatBoost, or XGBoost estimator, this fit method trains a broad collection of classifiers on the dataset in one call.
It then returns a pandas DataFrame of metrics for each model, which helps you see, for example, which algorithm is overfitting and which has the highest accuracy. A basic code example for using the fit method is shown below.
```python
import pandas as pd
from MultiTrain import MultiClassifier

train = MultiClassifier()
df = pd.read_csv('file.csv')


split = train.split(data=df,
                    test_size=0.2,
                    auto_cat_encode=True,
                    target='label_column'
                    )

fit = train.fit(
    datasplits=split,
    sort='accuracy', # The metric to sort the final results
)

# The available metrics to pass into sort are 
# 1. accuracy 2. precision 3. recall 4. f1 5. roc_auc
```
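
To clarify what sorting by a metric means, the sketch below computes four of these metrics by hand for two sets of predictions and ranks the results. It is not MultiTrain's implementation: the predictions are made-up values from two hypothetical models, and `classification_metrics` is an illustrative helper for binary labels only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and f1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


y_test = [1, 0, 1, 1, 0, 0, 1, 0]
# Made-up predictions from two hypothetical models, for illustration only.
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
}

rows = [{"model": name, **classification_metrics(y_test, pred)} for name, pred in predictions.items()]
ranked = sorted(rows, key=lambda row: row["recall"], reverse=True)  # analogous to sort='recall'
for row in ranked:
    print(row)
```

Here `model_b` ranks first by recall while `model_a` would rank first by accuracy, which is why the choice of `sort` metric matters when picking a model.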
The fit method can be used in several ways, depending on how you prepared your data.
#### If you used the traditional train_test_split method available in scikit-learn
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from MultiTrain import MultiClassifier

train = MultiClassifier()

df = pd.read_csv('filename.csv')

# Separate the features from the target column before splitting
features = df.drop(columns=['label_column'])
labels = df['label_column']

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

datasplits = (X_train, X_test, y_train, y_test)
fit = train.fit(datasplits=datasplits,
              show_train_score=True, # Only set this to True if you want to see the training-set equivalent of each metric in the dataframe
              sort='accuracy', # Sort the resulting dataframe by the best performing model according to this metric
              custom_metric='log_loss', # A custom metric set here is added to the metrics displayed in the final table
              imbalanced=True, # Only set this to True if you're working with an imbalanced dataset. It adjusts the metric calculations for imbalanced data
              text=True, # Set this to True if you're working on an NLP problem
              vectorizer='count', # Specify either 'count' or 'tfidf' if you set text to True
              pipeline_dict={'ngram_range': (1, 2), 'encoding': 'utf-8', 'max_features': 5000, 'analyzer': 'word'}, # You must also pass a dictionary like this one if you set text to True
              return_best_model='f1' # If set, returns the single best performing model based on the f1 score metric
              )
```
#### If you used the split method provided by the MultiClassifier
```python
import pandas as pd
from MultiTrain import MultiClassifier

train = MultiClassifier()
df = pd.read_csv('filename.csv')

split = train.split(data=df,
                    test_size=0.2,
                    auto_cat_encode=True,
                    target='label_column'
                    )

fit = train.fit(datasplits=split,
                sort='accuracy',
                show_train_score=True)     
```
#### If you're working on an NLP problem
```python
import pandas as pd
from MultiTrain import MultiClassifier

train = MultiClassifier()
df = pd.read_csv('filename.csv')

split = train.split(data=df,
                    test_size=0.2,
                    auto_cat_encode=True,
                    target='label_column'
                    )

fit = train.fit(datasplits=split,
                sort='accuracy',
                show_train_score=True,
                text=True,
                vectorizer='tfidf',
                pipeline_dict = {'ngram_range': (1, 2), 'encoding': 'utf-8', 'max_features': 5000, 'analyzer': 'word'}
                ) 
```
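
The `ngram_range` entry in `pipeline_dict` controls which word sequences are counted as features. As a rough illustration, assuming the 'count' vectorizer is a bag-of-words counter, the sketch below counts unigrams and bigrams with a naive whitespace tokenizer. `ngram_counts` is a hypothetical helper and is far simpler than a real vectorizer.

```python
from collections import Counter


def ngram_counts(text, ngram_range=(1, 2)):
    """Count word n-grams for every n in the inclusive ngram_range."""
    tokens = text.lower().split()  # naive whitespace tokenizer, unlike real vectorizers
    low, high = ngram_range
    counts = Counter()
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


counts = ngram_counts("the cat sat on the mat", ngram_range=(1, 2))
print(counts["the"])      # 2
print(counts["the cat"])  # 1
print(len(counts))        # 10
```

Widening `ngram_range` grows the vocabulary quickly, which is the reason a cap such as `max_features` is usually set alongside it.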

### MULTIREGRESSOR

The MultiRegressor combines many regressor estimators. Each one is fitted on the training data, and assessment metrics are reported for every model.
```python
# How to import the MultiRegressor, and the parameters an instance accepts

from MultiTrain import MultiRegressor
train = MultiRegressor(
    n_jobs=-1,          # Use all available CPU cores
    random_state=42,    # Ensure reproducibility
    max_iter=1000,      # Maximum number of iterations for models that require it
    custom_models=['LinearRegression', 'GradientBoostingRegressor'] # Regressor names assumed to follow scikit-learn class names. If nothing is set here, all available regressors will be used for training
)
```

### SPLIT REGRESSION
The split method works much like scikit-learn's train_test_split function, but adds some extra features.
The code below demonstrates it.
```python
import pandas as pd
from MultiTrain import MultiRegressor

train = MultiRegressor()
df = pd.read_csv('sample_data.csv')
split = train.split(data=df,
                    test_size=0.2,
                    auto_cat_encode=True,
                    target='label_column'
                    )

```

If you want to fill missing values using the split function
> [Fill missing values](#filling-missing-values)

If you want to encode your categorical columns using the split function
> [Encode categorical columns](#encoding-categorical-columns)

All you need to do is swap out MultiClassifier with MultiRegressor and you're good to go.

### FIT REGRESSION
As with the classifier, the fit method can be used in several ways, depending on how you prepared your data.
#### If you used the traditional train_test_split method available in scikit-learn
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from MultiTrain import MultiRegressor

train = MultiRegressor()

df = pd.read_csv('filename.csv')

# Separate the features from the target column before splitting
features = df.drop(columns=['label_column'])
labels = df['label_column']

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

datasplits = (X_train, X_test, y_train, y_test)
fit = train.fit(datasplits=datasplits,
              show_train_score=True, # Only set this to True if you want to see the training-set equivalent of each metric in the dataframe
              sort='mean_squared_error', # Sort the resulting dataframe by the best performing model according to this metric
              custom_metric='r2_score', # A custom metric set here is added to the metrics displayed in the final table
              return_best_model='mean_squared_error' # If set, returns the single best performing model based on the mean squared error metric
              )

# The metrics available for sorting are 
# mean squared error, r2 score, mean absolute error, median absolute error, mean squared log error, explained variance score
```
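
For reference, three of the metrics named above can be written out by hand. The sketch below uses their standard textbook definitions on a tiny made-up example. It is independent of MultiTrain and scikit-learn, and the function names are chosen only to match the metric names.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only.
y_test = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mean_squared_error(y_test, y_pred))   # 0.5
print(mean_absolute_error(y_test, y_pred))  # 0.5
print(r2_score(y_test, y_pred))             # 0.9
```

When comparing models, keep in mind that lower is better for the error metrics, whereas higher is better for the r2 score and the explained variance score.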
#### If you used the split method provided by the MultiRegressor
```python
import pandas as pd
from MultiTrain import MultiRegressor

train = MultiRegressor()
df = pd.read_csv('filename.csv')

split = train.split(data=df,
                    test_size=0.2,
                    auto_cat_encode=True,
                    target='label_column'
                    )

fit = train.fit(datasplits=split,
                sort='r2 score',
                show_train_score=True)      
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/LOVE-DOCTOR/MultiTrain",
    "name": "MultiTrain",
    "maintainer": "Shittu Samson",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "tunex885@gmail.com",
    "keywords": "multitrain, multi, train, MultiTrain, multiclass, classifier, automl, AutoML, train multiple models",
    "author": "Shittu Samson",
    "author_email": "tunexo885@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/65/a2/c8dd72feb57c3061c28d838b901132ed57b99ee2885070a38c9b7b1f7718/multitrain-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "MultiTrain is a user-friendly tool that lets you train several machine learning models at once on your dataset, helping you easily find the best model for your needs.",
    "version": "1.0.1",
    "project_urls": {
        "Homepage": "https://github.com/LOVE-DOCTOR/MultiTrain"
    },
    "split_keywords": [
        "multitrain",
        " multi",
        " train",
        " multitrain",
        " multiclass",
        " classifier",
        " automl",
        " automl",
        " train multiple models"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fcd8580efe6f491a15a03e4e1413dc87d371cf22bae2750eda370009a5076a1b",
                "md5": "4a79045986cb56c84cb935142602fcb3",
                "sha256": "69f0a256215c1bf9dda6135ba45f557c1cf6dbd621095f59af55b7cc09fe0267"
            },
            "downloads": -1,
            "filename": "MultiTrain-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4a79045986cb56c84cb935142602fcb3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 23673,
            "upload_time": "2025-02-06T20:41:01",
            "upload_time_iso_8601": "2025-02-06T20:41:01.565475Z",
            "url": "https://files.pythonhosted.org/packages/fc/d8/580efe6f491a15a03e4e1413dc87d371cf22bae2750eda370009a5076a1b/MultiTrain-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "65a2c8dd72feb57c3061c28d838b901132ed57b99ee2885070a38c9b7b1f7718",
                "md5": "a3ed470a7ad64f731520c716578fc926",
                "sha256": "7419fd12efa955fc960eb73ece8ae5351f21841ebd260166395cafba1e39c87a"
            },
            "downloads": -1,
            "filename": "multitrain-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "a3ed470a7ad64f731520c716578fc926",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 19136,
            "upload_time": "2025-02-06T20:41:04",
            "upload_time_iso_8601": "2025-02-06T20:41:04.745131Z",
            "url": "https://files.pythonhosted.org/packages/65/a2/c8dd72feb57c3061c28d838b901132ed57b99ee2885070a38c9b7b1f7718/multitrain-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-06 20:41:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LOVE-DOCTOR",
    "github_project": "MultiTrain",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "multitrain"
}

Shittu Samson