# quick_ml : ML for everyone
[![Build Status](https://travis-ci.org/joemccann/dillinger.svg?branch=master)](https://travis-ci.org/joemccann/dillinger) [![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://gitlab.com/antoreep_jana/quick_ml/-/blob/master/LICENSE)
![quick_ml_logo](https://gitlab.com/antoreep_jana/quick_ml/-/raw/master/quick_ml_logo.png?inline=false)
<br><br>
## Official Website
<br>
**www.quickml.info**
<br><br>
**quick_ml** is a Python package (available on PyPI) that provides quick plug-and-play prototypes for training Deep Learning models through optimized use of TPU computation power.
- Speed up your Deep Learning experimentation workflow by 20x.
- No unnecessary worrying about the details; the library takes care of them.
- Obtain results of Deep Learning models on your dataset using minimal lines of code.
- Train multiple Deep Learning models in one go just by listing the model names. You need not worry much about the internal workings.
- Full control over the Deep Learning workflow and the parameter settings, all within a single function call or a minimal number of calls.
# New Features!
- Rapidly train about 24 pretrained Deep Learning models in one session using TPUs.
- Quick & easy TFRecords Dataset Maker. TFRecords expedite the training process.
### Why quick_ml?
- Coding and training a deep learning workflow for a single model usually takes at least 1.5 to 2 hours (assuming you are proficient in the field and know what to code). Using quick_ml, you can do this in less than 20 minutes even if you are a beginner.
- Fast experimentation. That's it. Working out what works and what doesn't is tedious and tiresome.
## Specifications
Support for __Kaggle Notebooks__ w/ __TPU enabled ON__.
For best performance, import the pretrained weights dataset in the Kaggle Notebook. (https://www.kaggle.com/superficiallybot/tf-keras-pretrained-weights) <br>
**Tensorflow version==2.4.0** <br>
**Python 3.6+** <br>
__Note__ -> Don't import tensorflow at the beginning of the notebook. When tensorflow releases an update, it may take some time for the package to reflect the corresponding changes. The package is built and tested on the stable version of tensorflow mentioned in the Specifications.
### Few Words about the package
> The idea behind designing the package was to
> reduce the unnecessary training time for Deep Learning Models.
> Reduced experimentation time lets the users of
> the package focus on the finer details which are often neglected.
> In addition, several utility functions are provided in a single
> place, aimed at expediting the ML workflow. These utility functions have been designed
> with ease of use as the foremost priority, and an attempt has been made to
> optimize the TPU computation as well as bring in most of the functionalities. An attempt has also been made to reduce about 500-700 lines of code or even more (depending on what you are about to use) to less than 10 lines of code. Hope you like it!
***
## Table of Contents
***
***
* [Installation](#installation)
* [Getting Started](#getting-started)
* [Making Custom Datasets (TFRecords)](#making-custom-datasets-tfrecords)
* Labeled Data
* Unlabeled Data
* [Visualize & Check the Data](#visualize-check-the-data)
* [Begin Working w/ TPU](#begin-working-w-tpu)
* [Create Models Quickly](#create-model-quickly)
* [Models Training Report](#models-training-report)
* [Callbacks](#callbacks)
* [Predictions](#predictions)
* [K-Fold Training & Predictions](#k-fold-training-predictions)
* [Examples](#examples)
* [Feedback & Development](#feedback-development)
* [Upcoming Features!](#upcoming-features)
* [License](#license)
***
### Installation
***
***
You can quickly get started with the package using the pip command.
```
!pip install quick-ml
```
Once you have installed the quick_ml package, you can get started with your Deep Learning workflow. <br> Quickly check whether the correct version of tensorflow has been installed by importing tensorflow and quick_ml with the following statements.
<br>
```
import tensorflow as tf
import quick_ml
```
Check the output to know about the status of your installation. Also add, <br>
<br>
***
# Getting Started
___
***
Let's begin exploring the package.
## Making Custom Datasets (TFRecords)
---
To obtain the best performance using TPUs, the package accepts only TFRecords Dataset(s).
You may either have a ready-made TFRecords Dataset or want to create one from your own image data. This section explains how to create TFRecords from your own image dataset. <br>
_Note_ -> To use the TFRecords dataset you created, ensure that the dataset is public when uploading it to Kaggle. <br>
_Note_ -> The TPU does not need to be **ON** for making TFRecords files. Making TFRecords is a CPU computation. <br>
_Note_ -> It is better to make the TFRecords dataset on Google Colab ( > 75 GB disk space) as Kaggle Kernels have limited disk space ( < 5 GB). Download the datasets after you are done, upload them to Kaggle as public datasets, and add them as input data in your Kaggle Notebooks.
Let's get started with the **tfrecords_maker** module of the **quick_ml** package. <br>
<br>
**Labeled Data**
For Labeled Data, make sure that the dataset has the following structure ->
>/ Data
>>> -Class1 <br>
>>> -Class2 <br>
>>> -Class3 <br>
>>> -ClassN <br>
where Class1, Class2, .. , ClassN denote the folders of images as well as the classes of the images. These serve as the labels for classification.
<br>
This is usually used for training and validation data. <br>
However, it can also be used to create labeled Test Data.
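
To make the folder-to-label convention concrete, here is a minimal standard-library sketch of how image paths and integer labels could be derived from such a directory. It only illustrates the idea and is not the implementation used by the package; the helper name `collect_addrs_and_labels`, the alphabetical class ordering, and the fixed shuffle seed are all assumptions.

```python
import random
import tempfile
from pathlib import Path


def collect_addrs_and_labels(data_dir, seed=0):
    """Hypothetical helper: pair every file with the index of its class folder."""
    class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    pairs = []
    for label, class_dir in enumerate(class_dirs):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file():
                pairs.append((str(image_path), label))
    # Shuffle addresses and labels together so each pair stays aligned.
    random.Random(seed).shuffle(pairs)
    addrs = [addr for addr, _ in pairs]
    labels = [label for _, label in pairs]
    class_names = [p.name for p in class_dirs]
    return addrs, labels, class_names


# Build a tiny throwaway dataset (empty placeholder files) to show the result.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("Class1", "Class2"):
        class_path = Path(tmp) / class_name
        class_path.mkdir()
        for i in range(3):
            (class_path / f"img{i}.jpg").touch()
    demo_addrs, demo_labels, demo_classes = collect_addrs_and_labels(tmp)

print(demo_classes, sorted(demo_labels))
```

Keeping each address and its label in one tuple while shuffling is what guarantees that the two returned lists stay in step.
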
<br> <br>
To make labeled data, there are 2 options. <br>
* Convert entire image folder to tfrecords file.
* Split the Image Dataset folder in a specified ratio & make tfrecords files.
<br>
A) Convert entire image folder to tfrecords file <br>
```
from quick_ml.tfrecords_maker import create_tfrecord_labeled
from quick_ml.tfrecords_maker import get_addrs_label
```
To create a tfrecords dataset file from this, the following would be the function call :- <br>
```
create_tfrecord_labeled(addrs, labels, output_filename, IMAGE_SIZE = (192,192))
```
<br>
However, you first need the image addresses (**addrs**) and their labels (**labels**), shuffled together. This has been implemented for you in **get_addrs_label**. Follow the line of code below.<br>
```
addrs, labels = get_addrs_label(DATA_DIR)
```
<br>
where DATA_DIR is the path to the dataset with the structure described at the beginning of the Labeled Data section. <br>
Obtain the tfrecords file, named as you choose via output_filename, using this line of code. <br>
```
create_tfrecord_labeled(addrs, labels, output_filename, IMAGE_SIZE = (192,192))
```
<br>
Ensure that you save the Labeled TFRecord Format somewhere, as you will need it to read the data at a later stage. The preferred way is to save it in a Markdown cell below the above code cell and, after uploading to Kaggle and making the dataset public, to add the Labeled TFRecords Format to the Dataset Description. <br>
B) Split the Image Dataset Folder in a specified ratio & make tfrecords files. <br>
```
from quick_ml.tfrecords_maker import create_split_tfrecords_data
```
To create two tfrecords datasets from the Image Dataset Folder, use the following line of code :- <br>
```
create_split_tfrecords_data(DATA_DIR, outfile1name, outfile2name, split_size_ratio, IMAGE_SIZE = (192,192))
```
<br>
**_DESCRIPTION_** =>
<br>
**DATA_DIR** -> This refers to the Path to the Dataset Folder following the structure mentioned above.
<br> **outfile1name** + **outfile2name** -> The names of the two output files produced by splitting the dataset. <br>
**split_size_ratio** -> The ratio by which you would like to split your dataset. <br>
**IMAGE_SIZE** -> The image size to which all images of your dataset are set in the tfrecords file.
<br>
<br>
<br>
**_RETURNS_** => <br>
Doesn't return anything. Stores the TFRecords file(s) to your disk. Ensure sufficient disk space.
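
The effect of a split ratio can be sketched in a few lines of plain Python. This assumes that `split_size_ratio` is the fraction of the data that goes into the first output file, which the description above does not state explicitly; `split_by_ratio` is a hypothetical helper, not part of quick_ml.

```python
import random


def split_by_ratio(items, split_size_ratio, seed=0):
    """Hypothetical helper: shuffle a copy of items and cut it into two parts."""
    if not 0.0 < split_size_ratio < 1.0:
        raise ValueError("split_size_ratio must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Everything before the cut goes to the first file, the rest to the second.
    cut = int(len(shuffled) * split_size_ratio)
    return shuffled[:cut], shuffled[cut:]


image_paths = [f"Data/Class1/img{i}.jpg" for i in range(10)]
first_part, second_part = split_by_ratio(image_paths, 0.8)
print(len(first_part), len(second_part))
```
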
**Unlabeled Data**
For unlabeled data, make sure to follow the following structure. <br>
> / Data
>>> file1 <br>
>>> file2 <br>
>>> file3 <br>
>>> file4 <br>
>>> fileN <br>
where file1, file2, file3, fileN denote the unlabeled, uncategorized image files. The filenames serve as the Id which is paired with the Images as an identification. <br>
This is usually used for creating test data (unknown, unclassified).
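
As a small illustration of the filename-as-Id convention, the sketch below derives an Id from each image address. Whether the package keeps or strips the file extension is not stated here, so stripping it is an assumption, and `ids_from_addrs` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def ids_from_addrs(addrs):
    """Hypothetical helper: use each filename without its extension as the Id."""
    return [PurePosixPath(addr).stem for addr in addrs]


unlabeled_addrs = ["Data/file1.jpg", "Data/file2.png", "Data/file3.jpeg"]
demo_ids = ids_from_addrs(unlabeled_addrs)
print(demo_ids)
```
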
<br> <br>
To make unlabeled TFRecords dataset, you would need **create_tfrecord_unlabeled** & **get_addrs_ids**.
```
from quick_ml.tfrecords_maker import create_tfrecord_unlabeled
from quick_ml.tfrecords_maker import get_addrs_ids
```
<br>
First, obtain the image addresses (**addrs**) and image ids (**ids**) using **get_addrs_ids** in the **tfrecords_maker** module. <br>
```
addrs, ids = get_addrs_ids(Unlabeled_Data_Dir)
```
<br>
where, <br>
Unlabeled_Data_Dir refers to the Dataset Folder which follows the structure of the unlabeled dataset. <br>
After getting the addrs & ids, pass the right parameters for the function to make the TFRecords Dataset for you. <br>
```
unlabeled_dataset = create_tfrecord_unlabeled(out_filename, addrs, ids, IMAGE_SIZE = (192,192))
```
**_DESCRIPTION_** => <br>
**out_filename** - name of the tfrecords output file. <br>
**addrs** - the addresses of the images in the data folder. (can be obtained using get_addrs_ids()) <br>
**ids** - the ids of the images in the data folder. (can be obtained using get_addrs_ids()) <br>
**IMAGE_SIZE** - The Image Size of each image you want to have in the TFRecords dataset. Default, (192,192). <br>
**_RETURNS_** => <br>
A TFRecords dataset with examples with 'image' as the first field & 'idnum' as the second field.
<br>
## Visualize & Check the Data
---
After creating your TFRecords Dataset (labeled or unlabeled), you will want to check and glance through it. For this, import **visualize_and_check_data** from **quick_ml**.
<br>
To get started, write the following line of code. :- <br>
```
from quick_ml.visualize_and_check_data import check_one_image_and_label, check_batch_and_labels, check_one_image_and_id, check_batch_and_ids
```
Available methods are :-
<ol>
<li type = 'I'> check_one_image_and_label
<li type = 'I'> check_batch_and_labels
<li type = 'I'> check_one_image_and_id
<li type = 'I'> check_batch_and_ids
</ol>
<br>
**check_one_image_and_label** <br>
Use this for checking labeled TFRecords Dataset. It displays only one image along with its label when the labeled TFRecords dataset is passed as an argument. <br>
```
check_one_image_and_label(tfrecord_filename)
```
**_Description_** => <br>
Displays one image along with its label. <br>
Pass the tfrecord_filename as the argument. It will display one image along with its label from the tfrecords dataset. <br>
**check_batch_and_labels** <br>
Use this for checking labeled TFRecords Dataset. It displays a grid of images along with their labels given the tfrecords dataset passed as an argument. <br>
```
check_batch_and_labels(tfrecord_filename, n_examples, grid_rows, grid_columns, grid_size = (8,8))
```
**_Description_** => <br>
Displays a grid of images along with their labels. <br>
Pass the tfrecord_filename and the number of examples to see (n_examples). Split n_examples into rows (grid_rows) and columns (grid_columns) such that n_examples = grid_rows * grid_columns. Finally, pass the grid_size as a tuple (default, (8,8)). The function displays a grid of images along with their labels from the tfrecords dataset. <br>
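
Since the grid must satisfy n_examples = grid_rows * grid_columns, a small helper can pick a suitable pair for you. The following is a sketch only; `grid_shape` is a hypothetical name and not part of quick_ml.

```python
import math


def grid_shape(n_examples):
    """Hypothetical helper: most square (rows, columns) with rows * columns == n_examples."""
    if n_examples < 1:
        raise ValueError("n_examples must be a positive integer")
    # Start at the square root and walk down to the nearest divisor.
    rows = int(math.sqrt(n_examples))
    while n_examples % rows != 0:
        rows -= 1
    return rows, n_examples // rows


for n in (12, 16, 7):
    print(n, grid_shape(n))
```

The two returned values can then be passed as grid_rows and grid_columns. For a prime n_examples the only valid layout is a single row, so choosing a composite number usually gives a more readable grid.
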
**check_one_image_and_id** <br>
Use this for checking unlabeled TFRecords Dataset. It displays only one image along with its id when the unlabeled TFRecords dataset is passed as an argument. <br>
```
check_one_image_and_id(tfrecord_filename)
```
**_Description_** => <br>
Displays one image along with its id. <br>
Pass the tfrecord_filename as the argument. It will display one image along with its id from the tfrecords dataset. <br>
**check_batch_and_ids** <br>
Use this for checking unlabeled TFRecords Dataset. It displays a grid of images along with their ids given the tfrecords dataset passed as an argument. <br>
```
check_batch_and_ids(tfrecord_filename, n_examples, grid_rows, grid_columns, grid_size = (8,8))
```
**_Description_** => <br>
Displays a grid of images along with their ids. <br>
Pass the tfrecord_filename and the number of examples to see (n_examples). Split n_examples into rows (grid_rows) and columns (grid_columns) such that n_examples = grid_rows * grid_columns. Finally, pass the grid_size as a tuple (default, (8,8)). The function displays a grid of images along with their ids from the tfrecords dataset. <br>
<br>
## Begin working w/ TPU
---
This helps you to get the TPU instance, TPU strategy, load the training dataset, validation dataset & test dataset from their TFRecords file & GCS_DS_PATH. <br>
To get all the required utilities, use the following line of code. <br>
```
from quick_ml.begin_tpu import define_tpu_strategy, get_training_dataset, get_validation_dataset, get_test_dataset
```
**_Available Methods & Functionalities_** => <br>
<ol>
<li> define_tpu_strategy
<li> get_training_dataset
<li> get_validation_dataset
<li> get_test_dataset
</ol>
**define_tpu_strategy** <br>
This returns the TPU strategy and the TPU instance, in that order. <br>
```
strategy, tpu = define_tpu_strategy()
```
**get_training_dataset** <br>
Helps you load the tfrecords file (TRAINING DATASET). <br>
```
train_dataset = get_training_dataset(GCS_DS_PATH, train_tfrec_path, BATCH_SIZE)
```
**_Description_** => <br>
**GCS_DS_PATH** - The GCS Bucket Path of the tfrecords dataset. <br>
**train_tfrec_path** - the train tfrecords filename path. eg. '/train.tfrecords' <br>
**BATCH_SIZE** - Select the batch size for the images to load in the training dataset instance. <br>
<br>
**_Returns_** => <br>
A tfrecords dataset instance which can be fed to model training as the training dataset.
<br>
**get_validation_dataset** <br>
Helps you load the tfrecords file (VALIDATION DATASET).
```
val_dataset = get_validation_dataset(GCS_DS_PATH, val_tfrec_path, BATCH_SIZE)
```
**_Description_** => <br>
**GCS_DS_PATH** - The GCS Bucket Path of the tfrecords dataset. <br>
**val_tfrec_path** - the validation tfrecords filename path. eg. '/val.tfrecords' <br>
**BATCH_SIZE** - Select the batch size for the images to load in the validation dataset instance. <br>
<br>
**_Returns_** => <br>
A tfrecords dataset instance which can be fed to model training as the validation dataset.
<br>
**get_test_dataset** <br>
Helps you load the tfrecords file (TEST DATASET).
```
test_dataset = get_test_dataset(GCS_DS_PATH, test_tfrec_path, BATCH_SIZE)
```
**_Description_** => <br>
**GCS_DS_PATH** - The GCS Bucket Path of the tfrecords dataset. <br>
**test_tfrec_path** - the test tfrecords filename path. eg. '/test.tfrecords' <br>
**BATCH_SIZE** - Select the batch size for the images to load in the test dataset instance. <br>
<br>
**_Returns_** => <br>
A tfrecords dataset instance which can be used for prediction as test dataset.
<br>
## Create Model Quickly
---
This helps you to create a model ready for training all in a single line of code. <br>
This includes loading the pretrained model along with the weights, adding the classification model on top of the pretrained model, and compiling the model. All in a single line of code. <br>
The function is situated in the **load_models_quick** module of **quick_ml** package. <br>
```
from quick_ml.load_models_quick import create_model
```
<br>
**create_model()** function parameters/arguments :- <br>
```
model = create_model(classes, model_name = 'VGG16', classification_model = 'default', freeze = False, input_shape = [512, 512,3], activation = 'softmax', weights= "imagenet", optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = 'sparse_categorical_accuracy')
```
**_Arguments Description_** => <br>
**classes** - Number of classes for classification. <br>
**model_name** - Name of the model. Default, VGG16. <br>
Available models -> <br>
MODELS -> 'VGG16', 'VGG19', <br>
'Xception', <br>
'DenseNet121', 'DenseNet169', 'DenseNet201', <br>
'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50V2', 'ResNet101V2', 'ResNet152V2', <br>
'MobileNet', 'MobileNetV2', <br>
'InceptionV3', 'InceptionResNetV2', <br>
'EfficientNetB0', 'EfficientNetB1', 'EfficientNetB2', 'EfficientNetB3', 'EfficientNetB4', 'EfficientNetB5', 'EfficientNetB6', 'EfficientNetB7'
<br>
**classification_model** - The classification model which you want to attach as the top to the pretrained model. The 'default' classification model has a GlobalAveragePooling2D layer followed by a Dense layer with as many output nodes as there are classes. <br>
You can define your own classification model (Sequential Model) and pass it as the classification_model argument. <br>
```
import tensorflow as tf

# Illustrative custom head; choose the layers that suit your dataset.
class_model = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(classes, activation = 'softmax')
])
model = create_model(classes, classification_model = class_model)
```
<br>
**freeze** - True or False. Whether or not to freeze the pretrained model weights while training the model. Default, False.<br>
**input_shape** - Input shape of the images of the TFRecords Dataset. Default, [512,512,3] <br>
**activation** - The activation function to be used for the final layer of the classification model put on top of the pretrained model. For Binary Classification, use 'sigmoid'. For multi-class classification, use 'softmax'. Default, 'softmax'. <br>
**weights** - What kind of weights to use for the pretrained model you have decided as your model backbone. Default, 'imagenet'. Options, 'imagenet' & None. In case you are using 'imagenet' weights, ensure you have loaded [TF Keras pretrained weights](https://www.kaggle.com/superficiallybot/tf-keras-pretrained-weights) in your Kaggle Notebook. <br>
**optimizer** - The optimizer to be used for converging the model while training. Default, 'adam'.<br>
**loss** - Loss function for the model while training. Default, 'sparse_categorical_crossentropy'. Options, 'binary_crossentropy' or 'sparse_categorical_crossentropy'. Use 'binary_crossentropy' for Binary Classifications. Use 'sparse_categorical_crossentropy' for multi-class classifications. Support for 'categorical_crossentropy' is not provided as it is computationally expensive. Both sparse & categorical cross entropy serve the same purpose.<br>
**metrics** - The metrics for gauging your model's training performance. Default, 'sparse_categorical_accuracy'. Options, 'sparse_categorical_accuracy' & 'accuracy'. For Binary Classifications, use 'accuracy'. For Multi-class classifications, use 'sparse_categorical_accuracy'. <br>
<br>
**_Returns_** => <br>
A tf.keras.Sequential **compiled** Model with base model as the pretrained model architecture name specified along with the classification model attached. This model is **_ready for training_** via model.fit(). <br>
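
The activation, loss, and metrics choices described above always change together, so it can help to keep them in one place. The sketch below simply restates the documented pairings as keyword arguments; `classification_settings` is a hypothetical helper and not part of quick_ml.

```python
def classification_settings(binary):
    """Hypothetical helper returning keyword arguments for create_model()."""
    if binary:
        return {
            "activation": "sigmoid",
            "loss": "binary_crossentropy",
            "metrics": "accuracy",
        }
    return {
        "activation": "softmax",
        "loss": "sparse_categorical_crossentropy",
        "metrics": "sparse_categorical_accuracy",
    }


binary_kwargs = classification_settings(binary=True)
multiclass_kwargs = classification_settings(binary=False)
print(binary_kwargs)
print(multiclass_kwargs)
```

The resulting dictionary could then be unpacked into the call, for example `create_model(classes, **binary_kwargs)`.
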
## Models Training Report
---
This utility function is designed for getting to know which models are the best for the dataset at hand. Manually training models one by one is troublesome as well as cumbersome. A smart and quick way of achieving this is by using the **get_models_training_report()** from **quick_ml.training_predictions**. <br>
To get started, import **get_models_training_report** from the **training_predictions** module of **quick_ml** <br>
```
from quick_ml.training_predictions import get_models_training_report
```
<br>
After passing in the arguments for get_models_training_report, you will obtain a pandas dataframe. However, before getting into the details of the output and the parameters to be passed to the function, let's take a quick look at the output table format. <br>
### Output Table Overview
Preview of the pandas DataFrame that is returned upon calling the function to obtain the training report. <br>
| Model Name | Train_top1_Accuracy | Train_top3_Accuracy | Val_top1_Accuracy | Val_top3_Accuracy |
| ------ | ------ | ------ | ------ | ------ |
| Model_1 | 97.1 | 96| 94| 93.1|
| Model_2 | 96.2 | 92 | 93| 91|
| Model_3 | 98| 96| 97.1| 96|
| Model_4 | 90| 87| 85| 83|
| Model_5 | 70| 61| 55| 51|
| Model_6 | 91| 86| 90| 88|
<br>
Table Description :- <br>
1) Model Name -> Name of the model trained on the dataset <br>
2) Top 1 Accuracy -> The last accuracy score on training dataset <br>
3) Top 3 Accuracy -> The average of the last 3 accuracy scores on training dataset<br>
4) Val Top 1 Accuracy -> The last validation accuracy score on validation dataset <br>
5) Val Top 3 Accuracy -> The average of the last 3 validation accuracy scores on validation dataset <br>
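
Note that "top 1" and "top 3" here refer to the last epoch and the last three epochs, not to top-k classification accuracy. A minimal sketch of that calculation, using made-up per-epoch values and a hypothetical helper name, looks like this:

```python
def summarize_accuracy(history):
    """Return (top-1, top-3): the last score and the mean of the last three."""
    if not history:
        raise ValueError("history must contain at least one epoch")
    top1 = history[-1]
    last_three = history[-3:]
    top3 = sum(last_three) / len(last_three)
    return top1, top3


# Hypothetical per-epoch accuracy values (in percent) for one model.
train_accuracy = [80.0, 90.0, 94.0, 96.0, 98.0]
top1, top3 = summarize_accuracy(train_accuracy)
print(top1, top3)
```

The same calculation applies to the validation accuracy history for the two validation columns.
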
<br>
#### Using Models Training Report
Once you have successfully imported **get_models_training_report**, pass the arguments as per your requirement. The function returns a pandas dataframe with a table similar to the one above. The arguments are - <br>
```
get_models_training_report(models, tpu, n_class, traindata, steps_per_epoch, epochs, val_data, classification_model = 'default', freeze = False, input_shape = [512,512,3], activation = 'softmax', weights = 'imagenet', optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = 'sparse_categorical_accuracy', plot = False)
```
_**Arguments Description**_ -> <br>
**models** - list of models to obtain the training report on. eg.
``` models = ['VGG16', 'EfficientNetB7', 'InceptionV3', 'ResNet50'] ``` <br>
**tpu** - The TPU instance <br>
**n_class** - number of classes in the Dataset <br>
**traindata** - The training dataset (In TFRecords Dataset) <br>
**steps_per_epoch** - number of steps to be taken per epoch. Ideally, it should be number of training images // BATCH_SIZE <br>
**epochs** - Number of epochs for which models are to be trained. <br>
**val_data** - The validation dataset (In TFRecords Dataset) <br>
**classification_model** - The classification model which you want to attach as the top to the pretrained model. The 'default' classification model has a GlobalAveragePooling2D layer followed by a Dense layer with as many output nodes as there are classes. <br>
You can define your own classification model (Sequential Model) and pass it as the classification_model argument. <br>
```
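import tensorflow as tf

# Illustrative custom head; choose the layers that suit your dataset.
class_model = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(n_class, activation = 'softmax')
])
get_models_training_report(models, tpu, n_class, traindata, steps_per_epoch, epochs, val_data, classification_model = class_model)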
```
**freeze** - Whether or not you want to freeze the pretrained model weights. Default, False. <br>
**input_shape** - Defines the input_shape of the images of the dataset. Default, [512,512,3] <br>
**activation** - The activation function for the final Dense layer of your Classification model. Default, 'softmax'. For binary classification, change to 'sigmoid' with n_class = 1.<br>
**weights** - The pretrained Model weights to be taken for consideration. Default, 'imagenet'. Support for 'noisy-student' coming soon. <br>
**optimizer** - The optimizer for the model to converge while training. Default, 'adam' <br>
**loss** - Loss function to consider while training your deep learning model. Two options are supported: 'sparse_categorical_crossentropy' & 'binary_crossentropy'. Default, 'sparse_categorical_crossentropy'. <br>
**metrics** - The metric to be taken under consideration while training your deep learning model. Two options available. 'accuracy' & 'sparse_categorical_accuracy'. Use 'accuracy' as a metric while doing Binary Classification else 'sparse_categorical_accuracy'. Default, 'sparse_categorical_accuracy'. <br>
**plot** - Plot the training curves of all the models for quick visualization. Feature Coming soon. <br>
<br>
**_Returns_** => <br>
A Pandas Dataframe with a table output as shown above. You can save the function output in a variable and save the dataframe to your disk using .to_csv() method. <br>
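
The steps_per_epoch argument follows directly from the dataset size and the batch size, as noted in the arguments description. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up numbers:

```python
def compute_steps_per_epoch(num_training_images, batch_size):
    """Number of full batches in one pass over the training data."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return num_training_images // batch_size


# Hypothetical dataset size and batch size.
example_steps = compute_steps_per_epoch(12753, 128)
print(example_steps)
```

Integer division drops the final partial batch, so a few images may go unused in each epoch.
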
## Callbacks
---
When the classes of your dataset are highly similar to one another, callbacks become essential for model training and convergence.
This utility provides callbacks that are often used while training deep learning models and returns them as a list. Pass this list as an argument while training deep learning models. <br>
```
from quick_ml.callbacks import callbacks
```
##### Learning Rate Scheduler
There are 3 different types of learning rate schedulers. <br>
<ol><li> RAMPUP Learning Rate Scheduler
<li> Simple Learning Rate Scheduler
<li> Step-wise Learning Rate Scheduler
</ol>
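
The exact shape of each scheduler is not specified here, so the following is only a sketch of what a ramp-up schedule and a step-wise schedule commonly look like. The function names, default values, and decay rule are assumptions, not the package's own settings.

```python
def rampup_lr(epoch, *, lr_start=1e-5, lr_max=4e-4, lr_min=1e-5,
              rampup_epochs=5, sustain_epochs=0, decay=0.8):
    """Linear warm-up to lr_max, optional hold, then exponential decay."""
    if epoch < rampup_epochs:
        return lr_start + (lr_max - lr_start) * epoch / rampup_epochs
    if epoch < rampup_epochs + sustain_epochs:
        return lr_max
    decay_steps = epoch - rampup_epochs - sustain_epochs
    return lr_min + (lr_max - lr_min) * decay ** decay_steps


def stepwise_lr(epoch, *, lr_initial=1e-3, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return lr_initial * drop ** (epoch // epochs_per_drop)


for epoch in (0, 5, 10):
    print(epoch, rampup_lr(epoch), stepwise_lr(epoch))
```

Both functions map an epoch index to a learning rate, which is the kind of schedule function that Keras' `tf.keras.callbacks.LearningRateScheduler` accepts.
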
##### Early Stopping Callback
Use Early Stopping Callback as a measure to prevent the model from overfitting. The default callback setting is as follows <br>
monitor = 'val_loss', min_delta = 0, patience = 0, verbose = 0, <br>
mode = 'auto', baseline = None, restore_best_weights = False.
<br><br>
To use the default settings of Early Stopping Callback, pass
```
callbacks = callbacks(early_stopping = "default")
```
##### Reduce LR On Plateau
Prevent your model from getting stuck at local minima using the ReduceLROnPlateau callback. The default implementation has the following parameter settings => <br>
monitor = 'val_loss', factor = 0.1, patience = 10, verbose = 0, mode = 'auto', min_delta = 0.0001, cooldown = 0, min_lr = 0
<br> <br>
_**Combine Multiple callbacks**_ <br>
```
callbacks = callbacks(lr_scheduler = 'rampup', early_stopping = 'default', reduce_lr_on_plateau = 'default' )
```
## Predictions
---
The package supports multiple options to obtain predictions on your test dataset (only TFRecords Format Dataset). <br><br>
Supported Methods for obtaining predictions -> <br>
- get_predictions
- ensemble_predictions
- Model Averaging
- Model Weighted
- Train K-Fold (Coming Soon)
- Test Time Augmentations (Coming Soon)
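
To show what model averaging and weighted averaging mean in practice, here is a plain-Python sketch that combines the class probabilities of several models. It illustrates the general technique only; `average_predictions` is a hypothetical helper and does not reflect the signature of `ensemble_predictions`.

```python
def average_predictions(model_probs, weights=None):
    """Combine per-model class probabilities by (optionally weighted) averaging.

    model_probs: one list per model, each holding per-sample probability lists.
    weights: optional per-model weights; equal weights when omitted.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    combined = []
    for per_sample in zip(*model_probs):  # the same sample across all models
        combined.append([
            sum(w * p for w, p in zip(weights, class_probs)) / total
            for class_probs in zip(*per_sample)  # the same class across all models
        ])
    return combined


# Hypothetical softmax outputs of two models for two samples and three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
averaged = average_predictions([model_a, model_b])
weighted = average_predictions([model_a, model_b], weights=[3, 1])
predicted_classes = [row.index(max(row)) for row in weighted]
print(averaged, predicted_classes)
```

Giving a stronger model a larger weight lets it dominate where the models disagree, which is why the weighted result can pick a different class than the plain average.
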
##### Get Predictions
Obtain predictions on the **TEST TFRECORDS Data Format** using get_predictions(). <br>
Two call types have been defined for get_predictions(). <br>
Import the predictions function. <br>
```
from quick_ml.predictions import get_predictions
```
First Definition -> <br>
Use this function definition when you have the GCS_DS_PATH, test_tfrec_path, BATCH_SIZE. <br>
This is usually helpful when you have trained model weights from one session and want to obtain predictions in a different session. It is especially beneficial when predictions are to be obtained from multiple models: train multiple models in one session using get_models_training_report() from quick_ml.training_predictions, save the best model weights in the same session using create_model() from quick_ml.load_models_quick, then run testing/predictions for the multiple models in a different session using this function definition. This is the best way to deal with multiple models. <br>
```
predictions = get_predictions(GCS_DS_PATH, test_tfrec_path, BATCH_SIZE, model)
```
Second Definition <br>
Use this function when you have testTFDataset and model. <br>
This function definition is usually the best option when you have one model and want to obtain the predictions in the same session. For this, you must have loaded the datasets beforehand. However, you are free to explore better possibilities with the two definitions above.
```
prediction = get_predictions(testTFdataset, model)
```
<br>
## K-Fold Training & Predictions
---
<br>
K-Fold Cross Validation is usually performed to verify that the selected model's good performance isn't due to data bias. <br>
It is most useful after you have obtained the Training Report of the models and selected the model architecture you will be working with. <br>
To get started with K-Fold Cross Validation & Predictions, <br>
```
from quick_ml.k_fold_training import train_k_fold_predict
```
<br>
Function Definition :- <br>
```
train_k_fold_predict(k, tpu, train_tfrecs, val_tfrecs, test_tfrecs, GCS_DS_PATH, BATCH_SIZE)
```
<br>
**_Description_** => <br>
**k** -> The number of folds. Usually, 5 or 10. <br>
**tpu** -> The TPU instance. To be obtained from define_tpu_strategy()<br>
**train_tfrecs** -> The complete path location of the tfrecord files of training dataset. <br>
**val_tfrecs** -> The complete path location of the tfrecord files of the validation dataset. <br>
**test_tfrecs** -> The complete path location of the tfrecord files of the test dataset. <br>
**GCS_DS_PATH** -> The Google Cloud Storage (GCS) bucket path of the dataset.<br>
**BATCH_SIZE** -> Select the batch size of the training dataset. Usually, the value should be a multiple of 128 for efficient utilization of TPUs. <br>
<br> <br>
**_Returns_** => <br>
Doesn't return anything. Saves an output file with the result of each fold training along with its validation result. <br>
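
For readers new to the technique, the sketch below shows how k folds partition a dataset so that every sample is used for validation exactly once. It works on plain sample indices and is only an illustration of K-Fold splitting in general; how `train_k_fold_predict` divides the TFRecords files internally is not described here, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


folds = list(k_fold_indices(10, 5))
for fold_number, (train, val) in enumerate(folds, start=1):
    print(fold_number, val)
```
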
<br>
## Examples
***
Following are a few Kaggle Notebooks showing the working of the **quick_ml** Python package. <br>
<br>
TFRecords Dataset Making -> [Notebook 1](https://www.kaggle.com/superficiallybot/quick-ml-tfrecords-maker?scriptVersionId=41133131) <br>
Binary Classification -> [Notebook 2](https://www.kaggle.com/superficiallybot/) <br>
Multi-Class Classification -> [Notebook 3](https://www.kaggle.com/superficiallybot/quick-ml-multiclass-classification-tpu?scriptVersionId=41169786) <br>
## Feedback & Development
***
__Want to contribute? Great!__ <br>
Send your ideas to antoreepjana@gmail.com and ensure the format of the subject of the mail as
[quick_ml Contribute] -> [Your Subject]
__Want to suggest an improvement or a feature? Most Welcome!__ <br>
Send your ideas to antoreepjana@gmail.com and ensure the format of the subject of the mail as [quick_ml Suggestion] -> [Your Subject]
__Want to report errors or complain about something you didn't like? That's how it improves ;)__ <br>
Send your grievances to antoreepjana@gmail.com and ensure the format of the subject of the mail as [quick_ml Grievances] -> [Your Subject]
__Want to send love, thanks and appreciation? I'd love to hear from you!__ <br>
Send your love to antoreepjana@gmail.com and ensure the format of the subject of the mail as [quick_ml Feedback] -> [Your Subject]
## Upcoming Features!
***
- Data Augmentations on TPU.
- Support for Hyper-Parameter Tuning
## License
***
MIT
**Free Software, Hell Yeah!**
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/antoreep_jana/quick_ml",
"name": "quick-ml",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "quick_ml, quick ml, TPU, Distributed Training, GPU, Multi GPU Training, Deep Learning TPU, tensorflow, deep learning",
"author": "Antoreep Jana",
"author_email": "antoreepjana@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d8/48/620da7a2e8fdd535da869b78570de30d3f1aabf25e92b8191fe5739b604b/quick_ml-1.3.20.tar.gz",
"platform": null,
"description": "# quick_ml : ML for everyone\n[![Build Status](https://travis-ci.org/joemccann/dillinger.svg?branch=master)](https://travis-ci.org/joemccann/dillinger) [![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://gitlab.com/antoreep_jana/quick_ml/-/blob/master/LICENSE)\n\n ![quick_ml_logo](https://gitlab.com/antoreep_jana/quick_ml/-/raw/master/quick_ml_logo.png?inline=false)\n\n<br><br>\n\n## Official Website\n\n<br>\n\n**www.quickml.info**\n\n<br><br>\n**quick_ml** is a python package (pypi) which provides quick plug and plag prototypes to train Deep Learning Model(s) through optimized utilization of TPU computation power.\n\n - Speed up your Deep Learning Experimentation Workflow by x20 times.\n - No unncessary worrying about the details. Those have been taken care of by the library.\n - Obtain results of Deep Learning Models on your dataset using minimal lines of code.\n - Train Multiple Deep Learning Models in a go just by naming the model names. You need not to worry much about the internal working.\n - Full control over the Deep Learning workflow & setting of parameters. All within a single or minimal function call(s).\n\n# New Features!\n\n - Rapidly train about 24 Deep Learning Pretrained Models in one session using TPUs.\n - Quick & Easy TFRecords Dataset Maker. TFRecords expidite the training process. - \n\n\n### Why quick_ml?\n - Usual time taken to code & train a deep learning workflow for a single model is atleast 1.5 hrs to 2 hrs (given you are proficient in the field and you know what to code). Using quick_ml, you would be able to do this in less than 20 mins even if you are a beginner.\n - Fast experimentation. That's it. Experimenting what works and what doesn't is tedious and tiresome. \n \n\n## Specifications\n Support for __Kaggle Notebooks__ w/ __TPU enabled ON__.\n For best performance, import the pretrained weights dataset in the Kaggle Notebook. 
(https://www.kaggle.com/superficiallybot/tf-keras-pretrained-weights) <br>\n **Tensorflow version==2.4.0** <br>\n **Python 3.6+** <br>\n \n __Note__ -> Don't import tensorflow in the beginning. As tensorflow receives updates, it might take some time for the package to reflect the corresponding changes. The package is built and tested on the most stable version of tensorflow mentioned in the Specifications.\n \n\n### Few Words about the package\n\n> The idea behind designing the package was to \n> reduce the unnecessary training time for Deep Learning Models.\n> The experimentation time, if reduced, can help the people concerned with\n> the package to focus on the finer details which are often neglected. \n> In addition to this, there are several utility functions provided at a single\n> place aimed at expediting the ML workflow. These utility functions have been designed\n> with ease of use as the foremost priority and an attempt has been made to \n> optimize the TPU computation as well as bring in most of the functionalities. An attempt has been made to reduce about 500-700 lines of code or even more (depending on what you are about to use) to less than 10 lines of code. 
Hope you like it!\n\n***\n## Table of Contents\n***\n***\n\n* [Installation](#installation)\n* [Getting Started](#getting-started)\n * [Making Custom Datasets (TFRecords)](#making-custom-datasets-tfrecords)\n * Labeled Data\n * Unlabeled Data\n * [Visualize & Check the Data](#visualize-check-the-data)\n * [Begin Working w/ TPU](#begin-working-w-tpu)\n * [Create Models Quickly](#create-model-quickly)\n * [Models Training Report](#models-training-report)\n * [Callbacks](#callbacks)\n * [Predictions](#predictions)\n * [K-Fold Training & Predictions](#k-fold-training-predictions)\n* [Examples](#examples)\n* [Feedback & Development](#feedback-development)\n* [Upcoming Features!](#upcoming-features)\n* [License](#license)\n\n\n***\n### Installation\n***\n***\n\nYou can quickly get started with the package using the pip command.\n\n```\n!pip install quick-ml\n```\n\nOnce you have installed the quick_ml package, you can get started with your Deep Learning Workflow. <br> Quickly check whether the correct version of tensorflow has been installed and import tensorflow with the following statements.\n\n<br>\n\n\n```\nimport tensorflow as tf\nimport quick_ml\n```\n\nCheck the output to know about the status of your installation. Also add, <br>\n\n<br>\n\n***\n# Getting Started\n___\n***\nLet's begin exploring the package. \n\n## Making Custom Datasets (TFRecords)\n---\n\nTo obtain the best performance using TPUs, the package accepts only TFRecords Dataset(s). \nEither you have a ready-made TFRecords Dataset or you want to obtain a TFRecords Dataset for your own image data. This section is devoted to explaining how to obtain TFRecords for your own Image Dataset. <br>\n_Note_ -> To utilize the TFRecords dataset created, ensure that the dataset is public while uploading on Kaggle. <br>\n_Note_ -> No need to have **TPU ON** for making TFRecords files. Making TFRecords is a CPU computation. 
<br>\n_Note_ -> It is better to make the TFRecords dataset on Google Colab ( > 75 GB) as Kaggle Kernels have limited Disk Space ( < 5 GB). Download the datasets after you are done. Upload them on Kaggle as public datasets. Input the data in the Kaggle Notebooks.\n\nLet's get started with the **tfrecords_maker** module of the **quick_ml** package. <br>\n\n<br>\n\n**Labeled Data**\n\nFor Labeled Data, make sure that the dataset follows the following structure -> \n\n>/ Data\n>>> -Class1 <br>\n>>> -Class2 <br>\n>>> -Class3 <br>\n>>> -ClassN <br>\n\nwhere Class1, Class2, .. , ClassN denote the folders of images as well as the classes of the images. These shall serve as the labels for classification.\n<br> \nThis is usually used for training and validation data. <br>\nHowever, it can also be used to create labeled Test Data.\n<br> <br>\n\nTo make labeled data, there are 2 options. <br>\n* Convert entire image folder to tfrecords file.\n * Split the Image Dataset folder in a specified ratio & make tfrecords files.\n\n<br>\nA) Convert entire image folder to tfrecords file <br>\n\n```\nfrom quick_ml.tfrecords_maker import create_tfrecord_labeled\nfrom quick_ml.tfrecords_maker import get_addrs_label\n```\n\nTo create a tfrecords dataset file from this, the following would be the function call :- <br>\n\n```\ncreate_tfrecord_labeled(addrs, labels, output_filename, IMAGE_SIZE = (192,192))\n```\n\n<br>\n\nHowever, you would need the image addresses (**addrs**) and the labels (**labels**), shuffled together. This has been implemented for you in **get_addrs_label**. Follow the line of code below.<br>\n\n```\naddrs, labels = get_addrs_labels(DATA_DIR)\n```\n\n<br>\nwhere DATA_DIR is the path of the Dataset with the structure mentioned at the beginning of Labeled Data. <br>\n\nObtain the tfrecords file by passing the output_filename you want your output file to have, using this line of code. 
<br>\n\n```\ncreate_tfrecord_labeled(addrs, labels, output_filename, IMAGE_SIZE = (192,192))\n```\n\n<br>\nEnsure that you save the Labeled TFRecord Format somewhere as you would require it to read the data at a later stage. The preferred way of achieving this is to save it in a Markdown cell below the above code cell. After uploading on Kaggle and making the dataset public, add the Labeled TFRecords Format in the Dataset Description. <br>\n\nB) Split the Image Dataset Folder in a specified ratio & make tfrecords files. <br>\n\n```\nfrom quick_ml.tfrecords_maker import create_split_tfrecords_data\n```\n\nTo create two tfrecords datasets from the Image Dataset Folder, use the following line of code :- <br>\n\n```\ncreate_split_tfrecords_data(DATA_DIR, outfile1name, outfile2name, split_size_ratio, IMAGE_SIZE = (192,192))\n```\n\n<br>\n\n**_DESCRIPTION_** => \n\n<br>\n\n **DATA_DIR** -> This refers to the Path to the Dataset Folder following the structure mentioned above. \n<br> **outfile1name** + **outfile2name** -> The names of the two output files obtained through the split of the dataset. <br>\n **split_size_ratio** -> The split size ratio by which you would like to divide your dataset. <br>\n **IMAGE_SIZE** -> The Image Size you would like all the images of your dataset to have in the tfrecords file.\n\n<br>\n<br>\n\n<br>\n\n**_RETURNS_** => <br>\nDoesn't return anything. Stores the TFRecords file(s) to your disk. Ensure sufficient disk space.\n\n**Unlabeled Data**\n\nFor unlabeled data, make sure to follow the following structure. <br>\n\n> / Data\n>>> file1 <br>\n>>> file2 <br>\n>>> file3 <br>\n>>> file4 <br>\n>>> fileN <br>\n\nwhere file1, file2, file3, fileN denote the unlabeled, uncategorized image files. The filenames serve as the Ids which are paired with the Images as identification. 
<br>\nThis is usually used for test data creation (unknown, unclassified).\n<br> <br>\nTo make an unlabeled TFRecords dataset, you would need **create_tfrecord_unlabeled** & **get_addrs_ids**.\n\n```\nfrom quick_ml.tfrecords_maker import create_tfrecord_unlabeled\nfrom quick_ml.tfrecords_maker import get_addrs_ids\n```\n\n<br>\n\nFirst, obtain the image addresses (**addrs**) and image ids (**ids**) using **get_addrs_ids** in the **tfrecords_maker** module. <br> \n\n```\naddrs, ids = get_addrs_ids(Unlabeled_Data_Dir)\n```\n\n<br>\nwhere, <br>\nUnlabeled_Data_Dir refers to the Dataset Folder which follows the structure of the unlabeled dataset. <br>\n\nAfter getting the addrs & ids, pass the right parameters for the function to make the TFRecords Dataset for you. <br>\n\n```\nunlabeled_dataset = create_tfrecord_unlabeled(out_filename, addrs, ids, IMAGE_SIZE = (192,192))\n```\n\n**_DESCRIPTION_** => <br>\n **out_filename** - name of the tfrecords output file. <br>\n **addrs** - the addrs of the images in the data folder. (can be obtained using get_addrs_ids()) <br>\n **ids** - the ids of the images in the data folder. (can be obtained using get_addrs_ids()) <br>\n **IMAGE_SIZE** - The Image Size of each image you want to have in the TFRecords dataset. Default, (192,192). <br>\n\n\n**_RETURNS_** => <br>\nA TFRecords dataset with examples with 'image' as the first field & 'idnum' as the second field. \n\n<br>\n\n## Visualize & Check the Data\n---\n\nAfter creating your TFRecords Dataset (labeled or unlabeled), you would like to check and glance through your dataset. For this, import **visualize_and_check_data** from **quick_ml**.\n<br>\nTo get started, write the following line of code. 
:- <br>\n\n```\nfrom quick_ml.visualize_and_check_data import check_one_image_and_label, check_batch_and_labels, check_one_image_and_id, check_batch_and_ids\n```\n\nAvailable methods are :- \n<ol>\n<li type = 'I'> check_one_image_and_label\n<li type = 'I'> check_batch_and_labels\n<li type = 'I'> check_one_image_and_id\n<li type = 'I'> check_batch_and_ids\n</ol>\n\n<br>\n\n**check_one_image_and_label** <br>\n\nUse this for checking a labeled TFRecords Dataset. It displays only one image along with its label when the labeled TFRecords dataset is passed as an argument. <br>\n\n```\ncheck_one_image_and_label(tfrecord_filename)\n```\n\n**_Description_** => <br>\nDisplays one image along with its label. <br>\nPass the tfrecord_filename as the argument. It will display one image along with its label from the tfrecords dataset. <br>\n\n**check_batch_and_labels** <br>\nUse this for checking a labeled TFRecords Dataset. It displays a grid of images along with their labels given the tfrecords dataset passed as an argument. <br>\n\n```\ncheck_batch_and_labels(tfrecord_filename, n_examples, grid_rows, grid_columns, grid_size = (8,8))\n```\n\n**_Description_** => <br>\nDisplays a grid of images along with their labels. <br>\nPass the tfrecord_filename and the number of examples to see (n_examples), and split n_examples into rows (grid_rows) and columns (grid_columns) such that n_examples = grid_rows * grid_columns. Finally, pass the grid_size as a tuple. Default, (8,8). It will display a grid of images along with their labels from the tfrecords dataset. <br>\n\n**check_one_image_and_id** <br>\n\nUse this for checking an unlabeled TFRecords Dataset. It displays only one image along with its id when the unlabeled TFRecords dataset is passed as an argument. <br>\n\n```\ncheck_one_image_and_id(tfrecord_filename)\n```\n\n**_Description_** => <br>\nDisplays one image along with its id. <br>\nPass the tfrecord_filename as the argument. 
It will display one image along with its id from the tfrecords dataset. <br>\n\n**check_batch_and_ids** <br>\nUse this for checking an unlabeled TFRecords Dataset. It displays a grid of images along with their ids given the tfrecords dataset passed as an argument. <br>\n\n```\ncheck_batch_and_ids(tfrecord_filename, n_examples, grid_rows, grid_columns, grid_size = (8,8))\n```\n\n**_Description_** => <br>\nDisplays a grid of images along with their ids. <br>\nPass the tfrecord_filename and the number of examples to see (n_examples), and split n_examples into rows (grid_rows) and columns (grid_columns) such that n_examples = grid_rows * grid_columns. Finally, pass the grid_size as a tuple. Default, (8,8). It will display a grid of images along with their ids from the tfrecords dataset. <br>\n\n<br>\n\n## Begin working w/ TPU\n---\n\nThis helps you to get the TPU instance and TPU strategy, and to load the training dataset, validation dataset & test dataset from their TFRecords files & GCS_DS_PATH. <br>\n\nTo get all the required utilities, use the following line of code. <br>\n\n```\nfrom quick_ml.begin_tpu import define_tpu_strategy, get_training_dataset, get_validation_dataset, get_test_dataset\n```\n\n**_Available Methods & Functionalities_** => <br>\n\n<ol>\n<li> define_tpu_strategy\n<li> get_training_dataset\n<li> get_validation_dataset\n<li> get_test_dataset\n</ol>\n\n**define_tpu_strategy** <br>\nThis returns the tpu strategy and the tpu instance. <br>\n\n```\nstrategy, tpu = define_tpu_strategy()\n```\n\n**get_training_dataset** <br>\nHelps you load the tfrecords file (TRAINING DATASET). <br>\n\n```\ntrain_dataset = get_training_dataset(GCS_DS_PATH, train_tfrec_path, BATCH_SIZE)\n```\n\n**_Description_** => <br>\n **GCS_DS_PATH** - The GCS Bucket Path of the tfrecords dataset. <br>\n **train_tfrec_path** - the train tfrecords filename path. e.g. 
'/train.tfrecords' <br>\n **BATCH_SIZE** - Select the batch size for the images to load in the training dataset instance. <br>\n\n<br>\n\n**_Returns_** => <br>\nA tfrecords dataset instance which can be fed to model training as the training dataset.\n\n<br>\n\n**get_validation_dataset** <br>\nHelps you load the tfrecords file (VALIDATION DATASET).\n\n```\nval_dataset = get_validation_dataset(GCS_DS_PATH, val_tfrec_path, BATCH_SIZE)\n```\n\n**_Description_** => <br>\n **GCS_DS_PATH** - The GCS Bucket Path of the tfrecords dataset. <br>\n **val_tfrec_path** - the validation tfrecords filename path. e.g. '/val.tfrecords' <br>\n **BATCH_SIZE** - Select the batch size for the images to load in the validation dataset instance. <br>\n\n<br>\n\n**_Returns_** => <br>\nA tfrecords dataset instance which can be fed to model training as the validation dataset.\n\n<br>\n\n**get_test_dataset** <br>\nHelps you load the tfrecords file (TEST DATASET).\n\n```\ntest_dataset = get_test_dataset(GCS_DS_PATH, test_tfrec_path, BATCH_SIZE)\n```\n\n**_Description_** => <br>\n **GCS_DS_PATH** - The GCS Bucket Path of the tfrecords dataset. <br>\n **test_tfrec_path** - the test tfrecords filename path. e.g. '/test.tfrecords' <br>\n **BATCH_SIZE** - Select the batch size for the images to load in the test dataset instance. <br>\n\n<br>\n\n**_Returns_** => <br>\nA tfrecords dataset instance which can be used for prediction as the test dataset.\n\n<br>\n\n\n\n## Create Model Quickly\n---\n\nThis helps you to create a model ready for training, all in a single line of code. <br>\nThis includes loading the pretrained model along with the weights, adding the classification model on top of the pretrained model and compiling the model. <br>\nThe function is situated in the **load_models_quick** module of the **quick_ml** package. 
<br>\n```\nfrom quick_ml.load_models_quick import create_model\n```\n<br>\n\n**create_model()** function parameters/arguments :- <br>\n\n```\nmodel = create_model(classes, model_name = 'VGG16', classification_model = 'default', freeze = False, input_shape = [512, 512,3], activation = 'softmax', weights= \"imagenet\", optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = 'sparse_categorical_accuracy')\n```\n\n**_Arguments Description_** => <br>\n **classes** - Number of classes for classification. <br>\n **model_name** - Name of the model. Default, VGG16. <br>\nAvailable models -> <br>\nMODELS -> 'VGG16', 'VGG19', <br>\n 'Xception', <br>\n 'DenseNet121', 'DenseNet169', 'DenseNet201', <br>\n 'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50V2', 'ResNet101V2', 'ResNet152V2', <br>\n 'MobileNet', 'MobileNetV2', <br>\n 'InceptionV3', 'InceptionResNetV2', <br> \n 'EfficientNetB0', 'EfficientNetB1', 'EfficientNetB2', 'EfficientNetB3', 'EfficientNetB4', 'EfficientNetB5', 'EfficientNetB6', 'EfficientNetB7'\n<br>\n\n **classification_model** - The classification model which you want to attach as the top to the pretrained model. The 'default' classification model has a Global Average Pooling2D followed by a Dense layer with as many output nodes as the number of classes for classification. <br>\nYou can define your own classification_model (Sequential Model) and pass it as the classification_model argument. <br>\n```\n# example layers only; replace them with your own classification head\nclass_model = tf.keras.Sequential([\ntf.keras.layers.GlobalAveragePooling2D(),\ntf.keras.layers.Dense(classes, activation = 'softmax')\n])\n\nmodel = create_model(classes, model_name = 'VGG16', classification_model = class_model)\n```\n\n<br>\n\n **freeze** - True or False. Whether or not to freeze the pretrained model weights while training the model. Default, False.<br>\n **input_shape** - Input shape of the images of the TFRecords Dataset. 
Default, [512,512,3] <br>\n **activation** - The activation function to be used for the final layer of the classification model put on top of the pretrained model. For Binary Classification, use 'sigmoid'. For multi-class classification, use 'softmax'. Default, 'softmax'. <br> \n **weights** - What kind of weights to use for the pretrained model you have decided as your model backbone. Default, 'imagenet'. Options, 'imagenet' & None. In case you are using 'imagenet' weights, ensure you have loaded [TF Keras pretrained weights](https://www.kaggle.com/superficiallybot/tf-keras-pretrained-weights) in your Kaggle Notebook. <br>\n **optimizer** - The optimizer to be used for converging the model while training. Default, 'adam'.<br>\n **loss** - Loss function for the model while training. Default, 'sparse_categorical_crossentropy'. Options, 'binary_crossentropy' or 'sparse_categorical_crossentropy'. Use 'binary_crossentropy' for Binary Classifications. Use 'sparse_categorical_crossentropy' for multi-class classifications. Support for 'categorical_crossentropy' is not provided as it is computationally expensive. Both sparse & categorical cross entropy serve the same purpose.<br>\n **metrics** - The metrics for gauging your model's training performance. Default, 'sparse_categorical_accuracy'. Options, 'sparse_categorical_accuracy' & 'accuracy'. For Binary Classifications, use 'accuracy'. For Multi-class classifications, use 'sparse_categorical_accuracy'. <br>\n\n<br>\n\n**_Returns_** => <br>\nA tf.keras.Sequential **compiled** Model with base model as the pretrained model architecture name specified along with the classification model attached. This model is **_ready for training_** via model.fit(). <br>\n\n## Models Training Report\n---\n\nThis utility function is designed for getting to know which models are the best for the dataset at hand. Manually training models one by one is troublesome as well as cumbersome. 
A smart and quick way of achieving this is by using **get_models_training_report()** from **quick_ml.training_predictions**. <br>\nTo get started, import it from the **training_predictions** module of **quick_ml** <br>\n```\nfrom quick_ml.training_predictions import get_models_training_report\n```\n<br>\n\nAfter passing in the arguments for get_models_training_report, you will obtain a pandas dataframe. However, before getting into the details of the output and the parameters to be passed to the function, let's take a quick look at the table output format. <br>\n\n### Output Table Overview \nTable Preview of the Pandas DataFrame that is returned upon calling the function to obtain the training report. <br>\n\n| Model Name | Train_top1_Accuracy | Train_top3_Accuracy | Val_top1_Accuracy | Val_top3_Accuracy |\n| ------ | ------ | ------ | ------ | ------ |\n| Model_1 | 97.1 | 96 | 94 | 93.1 |\n| Model_2 | 96.2 | 92 | 93 | 91 |\n| Model_3 | 98 | 96 | 97.1 | 96 |\n| Model_4 | 90 | 87 | 85 | 83 |\n| Model_5 | 70 | 61 | 55 | 51 |\n| Model_6 | 91 | 86 | 90 | 88 |\n\n<br>\nTable Description :- <br>\n1) Model Name -> Name of the model trained on the dataset <br>\n2) Train_top1_Accuracy -> The last accuracy score on the training dataset <br>\n3) Train_top3_Accuracy -> The average of the last 3 accuracy scores on the training dataset<br>\n4) Val_top1_Accuracy -> The last validation accuracy score on the validation dataset <br>\n5) Val_top3_Accuracy -> The average of the last 3 validation accuracy scores on the validation dataset <br>\n<br>\n\n#### Using Models Training Report\n\nOnce you have successfully imported **get_models_training_report**, pass the arguments as per your requirement. The function returns a pandas dataframe with a table similar to the one above. 
The arguments are - <br>\n\n```\nget_models_training_report(models, tpu, n_class, traindata, steps_per_epoch, epochs, val_data, classification_model = 'default', freeze = False, input_shape = [512,512,3], activation = 'softmax', weights = 'imagenet', optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = 'sparse_categorical_accuracy', plot = False)\n```\n\n_**Arguments Description**_ -> <br>\n **models** - list of models to obtain the training report on. e.g.\n``` models = ['VGG16', 'EfficientNetB7', 'InceptionV3', 'ResNet50'] ``` <br>\n **tpu** - The TPU instance <br>\n **n_class** - number of classes in the Dataset <br>\n **traindata** - The training dataset (In TFRecords Dataset) <br>\n **steps_per_epoch** - number of steps to be taken per epoch. Ideally, it should be number of training images // BATCH_SIZE <br>\n **epochs** - Number of epochs for which models are to be trained. <br>\n **val_data** - The validation dataset (In TFRecords Dataset) <br>\n **classification_model** - The classification model which you want to attach as the top to the pretrained model. The 'default' classification model has a Global Average Pooling2D followed by a Dense layer with as many output nodes as the number of classes for classification. <br>\nYou can define your own classification_model (Sequential Model) and pass it as the classification_model argument. <br>\n```\n# example layers only; replace them with your own classification head\nclass_model = tf.keras.Sequential([\ntf.keras.layers.GlobalAveragePooling2D(),\ntf.keras.layers.Dense(n_class, activation = 'softmax')\n])\n\nget_models_training_report(models, tpu, n_class, traindata, steps_per_epoch, epochs, val_data, classification_model = class_model)\n```\n\n **freeze** - Whether or not you want to freeze the pretrained model weights. Default, False. <br>\n **input_shape** - Defines the input_shape of the images of the dataset. Default, [512,512,3] <br>\n **activation** - The activation function for the final Dense layer of your Classification model. Default, 'softmax'. 
For binary classification, change to 'sigmoid' with n_class = 1.<br>\n **weights** - The pretrained Model weights to be used. Default, 'imagenet'. Support for 'noisy-student' coming soon. <br>\n **optimizer** - The optimizer for the model to converge while training. Default, 'adam' <br>\n **loss** - loss function to use while training your deep learning model. Two options supported, 'sparse_categorical_crossentropy' & 'binary_crossentropy'. Default, 'sparse_categorical_crossentropy'. <br>\n **metrics** - The metric to be tracked while training your deep learning model. Two options available, 'accuracy' & 'sparse_categorical_accuracy'. Use 'accuracy' as the metric for Binary Classification, else 'sparse_categorical_accuracy'. Default, 'sparse_categorical_accuracy'. <br>\n **plot** - Plot the training curves of all the models for quick visualization. Feature Coming soon. <br>\n<br>\n**_Returns_** => <br>\nA Pandas Dataframe with a table output as shown above. You can save the function output in a variable and save the dataframe to your disk using the .to_csv() method. <br>\n\n## Callbacks\n---\n\nWhen the classes of your dataset are highly similar, callbacks become important for model training and convergence. \nThis utility provides the callbacks which are often used while training deep learning models and returns a list of callbacks. Pass this as an argument while training deep learning models. <br>\n\n```\nfrom quick_ml.callbacks import callbacks\n```\n\n##### Learning Rate Scheduler \n\nThere are 3 different types of learning rate schedulers. <br>\n <ol><li> RAMPUP Learning Rate Scheduler\n <li> Simple Learning Rate Scheduler\n <li> Step-wise Learning Rate Scheduler\n </ol>\n\n##### Early Stopping Callback\n\nUse Early Stopping Callback as a measure to prevent the model from overfitting. 
The default callback setting is as follows <br>\n monitor = 'val_loss', min_delta = 0, patience = 0, verbose = 0, <br>\n mode = 'auto', baseline = None, restore_best_weights = False.\n <br><br>\n \nTo use the default settings of Early Stopping Callback, pass\n```\ncallbacks = callbacks(early_stopping = \"default\")\n```\n\n\n \n\n##### Reduce LR On Plateau\n\nPrevent your model from getting stuck at local minima using the ReduceLROnPlateau callback. The default implementation has the following parameter settings => <br>\n monitor = 'val_loss', factor = 0.1, patience = 10, verbose = 0, mode = 'auto', min_delta = 0.0001, cooldown = 0, min_lr = 0\n <br> <br>\n \n _**Combine Multiple callbacks**_ <br>\n ```\n callbacks = callbacks(lr_scheduler = 'rampup', early_stopping = 'default', reduce_lr_on_plateau = 'default' )\n ```\n \n ## Predictions\n ---\n \n The package supports multiple options to obtain predictions on your testDataset (only TFRecords Format Dataset). <br><br>\n \n Supported Methods for obtaining predictions -> <br>\n - get_predictions\n - ensemble_predictions\n - Model Averaging\n - Model Weighted\n - Train K-Fold (Coming Soon)\n - Test Time Augmentations (Coming Soon)\n \n ##### Get Predictions\n\n Obtain predictions on the **TEST TFRECORDS Data Format** using get_predictions(). <br>\n Two call types have been defined for get_predictions(). <br>\n Import the predictions function. <br>\n ```\n from quick_ml.predictions import get_predictions\n ```\n \n First Definition -> <br>\n Use this function definition when you have the GCS_DS_PATH, test_tfrec_path, BATCH_SIZE. <br>\n This is usually helpful when you have trained model weights from a different session and want to obtain predictions in a different session. Usually beneficial if there are multiple models from which predictions are to be obtained. Training of multiple models using get_models_training_report() from quick_ml.training_predictions in one session. 
Saving the best model weights in the same session using create_model() from quick_ml.load_models_quick. Testing/Predictions in a different session for multiple models using this function definition. This is the best way to deal with multiple models. <br>\n ```\n predictions = get_predictions(GCS_DS_PATH, test_tfrec_path, BATCH_SIZE, model)\n ```\n\nSecond Definition <br>\nUse this function definition when you have testTFdataset and model. <br>\nThis function definition is usually the best option when you have one model and want to obtain the predictions in the same session. For this, you must have loaded the datasets before. However, you are free to explore better possibilities with the above two functions.\n```\nprediction = get_predictions(testTFdataset, model)\n```\n\n<br>\n\n## K-Fold Training & Predictions\n---\n\n<br>\n\nK-Fold Cross Validation is usually performed to verify that the selected model's good performance isn't due to data bias. <br>\nThis would be highly beneficial after you have obtained the Training Report of the models and selected the model architecture you will be working with. <br>\n\nTo get started with K-Fold Cross Validation & Predictions, <br>\n\n```\nfrom quick_ml.k_fold_training import train_k_fold_predict\n```\n\n<br>\nFunction Definition :- <br>\n\n```\ntrain_k_fold_predict(k, tpu, train_tfrecs, val_tfrecs, test_tfrecs, GCS_DS_PATH, BATCH_SIZE)\n```\n\n<br>\n\n**_Description_** => <br>\n **k** -> The number of folds. Usually, 5 or 10. <br>\n **tpu** -> the tpu instance. To be obtained from define_tpu_strategy()<br>\n **train_tfrecs** -> The complete path location of the tfrecord files of the training dataset. <br>\n **val_tfrecs** -> The complete path location of the tfrecord files of the validation dataset. <br>\n **test_tfrecs** -> The complete path location of the tfrecord files of the test dataset. <br>\n **GCS_DS_PATH** -> The GCS Bucket Path of the Dataset.<br>\n **BATCH_SIZE** -> Select the batch size of the training dataset. 
Usually, the value should be a multiple of 128 for efficient utilization of TPUs. <br>\n\n\n\n<br> <br>\n**_Returns_** => <br>\nDoesn't return anything. Saves an output file with the result of each fold's training along with its validation result. <br>\n\n\n<br>\n\n## Examples\n***\n\nThe following are a few Kaggle Notebooks showing the working of the **quick_ml** python package. <br>\n<br>\nTFRecords Dataset Making -> [Notebook 1](https://www.kaggle.com/superficiallybot/quick-ml-tfrecords-maker?scriptVersionId=41133131) <br>\nBinary Classification -> [Notebook 2](https://www.kaggle.com/superficiallybot/) <br>\nMulti-Class Classification -> [Notebook 3](https://www.kaggle.com/superficiallybot/quick-ml-multiclass-classification-tpu?scriptVersionId=41169786) <br>\n\n## Feedback & Development\n***\n\n__Want to contribute? Great!__ <br>\nSend your ideas to antoreepjana@gmail.com and format the subject of the mail as \n[quick_ml Contribute] -> [Your Subject]\n\n\n__Want to suggest an improvement or a feature? Most Welcome!__ <br>\nSend your ideas to antoreepjana@gmail.com and format the subject of the mail as [quick_ml Suggestion] -> [Your Subject]\n\n\n__Want to share errors or complain about something which you didn't like? That's how it improves ;)__ <br>\nSend your grievances to antoreepjana@gmail.com and format the subject of the mail as [quick_ml Grievances] -> [Your Subject]\n\n\n__Want to send love, thanks and appreciation? I'd love to hear from you!__ <br>\nSend your love to antoreepjana@gmail.com and format the subject of the mail as [quick_ml Feedback] -> [Your Subject]\n\n\n\n## Upcoming Features!\n***\n\n - Data Augmentations on TPU.\n - Support for Hyper-Parameter Tuning\n\n## License\n***\n\nMIT\n\n\n**Free Software, Hell Yeah!**\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "quick_ml : Computer Vision For Everyone. Official Website -> www.antoreepjana.wixsite.com/quick-ml. Making Deep Learning through TPUs accessible to everyone. Lesser Code, faster computation, better modelling.",
"version": "1.3.20",
"project_urls": {
"Download": "https://gitlab.com/antoreep_jana/quick_ml/-/archive/v1.3.20/quick_ml-v1.3.20.tar.gz",
"Homepage": "https://gitlab.com/antoreep_jana/quick_ml"
},
"split_keywords": [
"quick_ml",
" quick ml",
" tpu",
" distributed training",
" gpu",
" multi gpu training",
" deep learning tpu",
" tensorflow",
" deep learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d848620da7a2e8fdd535da869b78570de30d3f1aabf25e92b8191fe5739b604b",
"md5": "730af9f61623fd09532902a3ea1d2317",
"sha256": "c22d600bc7ded86b553b102de5cef93205f25cdb059dea0f398358867206f7c7"
},
"downloads": -1,
"filename": "quick_ml-1.3.20.tar.gz",
"has_sig": false,
"md5_digest": "730af9f61623fd09532902a3ea1d2317",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 35111,
"upload_time": "2024-10-27T06:31:40",
"upload_time_iso_8601": "2024-10-27T06:31:40.528939Z",
"url": "https://files.pythonhosted.org/packages/d8/48/620da7a2e8fdd535da869b78570de30d3f1aabf25e92b8191fe5739b604b/quick_ml-1.3.20.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-27 06:31:40",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "antoreep_jana",
"gitlab_project": "quick_ml",
"lcname": "quick-ml"
}