# Project Overview: Multimodal AUV Bayesian Neural Networks for Underwater Environmental Understanding
This project develops and deploys **multimodal Bayesian Neural Networks (BNNs)** to process and interpret habitat data collected by **Autonomous Underwater Vehicles (AUVs)**, offering **scalable**, **accurate** mapping in complex underwater environments whilst incorporating uncertainty quantification to support **reliable** decision making. The repo
also presents a retrainable foundation model that can be fine-tuned for new datasets and scenarios.
## Problem Addressed
**Environmental mapping** within complex underwater environments presents significant challenges due to inherent data complexities and sensor limitations. Traditional methodologies often struggle to account for the variable conditions encountered in marine settings, such as the attenuation of **light, turbidity, and the physical constraints of acoustic and optical sensors**. These factors contribute to **noisy, incomplete, and uncertain data acquisition**, hindering the generation of reliable environmental characterizations.
Furthermore, conventional machine learning models typically yield point predictions without quantifying associated uncertainties. In applications requiring high-stakes decision-making, such as **marine conservation, resource management, or autonomous navigation**, understanding the **confidence bounds** of predictions is critical for robust risk assessment and operational planning. The fusion of diverse data modalities collected by Autonomous Underwater Vehicles (AUVs), including high-resolution **multibeam sonar, side-scan sonar, and optical imagery**, further compounds the challenge, necessitating advanced computational approaches to effectively integrate and interpret these disparate information streams.
This project addresses these critical limitations by developing and deploying **multimodal Bayesian Neural Networks (BNNs)**. This approach explicitly models and quantifies the **epistemic and aleatoric uncertainties** inherent in complex underwater datasets, providing not only robust environmental classifications but also **quantifiable measures of prediction confidence**. By leveraging the **complementary strengths of multiple sensor modalities**, the framework aims to deliver enhanced accuracy, scalability, and decision-making capabilities for comprehensive underwater environmental understanding.
# Project Structure
```
Multimodal_AUV/
├── src/
│   └── Multimodal_AUV/
│       ├── config/
│       │   ├── paths.py
│       │   └── __init__.py
│       ├── data/
│       │   ├── datasets.py
│       │   ├── loaders.py
│       │   └── __init__.py
│       ├── data_preperation/
│       │   ├── geospatial.py
│       │   ├── image_processing.py
│       │   ├── main_data_preparation.py
│       │   ├── GAVIA_data_preparation.py
│       │   ├── utilities.py
│       │   └── __init__.py
│       ├── Examples/
│       │   ├── Example_data_preparation.py
│       │   ├── Example_Inference_model.py
│       │   ├── Example_Retraining_model.py
│       │   ├── Example_training_from_scratch.py
│       │   └── __init__.py
│       ├── functions/
│       │   ├── functions.py
│       │   └── __init__.py
│       ├── inference/
│       │   ├── inference_data.py
│       │   ├── predictiors.py
│       │   └── __init__.py
│       ├── models/
│       │   ├── base_models.py
│       │   ├── model_utils.py
│       │   └── __init__.py
│       ├── train/
│       │   ├── checkpointing.py
│       │   ├── loop_utils.py
│       │   ├── multimodal.py
│       │   ├── unitmodal.py
│       │   └── __init__.py
│       ├── utils/
│       │   ├── device.py
│       │   └── __init__.py
│       ├── cli.py
│       ├── main.py
│       └── __init__.py
└── unittests/
    ├── test_data.py
    ├── test_model.py
    ├── test_train.py
    ├── test_utils.py
    └── __init__.py
```
# Module features
Here are some of the key capabilities of this module:
* **End-to-End Pipeline**: The repo offers a complete pipeline, allowing you to turn raw georeferenced imagery and sonar TIFFs into **valid predictions with quantified uncertainty** by training Bayesian Neural Networks.
* **Model to predict benthic habitat class (Northern Britain)**: Can download and run a model to evaluate bathymetric, sidescan and image "pairs"
and predict specific benthic habitat classes found in Northern Britain: **Sand, Mud, Rock, Gravel, Burrowed Mud (PMF), Kelp forest (PMF), or Horse Mussel reef (PMF)**.
* **Retrainable foundation model**: Code to download and retrain a **pretrained network** combining bathymetric, sidescan sonar and image data on new datasets, adapting the model to your specific needs with reduced computational requirements.
* **Training a model from scratch**: Code to take sonar and image data and train a **completely new model**, returning a CSV of metrics, the model itself, and confusion matrices.
* **Options to optimise sonar patch sizes and to train unimodal models**: Code to find the **optimal sonar patch** to maximise predictive accuracy (high compute requirements!) and to train unimodal and multimodal models to **compare the benefits of multimodality**.
# Getting started
This section guides you through setting up the project, installing dependencies, and preparing your data for processing and model training/inference.
1. **Create and Activate Conda Environment**:
We recommend using Conda to manage the project's dependencies for a consistent and isolated environment.
Create the Conda environment:
```bash
conda create -n multimodal_auv python=3.9  # Must be Python 3.9
```
Activate the environment:
```bash
conda activate multimodal_auv
```
You should see (multimodal_auv) at the beginning of your terminal prompt, indicating the environment is active.
2. **Install Dependencies**:
With your Conda environment active, install the package; this pulls in the required Python dependencies automatically.
```bash
pip install Multimodal_AUV
```
Important Note on GPU Support:
To train at a reasonable speed this project uses PyTorch with CUDA for GPU acceleration. However, the package dependencies do not include PyTorch (torch, torchvision, torchaudio) or the NVIDIA CUDA runtime, as these need to match your local CUDA toolkit and GPU driver setup. Go to https://pytorch.org/get-started/locally/, select your requirements, then copy the generated command and run it locally.
For example, for CUDA 11.8 with Python on Windows:
```bash
# Install PyTorch with CUDA 11.8 support via pip
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
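To confirm that the CUDA build of PyTorch is being picked up, a quick check (assuming `torch` is installed as above):
```python
import torch

# Prints True if PyTorch can see a CUDA-capable GPU, plus the detected device name
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```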
To use the package in a script, simply import it at the top of the script:
```python
import Multimodal_AUV
```
3. **Prepare Data Folders**:
Your project requires specific data structures for input and output. If you run the examples below, this structure will be created correctly. Please organize your data as follows, and update the paths in your config.yaml file accordingly.
Recommended Folder Structure:
```
Multimodal_AUV/
├── data/
│   ├── individual_data_point/
│   │   ├── auv_image.jpg              # Image from camera
│   │   ├── local_side_scan_image.jpg  # Cut-out of sonar local to camera
│   │   ├── local_bathy_image.jpg      # Cut-out of sonar local to camera
│   │   └── LABEL.txt                  # Where LABEL in the filename is replaced by the class label
│   ├── individual_data_point/
│   ├── ..........
│   ├── individual_data_point/
│   ├── processed_output/              # Output folder for processed AUV data (e.g., aligned images, extracted features)
│   ├── model_checkpoints/             # Directory to save trained model weights/checkpoints
│   └── inference_results/             # Directory to save inference output (e.g., prediction CSVs, classified maps)
├── config.yaml                        # Your main configuration file
├── Multimodal_AUV/
│   └── ...                            # Your Python source code
├── your_runner_script.py              # (Optional) Script to run commands based on config.yaml
├── requirements.txt                   # List of Python dependencies
└── README.md
```
## Clarifying Data Folder Contents:
* ```data/```: Folder containing folders of paired data. Your training scripts' ```--root_dir``` would typically point here.
* ```data/individual_data_point/```: Example of folder within folder holding required data files
* ```data/individual_data_point/auv_image.jpg```: The individual image for prediction
* ```data/individual_data_point/local_side_scan_image.jpg```: The individual side scan image local to the camera image for prediction
* ```data/individual_data_point/local_bathy_image.jpg```: The individual bathymetric image local to the camera image for prediction
* ```data/individual_data_point/LABEL.txt```: The label to predict. **N.B.** Not required if you're not training/retraining a model.
## NOTE: Sidescan files must have "SSS" in the name and bathymetric files must be called "patch_30m_combined_bathy"
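As a quick sanity check on this layout, here is a minimal sketch that walks each data-point folder and reports anything missing; the file-naming rules follow the note above, but the helper itself (`check_data_dir`) is hypothetical and not part of the package:
```python
from pathlib import Path

def check_data_dir(root_dir: str) -> None:
    """Report data-point folders missing an image, SSS patch, bathy patch or label file."""
    for point in sorted(Path(root_dir).iterdir()):
        if not point.is_dir():
            continue
        files = [p.name for p in point.iterdir()]
        has_image = any(f.lower().endswith(".jpg") for f in files)
        has_sss = any("SSS" in f for f in files)                         # side-scan patches must contain "SSS"
        has_bathy = any("patch_30m_combined_bathy" in f for f in files)  # bathy patches use this fixed name
        has_label = any(f.endswith(".txt") for f in files)               # LABEL.txt (only needed for (re)training)
        if not (has_image and has_sss and has_bathy):
            print(f"{point.name}: image={has_image}, SSS={has_sss}, bathy={has_bathy}, label={has_label}")

check_data_dir("data/")
```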
## Example root directory

## Example internal data directory

## Clarifying Output Folder Contents:
* ```data/processed_output/```: Stores intermediate or final processed data, often generated by preliminary scripts.
* ```data/model_checkpoints/```: Dedicated location for saving trained model weights and checkpoints.
* ```data/inference_results/```: Stores outputs generated by your inference models (e.g., prediction CSVs, classified maps).
### Action Required:
* **Create these directories manually** within your cloned repository if they don't exist. **Note**: If you run the code below, including the data preparation example, the correct structure will be created automatically.
* **Update** ```config.yaml```: Open your ```config.yaml``` file and set the ```data_root_dir```, ```output_base_dir```, and other relevant paths within ```training_from_scratch```, ```retraining_model```, ```inference_model```, and ```raw_data_processing``` sections to match the paths you've created.
# Usage examples
## 1. Run the End-to-End Data Preparation Pipeline
To preprocess your AUV sonar and optical image data, execute the following command from your terminal:
```bash
multimodal-auv-data-prep --raw_optical_images_folder "/home/tommorgan/Documents/data/Newfolder/" --geotiff_folder "/home/tommorgan/Documents/data/Newfolder/sonar/" --output_folder "/home/tommorgan/Documents/data/test/" --window_size_meters 30 --image_enhancement_method "AverageSubtraction" --exiftool_path '/usr/bin/exiftool'
```
To do this from a Python script, run:
```python
# Example for run_auv_preprocessing
from Multimodal_AUV import run_auv_preprocessing

run_auv_preprocessing(
    raw_optical_images_folder="D:/raw dataset/",
    geotiff_folder="D:/raw dataset/sonar/",
    output_folder="D:/output/",
    exiftool_path=r"C:\exiftool-13.32_64\exiftool-13.32_64\exiftool(-k).exe",  # Must point to the actual .exe file, or "/usr/bin/exiftool" on Linux
    window_size_meters=30.0,
    image_enhancement_method="AverageSubtraction"
)
```
### Understanding the Arguments:
* **```multimodal-auv-data-prep```**: The console command (installed with the package) that invokes the main preprocessing script.
* **```--raw_optical_images_folder```**: ```"/path/to/your/raw/optical_images"```
**Purpose**: Specifies the absolute path to the directory containing a collection of folders with your original, unprocessed JPG optical image files from the AUV. This should be as it is downloaded from your data source. The structure should have at least one folder inside containing images with metadata accessible by ExifTool and organised in this structure:
```xml
<comment>
<altitude>1.52</altitude>
<depth>25.78</depth>
<heading>123.45</heading>
<pitch>2.10</pitch>
<roll>-0.75</roll>
<surge>0.15</surge>
<sway>-0.05</sway>
<lat>56.12345</lat>
<lon>-3.98765</lon>
</comment>
```
If not, you will have to rewrite the metadata-parsing part of the function or write your own data preparation function.
**Action Required**: You MUST replace ```/path/to/your/raw/optical_images``` with the actual, full path to your raw optical images folder on your local machine.
* **```--geotiff_folder```**: ```"/path/to/your/auv_geotiffs"```
**Purpose**: Defines the absolute path to the directory containing all your GeoTIFF files, which typically include bathymetry and side-scan sonar data. The bathymetry tiffs must have "bathy" in the file name, the side-scan must have "SSS" in the file name.
**Action Required**: You MUST replace ```/path/to/your/auv_geotiffs``` with the actual, full path to your GeoTIFFs folder.
Example Structure:
```
/path/to/your/auv_geotiffs/
├── bathymetry.tif
├── side_scan.tif
└── ...
```
* **```--output_folder```**: ```"/path/to/your/processed_auv_data"```
**Purpose**: Designates the root directory where all the processed and organized output data will be saved. This is where the processed optical images, sonar patches, and the main coords.csv file will reside.
**Action Required**: You MUST replace ```/path/to/your/processed_auv_data``` with your desired output directory.
* **```--exiftool_path```** ```"C:/exiftool/exiftool.exe"```
**Purpose**: Provides the absolute path to the exiftool executable itself. This is essential for extracting GPS and timestamp information from your optical images.
**Action Required**: You MUST download and unpack ExifTool and then replace
```"C:/exiftool/exiftool.exe"``` with the correct path to your ExifTool installation; it MUST point at the executable itself. For Linux/macOS, this might be /usr/bin/exiftool or /usr/local/bin/exiftool if installed globally.
* **```--window_size_meters 30.0```**
**Purpose**: Sets the desired side length (in meters) for the square patches that will be extracted from your GeoTIFF files (e.g., a 30.0 value means a 30m x 30m sonar patch).
**Customization**: Adjust this value based on the scale of features you want to capture in your sonar data for machine learning and the typical coverage of your optical images. 30 meters has been found optimal in most scenarios.
* **```--image_enhancement_method```** ```"AverageSubtraction"```
**Purpose**: Specifies the method to be used for enhancing the optical images. This can improve the visual quality and potentially the feature extraction for machine learning.
**Customization**: Choose between "AverageSubtraction" (a simpler method) or "CLAHE" (Contrast Limited Adaptive Histogram Equalization, often more effective for underwater images). The default is AverageSubtraction.
* **```--skip_bathy_combine (Optional flag)```**
**Purpose**: If this flag is present, the post-processing step that attempts to combine multiple bathymetry channels into a single representation will be skipped.
**Usage**: Include this flag in your command if you do not want this channel combination to occur. For example: multimodal-auv-data-prep ... --skip_bathy_combine (no value needed, just the flag).
### Output Data Structure
Upon successful execution, your ```--output_folder``` will contain a structured dataset. Here's an example of the typical output:
```
/path/to/your/processed_auv_data/
├── coords.csv
├── image_0001/
│   ├── image_0001_processed.jpg   # Enhanced optical image
│   ├── bathymetry_patch.tif       # Extracted bathymetry patch
│   ├── side_scan_patch.tif        # Extracted side-scan sonar patch
│   └── (other_geotiff_name)_patch.tif
├── image_0002/
│   ├── image_0002_processed.jpg
│   ├── bathymetry_patch.tif
│   └── ...
└── ...
```
* **coords.csv**: A primary metadata file containing entries for each processed optical image, including its filename, geographical coordinates (latitude, longitude), timestamp, and the relative path to its corresponding processed image and sonar patches within the output structure.
* **image_XXXX/ subfolders**: Each subfolder is named after the processed optical image and contains the processed optical image itself.
* **GeoTIFF patches** : Individual GeoTIFF files representing the extracted square patches from each of your input GeoTIFFs (e.g., bathymetry, side-scan sonar) for that specific location.
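A quick way to inspect this output is to load `coords.csv` and check its rows; the following is a minimal sketch using pandas (not a dependency of this package), and the exact column names should be checked against your generated file:
```python
import pandas as pd

# Load the metadata file produced by the preprocessing step
coords = pd.read_csv("/path/to/your/processed_auv_data/coords.csv")

# Inspect the available columns and the first few entries
print(coords.columns.tolist())
print(coords.head())
print(f"{len(coords)} processed data points found")
```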
## 2. Predict Benthic Habitat Class using a Pre-trained Model
Once you have your environment set up and data prepared, you can run inference using our pre-trained Multimodal AUV Bayesian Neural Network (Found here: https://huggingface.co/sams-tom/multimodal-auv-bathy-bnn-classifier/tree/main/multimodal-bnn) . This example demonstrates how to apply the model to new data and generate predictions with uncertainty quantification.
### Prerequisites:
* Ensure you have cloned this repository and installed all dependencies as described in the Installation Guide.
* Your input data (images and sonar files) should be organized as expected by the CustomImageDataset (refer to ```Multimodal_AUV/data/datasets.py``` or the above example (1.) for details). The ```--data_dir``` argument should point to the root of this organized dataset.
* The script will **automatically** download the required model weights from the Hugging Face Hub.
Inference Command Example:
```bash
multimodal-auv-inference --data_dir "/home/tommorgan/Documents/data/all_mulroy_images_and_sonar" --output_csv "/home/tommorgan/Documents/data/test/csv.csv" --batch_size 4 --num_mc_samples 10
```
To do this from a Python script, run:
```python
from Multimodal_AUV import run_auv_inference

inference_output_csv = "D:/csvs/inference_results.csv"

run_auv_inference(
    data_directory="D:/dataset/",
    batch_size=4,
    output_csv=inference_output_csv,
    num_mc_samples=5,
    num_classes=7)
print("Inference function called. Check results in:", inference_output_csv)
```
### Understanding the Arguments:
* **```multimodal-auv-inference```**: The console command (installed with the package) that runs inference with the pre-trained multimodal BNN.
* **```--data_dir``` ```"/path/to/your/input_data/dataset"```**:
**Purpose**: Specifies the absolute path to the directory containing your multimodal input data (e.g., GeoTIFFs, corresponding CSVs, etc.).
**Action Required** : You MUST replace ```"/path/to/your/input_data/dataset"``` with the actual absolute path to your dataset on your local machine.
* **```--output_csv``` ```"/path/to/save/your/results/inference.csv"```**:
**Purpose**: Defines the absolute path and filename where the inference results (predicted classes, uncertainty metrics) will be saved in CSV format.
**Action Required**: You MUST replace ```"/path/to/save/your/results/inference.csv"``` with your desired output path and filename. The script will create the file and any necessary parent directories if they don't exist.
* **```--batch_size 4```**:
**Purpose**: Sets the number of samples processed at once by the model during inference.
**Customization**: Adjust this value based on your available GPU memory. Larger batch sizes can speed up inference but require more VRAM.
* **```--num_mc_samples 5```**:
**Purpose**: Specifies the number of Monte Carlo (MC) samples to draw from the Bayesian Neural Network's posterior distribution. A higher number of samples leads to a more robust estimation of predictive uncertainty.
**Customization**: For production, you might use 100 or more samples for better uncertainty estimation. For quick testing, 5-10 samples are sufficient.
### Expected Output:
Upon successful execution, a CSV file (e.g., inference.csv) will be created at the specified --output_csv path. This file will contain:
* **Image Name**: Identifier for the input sample.
* **Predicted Class**: The model's most likely class prediction.
* **Predictive Uncertainty**: A measure of the total uncertainty in the prediction (combining aleatoric and epistemic).
* **Aleatoric Uncertainty**: Uncertainty inherent in the data itself (e.g., sensor noise, ambiguous regions).
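For readers who want to see how these quantities relate, here is a minimal, stand-alone sketch of the usual Monte Carlo decomposition (predictive entropy split into aleatoric and epistemic terms) computed from stacked softmax samples; it is illustrative only and is not the package's internal implementation:
```python
import torch

def mc_uncertainties(probs: torch.Tensor, eps: float = 1e-12):
    """probs: [num_mc_samples, num_classes] softmax outputs for one input."""
    mean_probs = probs.mean(dim=0)
    # Predictive (total) uncertainty: entropy of the averaged prediction
    predictive = -(mean_probs * (mean_probs + eps).log()).sum()
    # Aleatoric uncertainty: mean of the per-sample entropies
    aleatoric = -(probs * (probs + eps).log()).sum(dim=1).mean()
    # Epistemic uncertainty: the gap between the two (mutual information)
    epistemic = predictive - aleatoric
    return predictive.item(), aleatoric.item(), epistemic.item()

# Example: 10 MC samples over 7 habitat classes
samples = torch.softmax(torch.randn(10, 7), dim=1)
print(mc_uncertainties(samples))
```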
## 3. Retrain a Pre-trained Model on a New Dataset
This example demonstrates how to fine-tune our pre-trained Multimodal AUV Bayesian Neural Network (Found here: https://huggingface.co/sams-tom/multimodal-auv-bathy-bnn-classifier/tree/main/multimodal-bnn ) on your own custom dataset. Retraining allows you to adapt the model to specific environmental conditions or new benthic classes present in your data, leveraging the knowledge already learned by the pre-trained model.
### Prerequisites:
* Ensure you have cloned this repository and installed all dependencies as described in the Installation Guide.
* Your input data (images and sonar files) should be organized as expected by the CustomImageDataset (refer to ```multimodal_auv/data/datasets.py``` or the data preparation example above (1.) for details). The ```--data_dir``` argument should point to the root of this organized dataset.
* The script will automatically download the required pre-trained model weights from the Hugging Face Hub.
Retraining Command Example:
```bash
multimodal-auv-retrain --data_dir "/home/tommorgan/Documents/data/representative_sediment_sample/" --batch_size_multimodal 4 --num_epochs_multimodal 5 --num_mc_samples 5 --learning_rate_multimodal 1e-5 --weight_decay_multimodal 1e-5 --bathy_patch_base 30 --sss_patch_base 30
```
To run this as a script:
```python
from Multimodal_AUV import run_auv_retraining
import torch

# Parameters you want to control from outside the function:
training_devices = [torch.device("cuda:0")] if torch.cuda.is_available() else [torch.device("cpu")]

const_bnn_prior_parameters = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,
    "moped_delta": 0.1,
}

# Now, call the function with all your desired parameters:
run_auv_retraining(
    root_dir='D:/Your/dataset/',
    devices=training_devices,
    const_bnn_prior_parameters=const_bnn_prior_parameters,
    num_classes=7,  # Change this to the number of classes in your dataset

    # Optimizer/Training Parameters (all optimised for the pretrained dataset):
    lr_multimodal=1e-5,
    multimodal_weight_decay=1e-5,
    epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base=30,
    sss_patch_base=30,
    batch_size_multimodal=1,

    # Scheduler Parameters:
    scheduler_multimodal_step_size=7,
    scheduler_multimodal_gamma=0.752,
)
```
### Understanding the Arguments:
* **```multimodal-auv-retrain```**: The console command (installed with the package) that downloads the pre-trained weights and retrains them on your dataset.
* **```--data_dir``` ```"/path/to/your/input_data/dataset"```**:
**Purpose**: Specifies the absolute path to the directory containing your multimodal input data for retraining (e.g., GeoTIFFs, corresponding CSVs, etc.).
**Action Required**: You MUST replace ```"/path/to/your/input_data/dataset"``` with the actual absolute path to your dataset on your local machine.
* **```--batch_size_multimodal 20```**:
**Purpose**: Sets the number of samples processed at once by the model during retraining.
**Customization**: Adjust this value based on your available GPU memory. Larger batch sizes can speed up training but require more VRAM.
* **```--num_epochs_multimodal 20```**:
**Purpose**: Defines the total number of training epochs (complete passes through the entire dataset).
**Customization**: Increase this value for more thorough training, especially with larger datasets or when the model is converging slowly.
* **```--num_mc_samples 5```**:
**Purpose**: Specifies the number of Monte Carlo (MC) samples to draw from the Bayesian Neural Network's posterior distribution during training. A higher number of samples leads to a more robust estimation of predictive uncertainty.
**Customization**: For production, you might use 100 or more samples for better uncertainty estimation. For quicker testing or initial training, 5-10 samples are sufficient.
* **```--learning_rate_multimodal 0.001```**:
**Purpose**: Sets the initial learning rate for the optimizer. This controls the step size at which the model's weights are updated during training.
**Customization**: Experiment with different learning rates (e.g., 0.01, 0.0001) to find the optimal value for your dataset.
* **```--weight_decay_multimodal 1e-5```**:
**Purpose**: Applies L2 regularization (weight decay) to prevent overfitting by penalizing large weights.
**Customization**: Adjust this value to control the strength of the regularization. A higher value means stronger regularization.
* **```--bathy_patch_base 30```**:
**Purpose**: Defines the base patch size for bathymetry data processing.
**Customization**: This parameter affects how bathymetry data is chunked and processed. Adjust as needed based on your data characteristics.
* **```--sss_patch_base 30```**:
**Purpose**: Defines the base patch size for side-scan sonar (SSS) data processing.
**Customization**: Similar to bathy_patch_base, this affects how SSS data is chunked and processed.
## 4. Train a New Multimodal Model from Scratch
This example outlines how to train a new Multimodal AUV Bayesian Neural Network entirely from scratch using your own dataset. This is suitable when you have a large, diverse dataset and want to build a model specifically tailored to your data's unique characteristics, without relying on pre-trained weights.
### Prerequisites:
* Ensure you have cloned this repository and installed all dependencies as described in the Installation Guide.
* Your input data (images and sonar files) should be organized as expected by the CustomImageDataset (refer to ```multimodal_auv/data/datasets.py``` or the data preparation example above (1.) for details). The ```--root_dir``` argument should point to the root of this organized dataset.
Training Command Example:
```bash
multimodal-auv-train-scratch --root_dir "/home/tommorgan/Documents/data/representative_sediment_sample/" --batch_size_multimodal 4 --epochs_multimodal 5 --num_mc 5 --lr_multimodal 1e-5
```
To run this as a script:
```python
import torch
from Multimodal_AUV import run_AUV_training_from_scratch

training_devices = [torch.device("cuda:0")] if torch.cuda.is_available() else [torch.device("cpu")]

const_bnn_prior_parameters = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,
    "moped_delta": 0.1,
}

# Call the refactored training function, passing only the core and dynamic parameters
run_AUV_training_from_scratch(
    const_bnn_prior_parameters=const_bnn_prior_parameters,

    # Dynamic parameters from args (all optimised for the pretrained dataset):
    lr_multimodal_model=1e-5,
    num_epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base_raw=30.0,
    sss_patch_base_raw=30.0,
    batch_size_multimodal=1,

    # General pipeline parameters
    root_dir='D:/Your/dataset/',
    devices=training_devices,
    num_classes=7
)
print("Training function called.")
```
### Understanding the Arguments:
* **```multimodal-auv-train-scratch```**: The console command (installed with the package) that trains a new multimodal model from scratch.
* **```--root_dir``` ```"/path/to/your/input_data/dataset"```**:
**Purpose**: Specifies the absolute path to the root directory containing your multimodal input data for training (e.g., GeoTIFFs, corresponding CSVs, etc.).
**Action Required**: You MUST replace ```"/path/to/your/input_data/dataset"``` with the actual absolute path to your dataset on your local machine.
* **```--epochs_multimodal```** 20:
**Purpose**: Defines the total number of training epochs (complete passes through the entire dataset).
**Customization**: Increase this value for more thorough training, especially with larger datasets. Training from scratch typically requires more epochs than retraining.
* **```--num_mc```** 20:
**Purpose**: Specifies the number of Monte Carlo (MC) samples to draw from the Bayesian Neural Network's posterior distribution during training. A higher number of samples leads to a more robust estimation of predictive uncertainty.
**Customization**: For production, you might use 100 or more samples for better uncertainty estimation. For quicker testing or initial training, 5-10 samples are sufficient.
* **```--batch_size_multimodal```** 20:
**Purpose**: Sets the number of samples processed at once by the model during training.
**Customization**: Adjust this value based on your available GPU memory. Larger batch sizes can speed up training but require more VRAM.
* **```--lr_multimodal```** 0.001:
**Purpose**: Sets the initial learning rate for the optimizer. This controls the step size at which the model's weights are updated during training.
**Customization**: Experiment with different learning rates (e.g., 0.01, 0.0001) to find the optimal value for your dataset. Training from scratch might require more careful tuning of the learning rate.
# Running tests
To ensure the integrity and correctness of the codebase, you can run the provided unit tests. From the root directory of the repository, execute:
```bash
pytest unittests/
```
# Full working Python script
```python
from Multimodal_AUV import run_auv_retraining, run_auv_inference, run_auv_preprocessing, run_AUV_training_from_scratch
import torch
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Parameters you want to control from outside the functions:
training_devices = [torch.device("cuda:0")] if torch.cuda.is_available() else [torch.device("cpu")]

const_bnn_prior_parameters = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,
    "moped_delta": 0.1,
}

# Now, call the retraining function with all your desired parameters:
run_auv_retraining(
    root_dir='D:/Your/dataset/',
    devices=training_devices,
    const_bnn_prior_parameters=const_bnn_prior_parameters,
    num_classes=7,  # Change this to the number of classes in your dataset

    # Optimizer/Training Parameters (all optimised for the pretrained dataset):
    lr_multimodal=1e-5,
    multimodal_weight_decay=1e-5,
    epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base=30,
    sss_patch_base=30,
    batch_size_multimodal=1,

    # Scheduler Parameters:
    scheduler_multimodal_step_size=7,
    scheduler_multimodal_gamma=0.752,
)
print("Retraining process initiated.")

run_auv_inference(
    data_directory="D:/dataset/",
    batch_size=4,
    output_csv="D:/csvs/inference_results.csv",
    num_mc_samples=5,
    num_classes=7)

run_auv_preprocessing(
    raw_optical_images_folder="D:/raw dataset/",
    geotiff_folder="D:/raw dataset/sonar/",
    output_folder="D:/output/",
    exiftool_path=r"C:\exiftool-13.32_64\exiftool-13.32_64\exiftool(-k).exe",  # Must point to the actual .exe file, or "/usr/bin/exiftool" on Linux
    window_size_meters=30.0,
    image_enhancement_method="AverageSubtraction"
)

# Call the refactored training function, passing only the core and dynamic parameters
# (reusing the same const_bnn_prior_parameters defined above)
run_AUV_training_from_scratch(
    const_bnn_prior_parameters=const_bnn_prior_parameters,

    # Dynamic parameters from args (all optimised for the pretrained dataset):
    lr_multimodal_model=1e-5,
    num_epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base_raw=30.0,
    sss_patch_base_raw=30.0,
    batch_size_multimodal=1,

    # General pipeline parameters
    root_dir='D:/Your/dataset/',
    devices=training_devices,
    num_classes=7
)
print("Training function called.")
```
# Configuration
All core parameters for data processing, model training, and inference are controlled via **YAML configuration files**. This approach ensures reproducibility, simplifies experimentation, and facilitates seamless collaboration.
**Key Configuration Areas**:
The configuration is organized to cover various stages of the AUV data processing and model lifecycle:
### Data Management:
Input/Output Paths: Define locations for raw data (e.g., optical images, GeoTIFFs), processed outputs, and inference results.
Data Preparation Parameters: Specify settings like patch sizes for bathymetry and SSS, image dimensions, and relevant GeoTIFF channels.
### Model Training & Retraining:
Core Training Parameters: Control fundamental aspects like learning rate, batch size, number of epochs, and optimization algorithms.
Model Architecture: Configure choices such as model type (e.g., multimodal_bnn, unimodal_bnn), number of output classes, and specific layer dimensions.
Bayesian Neural Network (BNN) Settings: Parameters for BNN priors, if applicable.
### Inference:
Prediction Control: Define thresholds for classification and output formats for results.
### Configuration Examples and Usage:
Below are examples reflecting the arguments used by various scripts within the project. These can be integrated into a single, comprehensive config.yaml file, or broken down into separate files for specific tasks.
```yaml
# Configuration File

# General Project Settings (can be shared across scripts)
global_settings:
  data_root_dir: "/path/to/your/input_data/dataset"
  output_base_dir: "/path/to/your/project_outputs"
  num_mc_samples: 20        # Common for BNN inference/evaluation
  multimodal_batch_size: 20 # Common batch size for multimodal models

# --- Individual Script Configurations ---

# Configuration for Example_training_from_scratch
training_from_scratch:
  epochs_multimodal: 20
  lr_multimodal: 0.001
  # root_dir and batch_size_multimodal can inherit from global_settings or be overridden here

# Configuration for Example_Retraining_model
retraining_model:
  num_epochs_multimodal: 20       # Renamed from 'epochs_multimodal' in original script
  learning_rate_multimodal: 0.001 # Renamed from 'lr_multimodal'
  weight_decay_multimodal: 1e-5
  bathy_patch_base: 30
  sss_patch_base: 30
  # data_dir, batch_size_multimodal, num_mc_samples can inherit from global_settings or be overridden

# Configuration for Example_Inference_model
inference_model:
  output_csv: "%(output_base_dir)s/inference_results/inference.csv" # Example using global var
  batch_size: 4 # Specific batch size for inference

# Configuration for your_runner_script.py (e.g., for raw data processing)
raw_data_processing:
  raw_optical_images_folder: "%(data_root_dir)s/raw_auv_images"
  geotiff_folder: "%(data_root_dir)s/auv_geotiffs"
  output_folder: "%(output_base_dir)s/processed_auv_data"
  exiftool_path: "C:/exiftool/" # Note: This might need to be OS-specific or relative
  window_size_meters: 30.0
  image_enhancement_method: "AverageSubtraction"
```
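To make use of such a file, one option is to load it with PyYAML and pass the values into the runner functions; this is a minimal sketch (assuming `pyyaml` is installed and a `config.yaml` following the layout above), not part of the package itself:
```python
import yaml
from Multimodal_AUV import run_auv_inference

# Load the YAML configuration shown above
with open("config.yaml") as f:
    config = yaml.safe_load(f)

g = config["global_settings"]
inf = config["inference_model"]

# The "%(...)s" placeholders in the example are not expanded by PyYAML,
# so build the output path explicitly from the global settings here.
output_csv = f"{g['output_base_dir']}/inference_results/inference.csv"

run_auv_inference(
    data_directory=g["data_root_dir"],
    batch_size=inf["batch_size"],
    output_csv=output_csv,
    num_mc_samples=g["num_mc_samples"],
    num_classes=7,
)
```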
# Model Architecture
This project leverages sophisticated **Multimodal Bayesian Neural Network (BNN)** architectures designed for robust data fusion and uncertainty quantification in underwater environments. The core design principles are **modularity** and **adaptability**, allowing for both unimodal and multimodal processing.
## **1. Multimodal Fusion Architecture:**
The primary model (used in 2. Predict Benthic Habitat Class using a Pre-trained Model, 3. Retrain a Pre-trained Model on a New Dataset, and 4. Train a New Multimodal Model from Scratch) is designed to integrate information from different sensor modalities:
* **Image Encoder:** A Convolutional Neural Network (CNN) backbone (e.g., a pre-trained ResNet, specifically adapted to be Bayesian) processes the optical imagery from AUVs.
* **Bathymetric Sonar Encoder:** A CNN backbone (e.g., a pre-trained ResNet, specifically adapted to be Bayesian) processes the bathymetric sonar from AUVs.
* **Side-scan Sonar Encoder:** A CNN backbone (e.g., a pre-trained ResNet, specifically adapted to be Bayesian) processes the side-scan sonar from AUVs.
* **Fusion Layer:** Features extracted from each modality's encoder are concatenated or combined using a dedicated fusion layer (e.g., a fully connected network, attention mechanism). This layer learns the optimal way to combine visual and acoustic information (see the sketch after this list).
* **Prediction Head:** A final set of layers (often fully connected) takes the fused features and outputs predictions for the target task (e.g., benthic habitat classification), with the Bayesian nature providing a distribution over these predictions.
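The following is a minimal sketch of the concatenation-style fusion and prediction head described above; the feature dimensions and the `FusionHead` module are illustrative assumptions, not the package's actual classes:
```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Concatenate per-modality feature vectors and classify (illustrative only)."""
    def __init__(self, image_dim=512, bathy_dim=512, sss_dim=512, num_classes=7):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(image_dim + bathy_dim + sss_dim, 256),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, image_feat, bathy_feat, sss_feat):
        fused = torch.cat([image_feat, bathy_feat, sss_feat], dim=1)  # combine the three modalities
        return self.classifier(self.fusion(fused))                   # class logits

# Example: one batch of 4 samples with 512-d features per modality
head = FusionHead()
logits = head(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 7])
```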
### Diagram of the Multimodal Network:

## **2. Bayesian Neural Network Implementation:**
The "Bayesian" aspect is achieved by converting deterministic layers (e.g., Linear, Conv2D) into their probabilistic counterparts using `bayesian-torch`. This means:
* **Weight Distributions:** Instead of learning fixed weights, the model learns **distributions over its weights**, allowing it to output a distribution of predictions for a given input.
* **Uncertainty Quantification:** The variance in these output predictions provides a direct measure of the model's confidence and **epistemic uncertainty**, which is vital for decision-making in ambiguous underwater settings.
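As an illustration of this conversion, here is a minimal sketch using `bayesian-torch`'s `dnn_to_bnn` utility with the same prior dictionary used in the examples above; the ResNet backbone here is a stand-in and not necessarily the backbone used by the pretrained models:
```python
import torch
import torchvision.models as models
from bayesian_torch.models.dnn_to_bnn import dnn_to_bnn

const_bnn_prior_parameters = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,   # initialise posteriors from the pretrained deterministic weights (MOPED)
    "moped_delta": 0.1,
}

# Start from a deterministic CNN and convert its layers to Bayesian ones in place
model = models.resnet18(weights="IMAGENET1K_V1")
dnn_to_bnn(model, const_bnn_prior_parameters)

# Each forward pass now samples weights, so repeated passes give different predictions
x = torch.randn(2, 3, 224, 224)
preds = torch.stack([model(x) for _ in range(5)])
print(preds.shape)  # torch.Size([5, 2, 1000])
```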
## **3. Foundation Model Concept:**
In addition, this project aims to provide a **retrainable foundation model**:
* The architecture is general enough to be applicable across various underwater mapping tasks.
* It is pre-trained on a diverse dataset (e.g., Northern Britain benthic habitat data), providing strong initial feature representations.
* Users can then **fine-tune** this pre-trained model (3. Retrain a Pre-trained Model on a New Dataset) on their own smaller, specific datasets to adapt it to new areas or different classification schemes, significantly reducing training time and data requirements.
## **4. Unimodal Models:**
The project also includes components (`unitmodal.py` in `train/` and potentially `base_models.py`) to train and evaluate models based on **single modalities** (e.g., image-only or sonar-only). This allows for ablation studies and comparison with the performance benefits of multimodal fusion.
### Diagram of the Unimodal Networks:

---
# Contact
Have questions about the project, found a bug, or want to contribute? Here are a few ways to reach out:
* **GitHub Issues:** For any code-related questions, bug reports, or feature requests, please open an [Issue on this repository](https://github.com/sams-tom/multimodal-auv-bnn-project/issues). This is the preferred method for transparency and tracking.
* **Email:** For more direct or confidential inquiries, you can reach me at [phd01tm@sams.ac.uk](mailto:phd01tm@sams.ac.uk).
* **LinkedIn:** Connect with the project lead/team on LinkedIn:
* [Tom Morgan](https://www.linkedin.com/in/tom-morgan-8a73b129b/)
# Citations
* **GitHub Repository (Code & Documentation):** [https://github.com/sams-tom/multimodal-auv-bnn-project](https://github.com/sams-tom/multimodal-auv-bnn-project)
* **Hugging Face Models:** [https://huggingface.co/sams-tom/multimodal-auv-bnn-models](https://huggingface.co/sams-tom/multimodal-auv-bnn-models)
* **Research Paper:** [In development]
Raw data
{
"_id": null,
"home_page": "https://github.com/sams-tom/Multimodal-AUV",
"name": "multimodal-auv",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.10,>=3.9",
"maintainer_email": null,
"keywords": "AUV, Bayesian Neural Networks, Underwater Mapping, Habitat Classification, Multimodal Data, Oceanography, geospatial-data, environmental-monitoring, uncertainty-quantification, computer-vision, remote-sensing",
"author": "Tom Morgan",
"author_email": "phd01tm@sams.ac.uk",
"download_url": "https://files.pythonhosted.org/packages/a7/7e/b37d030531dff14217705ce770ebf6320580b8ba24e17462628c4880f9c3/multimodal_auv-0.0.5.tar.gz",
"platform": null,
"description": "# \ud83c\udf0aProject Overview: Multimodal AUV Bayesian Neural Networks for Underwater Environmental Understanding\ud83d\udc20\nThis project develops and deploys **multimodal, Bayesian Neural Networks (BNNs)**, to process and interpret habitat data collected by **Autonomous Underwater Vehicles (AUVs)**. This is to offer **scalable**, **accurate** mapping solutions\nin complex underwater environments,whilst incorporating unceratinty quantification to allow **reliable** decision making. The repo \nalso presents a model as a retrainable foundation model for further tweaking to new datasets and scenarios.\ud83d\ude80\n\n\n ## \ud83d\udea7 Problem Addressed \ud83d\udea7\n**Environmental mapping** within complex underwater environments presents significant challenges due to inherent data complexities and sensor limitations. Traditional methodologies often struggle to account for the variable conditions encountered in marine settings, such as attenuation of **light \ud83d\udd26, turbidity \ud83c\udf0a, and the physical constraints of acoustic and optical sensors \ud83d\udcf8** . These factors contribute to **noisy, incomplete, and uncertain data acquisition**, hindering the generation of reliable environmental characterizations.\ud83d\udcc9\n\nFurthermore, conventional machine learning models typically yield point predictions without quantifying associated uncertainties. In applications requiring high-stakes decision-making, such as **marine conservation\ud83c\udf3f, resource management \ud83d\udc20, or autonomous navigation \ud83e\udded**, understanding the **confidence bounds** of predictions is critical for robust risk assessment and operational planning. The fusion of diverse data modalities collected by Autonomous Underwater Vehicles (AUVs), including high-resolution **multibeam sonar \ud83d\udce1, side-scan sonar \ud83d\udef0\ufe0f, and optical imagery \ud83d\udcf7**, further compounds the challenge, necessitating advanced computational approaches to effectively integrate and interpret these disparate information streams.\n\nThis project addresses these critical limitations by developing and deploying **multimodal Bayesian Neural Networks (BNNs)**. This approach explicitly models and quantifies the **epistemic and aleatoric uncertainties** inherent in complex underwater datasets, providing not only robust environmental classifications but also **quantifiable measures of prediction confidence**. By leveraging the **complementary strengths of multiple sensor modalities**, the framework aims to deliver enhanced accuracy, scalability, and decision-making capabilities for comprehensive underwater environmental understanding. 
\u2728\n\n\n \n# Project Structure \ud83c\udfd7\ufe0f\n```\nMultimodal_AUV/\n\u2502\u251c\u2500\u2500 src/\n\u2502 \u251c\u2500\u2500 Multimodal_AUV/\n\u2502 \u251c\u2500\u2500 config/\n\u2502 \u251c\u2500\u2500 paths.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 data/\n\u2502 \u251c\u2500\u2500 datasets.py\n\u2502 \u251c\u2500\u2500 loaders.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 data_preperation/\n\u2502 \u251c\u2500\u2500 geospatial.py\n\u2502 \u251c\u2500\u2500 image_processing.py\n\u2502 \u251c\u2500\u2500 main_data_preparation.py\n\u2502 \u251c\u2500\u2500 GAVIA_data_preparation.py\n\u2502 \u251c\u2500\u2500 utilities.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 Examples/\n\u2502 \u251c\u2500\u2500 Example_data_preparation.py\n\u2502 \u251c\u2500\u2500 Example_Inference_model.py\n\u2502 \u251c\u2500\u2500 Example_Retraining_model.py\n\u2502 \u251c\u2500\u2500 Example_training_from_scratch.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 functions/\n\u2502 \u251c\u2500\u2500 functions.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 inference/\n\u2502 \u251c\u2500\u2500 inference_data.py\n\u2502 \u251c\u2500\u2500 predictiors.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 models/\n\u2502 \u251c\u2500\u2500 base_models.py\n\u2502 \u251c\u2500\u2500 model_utils.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 train/\n\u2502 \u251c\u2500\u2500 checkpointing.py\n\u2502 \u251c\u2500\u2500 loop_utils.py\n\u2502 \u251c\u2500\u2500 multimodal.py\n\u2502 \u251c\u2500\u2500 unitmodal.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 utils/\n\u2502 \u251c\u2500\u2500 device.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2502 \u251c\u2500\u2500 cli.py\n\u2502 \u251c\u2500\u2500 main.py\n\u2502 \u2514\u2500\u2500 __init__.py\n\u2514\u2500\u2500 unittests/\n \u251c\u2500\u2500 test_data.py\n \u251c\u2500\u2500 test_model.py\n \u251c\u2500\u2500 test_train.py\n \u251c\u2500\u2500 test_utils.py\n \u2514\u2500\u2500 __init__.py\n```\n# Module features \ud83d\ude80\nHere are some of the key capabilities of this module\n* **End-to-End Pipeline**:The repo offers a complete pipeline, allowing you to turn raw georeferenced imagery\ud83d\udcf8 and sonar tiffs \ud83d\udce1into **valid predictions with quantified uncertainty** by training Bayesian Neural Networks.\n\n* **Model to predict benthic habitat class (Northern Britain)**: Can download and run a model to evaluate bathymetric, sidescan and image \"pairs\"\nand predict specific benthic habitat classes found in Northern Britain: **Sand \ud83c\udfd6\ufe0f, Mud \ud83c\udfde\ufe0f, Rock \ud83e\udea8, Gravel \u26ab, Burrowed Mud (PMF) \ud83d\udd73\ufe0f, Kelp forest (PMF) \ud83c\udf33, or Horse Mussel reef (PMF) \ud83d\udc1a**.\n \n* **Retrainable foundation model**: Code to download and retrain a **pretrained network** for combining bathymetric, sidescan sonar and image for a new datasets, adapting the model to your specific needs with reduced computational requirements. \ud83d\udd04\n\n* **Training a model from scratch**: Code to take sonar and image and train a **completely new model** returning a CSV of metrics \ud83d\udcca, the model itself \ud83e\udde0, and confusion matrices \ud83d\udcc8.\n\n* **Options to optimise sonar patch sizes and to train unimodal models**: Code to find the **optimal sonar patch** to maximise predicitve accuracy (high compute requirements! 
\u26a1) and to train unimodal and multimodal models to **compare the benefits of multimodality**. \ud83d\udd2c\n \n# Getting started\nThis section guides you through setting up the project, installing dependencies, and preparing your data for processing and model training/inference.\n\n \n1. **Create and Activate Conda Environment**:\n We recommend using Conda to manage the project's dependencies for a consistent and isolated environment.\n \n Create the Conda environment:\n ```\n Bash\n \n conda create -n multimodal_auv python=3.9 # Must be python 3.9\n ```\n Activate the environment:\n ```\n Bash\n \n conda activate multimodal_auv\n ```\n You should see (multimodal_auv) at the beginning of your terminal prompt, indicating the environment is active.\n\n3. **Install Dependencies**:\n With your Conda environment active, install all necessary Python packages listed in the requirements.txt file.\n ```\n Bash\n \n pip install Multimodal_AUV\n ```\nImportant Note on GPU Support:\nIn order to train quickly this project utilises PyTorch with CUDA for GPU acceleration. However, the requirements.txt file does not includ PyTorch (torch, torchvision, torchaudio) and NVIDIA CUDA runtime dependencies as these need to be downloaded to fit with your local CUDA toolkit or GPU driver setup. Navigate to this webpage: https://pytorch.org/get-started/locally/ select your requirements and then copy the command and run that locally.\n\nFor example, for CUDA 11.8, Python on windows:\n```\nBash\n# Then, install PyTorch with CUDA via Conda\npip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n\n```\n\nTo import for a script simply call at the top of the script:\n```\nBash\nimport Multimodal_AUV\n```\n\n4. **Prepare Data Folders**:\n \n Your project requires specific data structures for input and output. If you run the examples below this will be structored correctly. Please organize your data as follows, and update the paths in your config.yaml file accordingly.\n\nRecommended Folder Structure:\n```\nMultimodal_AUV/\n\u251c\u2500\u2500 data/\n\u2502 \u251c\u2500\u2500 individual_data_point/\n\u2502 \u2502 \u251c\u2500\u2500 auv_image.jpg/ # Image from camera\n\u2502 \u2502 \u251c\u2500\u2500 local_side_scan_image.jpg/ # Cut out of sonar local to camera\n\u2502 \u2502 \u251c\u2500\u2500 local_bathy_image.jpg/ # Cut out of sonar local to camera\n\u2502 \u2502 \u2514\u2500\u2500 LABEL.txt/ # Where the Label is in the title replacing LABEL\n\u2502 \u251c\u2500\u2500 individual_data_point/\n\u2502..........\n\u2502 \u2514\u2500\u2500 individual_data_point/\n\u2502 \u251c\u2500\u2500 processed_output/ # Output folder for processed AUV data (e.g., aligned images, extracted features)\n\u2502 \u251c\u2500\u2500 model_checkpoints/ # Directory to save trained model weights/checkpoints\n\u2502 \u2514\u2500\u2500 inference_results/ # Directory to save inference output (e.g., prediction CSVs, classified maps)\n\u251c\u2500\u2500 config.yaml # Your main configuration file\n\u251c\u2500\u2500 Multimodal_AUV/\n\u2502 \u2514\u2500\u2500 ... # Your Python source code\n\u251c\u2500\u2500 your_runner_script.py # (Optional) Script to run commands based on config.yaml\n\u251c\u2500\u2500 requirements.txt # List of Python dependencies\n\u2514\u2500\u2500 README.md\n```\n## Clarifying Data Folder Contents:\n\n* ```data/```: Folder containing folders of paired data. 
Your training scripts' ```--root_dir``` would typically point here.\n* ```data/individual_data_point/```: Example of folder within folder holding required data files\n* ```data/individual_data_point/auv_image.jpg```: The individual image for prediction\n* ```data/individual_data_point/local_side_scan_image.jpg```: The individual side scan image local to the camera image for prediction\n* ```data/individual_data_point/local_bathy_image.jpg```: The individual bathymetric image local to the camera image for prediction\n* ```data/individual_data_point/LABEL.txt```: The label to predict. **N.B.** Not required if youre not training/retraining a model.\n## NOTE : Sidescna files must have SSS in name and bathymetric files must be called \"patch_30m_combined_bathy\"\n## Example root directory\n\n\n## Example interal data directory\n\n\n## Understanding the arguments\n* ```data/processed_output/```: Stores intermediate or final processed data, often generated by preliminary scripts.\n\n* ```data/model_checkpoints/```: Dedicated location for saving trained model weights and checkpoints.\n\n* ```data/inference_results/```: Stores outputs generated by your inference models (e.g., prediction CSVs, classified maps).\n\n### Action Required:\n\n* **Create these directories manually** within your cloned repository if they don't exist. **Note**: If you run the below code including the example of data preparation the correct structure will be created automatically.\n\n* **Update** ```config.yaml```: Open your ```config.yaml``` file and set the ```data_root_dir```, ```output_base_dir```, and other relevant paths within ```training_from_scratch```, ```retraining_model```, ```inference_model```, and ```raw_data_processing``` sections to match the paths you've created.\n \n# Usage examples\n## 1. Run the End-to-End Data Preparation Pipeline \u2699\ufe0f\nTo preprocess your AUV sonar and optical image data, execute the following command from your terminal:\n\n```bash\nmultimodal-auv-data-prep --raw_optical_images_folder \"/home/tommorgan/Documents/data/Newfolder/\" --geotiff_folder \"/home/tommorgan/Documents/data/Newfolder/sonar/\" --output_folder \"/home/tommorgan/Documents/data/test/\" --window_size_meters 30 --image_enhancement_method \"AverageSubtraction\" --exiftool_path '/usr/bin/exiftool'\n\n```\nTo do this in a script run:\n```\nBash\n# Example for run_auv_preprocessing\nimport os\nfrom Multimodal_AUV import run_auv_preprocessing\n\nrun_auv_preprocessing(\n raw_optical_images_folder = \"D:/raw dataset/\",\n geotiff_folder = \"D:/raw dataset/sonar/\",\n output_folder= \"D:/output/\",\n exiftool_path = r'C:exiftool-13.32_64\\exiftool-13.32_64\\exiftool(-k).exe', # Must point to the actual .exe file or \"/usr/bin/exiftool\" #for linux\n window_size_meters = 30.0,\n image_enhancement_method = \"AverageSubtraction\"\n)\n\n```\n\n### Understanding the Arguments:\n\n* **```python Example_data_preparation.py```**: This invokes the main preprocessing script.\n\n* **```--raw_optical_images_folder```**: ```\"/path/to/your/raw/optical_images\"```\n \n **Purpose**: Specifies the absolute path to the directory containing a collection of folders with your original, unprocessed JPG optical image files from the AUV. This should be as its downloaded from your datasource. 
The structure should have folders inside (at least one) containing images with metadata accessible by Exiftool and organised in this structure: \n ```<comment>\n <altitude>1.52</altitude>\n <depth>25.78</depth>\n <heading>123.45</heading>\n <pitch>2.10</pitch>\n <roll>-0.75</roll>\n <surge>0.15</surge>\n <sway>-0.05</sway>\n <lat>56.12345</lat>\n <lon>-3.98765</lon>\n </comment>``` \n If not you will have to rewrite the metadata part of the function or organise your own data function.\n \n **Action Required**: You MUST replace ```/path/to/your/raw/optical_images``` with the actual, full path to your raw optical images folder on your local machine.\n\n* **```--geotiff_folder```**: ```\"/path/to/your/auv_geotiffs\"```\n\n **Purpose**: Defines the absolute path to the directory containing all your GeoTIFF files, which typically include bathymetry and side-scan sonar data. The bathymetry tiffs must have \"bathy\" in the file name, the side-scan must have \"SSS\" in the file name. \n \n **Action Required**: You MUST replace ```/path/to/your/auv_geotiffs``` with the actual, full path to your GeoTIFFs folder.\n\n Example Structure:\n \n ```/path/to/your/auv_geotiffs/\n \u251c\u2500\u2500 bathymetry.tif\n \u251c\u2500\u2500 side_scan.tif\n \u2514\u2500\u2500 ...```\n\n* **```--output_folder```**: ```\"/path/to/your/processed_auv_data\"```\n \n **Purpose**: Designates the root directory where all the processed and organized output data will be saved. This is where the processed optical images, sonar patches, and the main coords.csv file will reside.\n \n **Action Required**: You MUST replace ```/path/to/your/processed_auv_data``` with your desired output directory.\n\n* **```--exiftool_path```** ```\"C:/exiftool/\"```\n\n **Purpose**: Provides the absolute path to the directory where the exiftool.exe executable is located. This is essential for extracting GPS and timestamp information from your optical images.\n \n **Action Required**: You MUST download and unpack exiftool and then replace\n```\"C:/exiftool/exiftool.exe \"``` with the correct path to your ExifTool installation, it MUST point at the .exe itself. For Linux/macOS, this might be /usr/bin/ or /usr/local/bin/ if installed globally.\n\n* **```--window_size_meters 30.0```**\n\n **Purpose**: Sets the desired side length (in meters) for the square patches that will be extracted from your GeoTIFF files (e.g., a 30.0 value means a 30m x 30m sonar patch).\n \n **Customization**: Adjust this value based on the scale of features you want to capture in your sonar data for machine learning and the typical coverage of your optical images. 30 meters has been found optimal in most scenarios\n\n* **```--image_enhancement_method```** ```\"AverageSubtraction\"```\n\n **Purpose**: Specifies the method to be used for enhancing the optical images. This can improve the visual quality and potentially the feature extraction for machine learning.\n \n **Customization**: Choose between \"AverageSubtraction\" (a simpler method) or \"CLAHE\" (Contrast Limited Adaptive Histogram Equalization, often more effective for underwater images). The default is AverageSubtraction.\n\n* **```--skip_bathy_combine (Optional flag)```**\n\n **Purpose**: If this flag is present, the post-processing step that attempts to combine multiple bathymetry channels into a single representation will be skipped.\n \n **Usage**: Include this flag in your command if you do not want this channel combination to occur. For example: python your_script_name.py ... 
--skip_bathy_combine (no value needed, just the flag).\n\n### Output Data Structure\n\nUpon successful execution, your ```--output_folder``` will contain a structured dataset. Here's an example of the typical output:\n ```\n /path/to/your/processed_auv_data/\n \u251c\u2500\u2500 coords.csv\n \u251c\u2500\u2500 image_0001/\n \u2502 \u251c\u2500\u2500 image_0001_processed.jpg # Enhanced optical image\n \u2502 \u251c\u2500\u2500 bathymetry_patch.tif # Extracted bathymetry patch\n \u2502 \u251c\u2500\u2500 side_scan_patch.tif # Extracted side-scan sonar patch\n \u2502 \u2514\u2500\u2500 (other_geotiff_name)_patch.tif\n \u251c\u2500\u2500 image_0002/\n \u2502 \u251c\u2500\u2500 image_0002_processed.jpg\n \u2502 \u251c\u2500\u2500 bathymetry_patch.tif\n \u2502 \u2514\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n ```\n\n* **coords.csv**: A primary metadata file containing entries for each processed optical image, including its filename, geographical coordinates (latitude, longitude), timestamp, and the relative path to its corresponding processed image and sonar patches within the output structure.\n\n* **image_XXXX/ subfolders**: Each subfolder is named after the processed optical image and contains the processed optical image itself.\n\n* **GeoTIFF patches** : Individual GeoTIFF files representing the extracted square patches from each of your input GeoTIFFs (e.g., bathymetry, side-scan sonar) for that specific location.\n\n\n## 2.Predict Benthic Habitat Class using a Pre-trained Model \ud83d\udc20\n\nOnce you have your environment set up and data prepared, you can run inference using our pre-trained Multimodal AUV Bayesian Neural Network (Found here: https://huggingface.co/sams-tom/multimodal-auv-bathy-bnn-classifier/tree/main/multimodal-bnn) . This example demonstrates how to apply the model to new data and generate predictions with uncertainty quantification.\n\n### Prerequisites:\n\n* Ensure you have cloned this repository and installed all dependencies as described in the Installation Guide.\n\n* Your input data (images and sonar files) should be organized as expected by the CustomImageDataset (refer to ```Multimodal_AUV/data/datasets.py``` or the above example (1.) for details). The ```--data_dir``` argument should point to the root of this organized dataset.\n\n* The script will **automatically** download the required model weights from the Hugging Face Hub.\n\nInference Command Example:\n\n```\nBash\n\nmultimodal-auv-inference --data_dir \"/home/tommorgan/Documents/data/all_mulroy_images_and_sonar\" --output_csv \"/home/tommorgan/Documents/data/test/csv.csv\" --batch_size 4 --num_mc_samples 10\n\n\n```\nTo do this in a script run:\n```\nBash\nfrom Multimodal_AUV import run_auv_inference\n\n\n\nrun_auv_inference(\n data_directory= \"D:/dataset/\",\n batch_size= 4, \n output_csv =\"D:/csvs/inference_results.csv\",\n num_mc_samples = 5,\n num_classes = 7)\nprint(\"Inference function called. 
## 2. Predict Benthic Habitat Class using a Pre-trained Model 🐠

Once you have your environment set up and data prepared, you can run inference using our pre-trained Multimodal AUV Bayesian Neural Network (found here: https://huggingface.co/sams-tom/multimodal-auv-bathy-bnn-classifier/tree/main/multimodal-bnn). This example demonstrates how to apply the model to new data and generate predictions with uncertainty quantification.

### Prerequisites:

* Ensure you have cloned this repository and installed all dependencies as described in the Installation Guide.

* Your input data (images and sonar files) should be organized as expected by the CustomImageDataset (refer to ```Multimodal_AUV/data/datasets.py``` or the data-preparation example above (1.) for details). The ```--data_dir``` argument should point to the root of this organized dataset.

* The script will **automatically** download the required model weights from the Hugging Face Hub.

Inference Command Example:

```bash
multimodal-auv-inference --data_dir "/home/tommorgan/Documents/data/all_mulroy_images_and_sonar" --output_csv "/home/tommorgan/Documents/data/test/csv.csv" --batch_size 4 --num_mc_samples 10
```

To do this from a Python script:

```python
from Multimodal_AUV import run_auv_inference

output_csv = "D:/csvs/inference_results.csv"

run_auv_inference(
    data_directory="D:/dataset/",
    batch_size=4,
    output_csv=output_csv,
    num_mc_samples=5,
    num_classes=7,
)
print("Inference function called. Check results in:", output_csv)
```
### Understanding the Arguments:

* **```multimodal-auv-inference```**: The console entry point for inference. Alternatively, ```python -m Multimodal_AUV.Examples.Example_Inference_model``` executes the ```Example_Inference_model.py``` script as a Python module, which is the recommended way to run scripts within a package structure.

* **```--data_dir``` ```"/path/to/your/input_data/dataset"```**:

  **Purpose**: Specifies the absolute path to the directory containing your multimodal input data (e.g., GeoTIFFs, corresponding CSVs, etc.).

  **Action Required**: You MUST replace ```"/path/to/your/input_data/dataset"``` with the actual absolute path to your dataset on your local machine.

* **```--output_csv``` ```"/path/to/save/your/results/inference.csv"```**:

  **Purpose**: Defines the absolute path and filename where the inference results (predicted classes, uncertainty metrics) will be saved in CSV format.

  **Action Required**: You MUST replace ```"/path/to/save/your/results/inference.csv"``` with your desired output path and filename. The script will create the file and any necessary parent directories if they don't exist.

* **```--batch_size 4```**:

  **Purpose**: Sets the number of samples processed at once by the model during inference.

  **Customization**: Adjust this value based on your available GPU memory. Larger batch sizes can speed up inference but require more VRAM.

* **```--num_mc_samples 10```**:

  **Purpose**: Specifies the number of Monte Carlo (MC) samples drawn from the Bayesian Neural Network's posterior distribution. More samples give a more robust estimate of predictive uncertainty.

  **Customization**: For production, you might use 100 or more samples for better uncertainty estimation. For quick testing, 5-10 samples are sufficient.

### Expected Output:

Upon successful execution, a CSV file (e.g., inference.csv) will be created at the specified ```--output_csv``` path. This file will contain:

* **Image Name**: Identifier for the input sample.

* **Predicted Class**: The model's most likely class prediction.

* **Predictive Uncertainty**: A measure of the total uncertainty in the prediction (combining aleatoric and epistemic).

* **Aleatoric Uncertainty**: Uncertainty inherent in the data itself (e.g., sensor noise, ambiguous regions).
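The results CSV is convenient to triage with pandas, for example to surface the predictions the model is least sure about. A minimal sketch follows; the column names below are illustrative and should be checked against the header of the CSV the script actually writes.

```python
import pandas as pd

results = pd.read_csv("/path/to/save/your/results/inference.csv")
print(results.columns.tolist())  # confirm the exact column names first

# Assuming hypothetical column names matching the fields described above:
uncertain = results.sort_values("Predictive Uncertainty", ascending=False).head(20)
print(uncertain[["Image Name", "Predicted Class", "Predictive Uncertainty"]])

# These high-uncertainty samples are good candidates for manual review
# or for inclusion in a retraining set (see section 3).
```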
## 3. Retrain a Pre-trained Model on a New Dataset 🔄

This example demonstrates how to fine-tune our pre-trained Multimodal AUV Bayesian Neural Network (found here: https://huggingface.co/sams-tom/multimodal-auv-bathy-bnn-classifier/tree/main/multimodal-bnn) on your own custom dataset. Retraining allows you to adapt the model to specific environmental conditions or new benthic classes present in your data, leveraging the knowledge already learned by the pre-trained model.

### Prerequisites:

* Ensure you have cloned this repository and installed all dependencies as described in the Installation Guide.

* Your input data (images and sonar files) should be organized as expected by the CustomImageDataset (refer to ```Multimodal_AUV/data/datasets.py``` or the data-preparation example above (1) for details). The ```--data_dir``` argument should point to the root of this organized dataset.

* The script will automatically download the required pre-trained model weights from the Hugging Face Hub.

Retraining Command Example:

```bash
multimodal-auv-retrain --data_dir "/home/tommorgan/Documents/data/representative_sediment_sample/" --batch_size_multimodal 4 --num_epochs_multimodal 5 --num_mc_samples 5 --learning_rate_multimodal 1e-5 --weight_decay_multimodal 1e-5 --bathy_patch_base 30 --sss_patch_base 30
```

To run this from a Python script:

```python
from Multimodal_AUV import run_auv_retraining
import torch

# Parameters you want to control from outside the function:
training_devices = [torch.device("cuda:0")] if torch.cuda.is_available() else [torch.device("cpu")]

const_bnn_prior_parameters = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,
    "moped_delta": 0.1,
}

# Now, call the function with all your desired parameters:
run_auv_retraining(
    root_dir="D:/Your/dataset/",
    devices=training_devices,
    const_bnn_prior_parameters=const_bnn_prior_parameters,
    num_classes=7,  # Change this to the number of classes in your dataset

    # Optimizer/training parameters (all optimised for the pretrained dataset):
    lr_multimodal=1e-5,
    multimodal_weight_decay=1e-5,
    epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base=30,
    sss_patch_base=30,
    batch_size_multimodal=1,

    # Scheduler parameters:
    scheduler_multimodal_step_size=7,
    scheduler_multimodal_gamma=0.752,
)
```
### Understanding the Arguments:

* **```multimodal-auv-retrain```**: The console entry point for retraining. Alternatively, ```python -m Multimodal_AUV.Examples.Example_Retraining_model``` executes the ```Example_Retraining_model.py``` script as a Python module, which is the recommended way to run scripts within a package structure.

* **```--data_dir``` ```"/path/to/your/input_data/dataset"```**:

  **Purpose**: Specifies the absolute path to the directory containing your multimodal input data for retraining (e.g., GeoTIFFs, corresponding CSVs, etc.).

  **Action Required**: You MUST replace ```"/path/to/your/input_data/dataset"``` with the actual absolute path to your dataset on your local machine.

* **```--batch_size_multimodal 20```**:

  **Purpose**: Sets the number of samples processed at once by the model during retraining.

  **Customization**: Adjust this value based on your available GPU memory. Larger batch sizes can speed up training but require more VRAM.

* **```--num_epochs_multimodal 20```**:

  **Purpose**: Defines the total number of training epochs (complete passes through the entire dataset).

  **Customization**: Increase this value for more thorough training, especially with larger datasets or when the model is converging slowly.

* **```--num_mc_samples 20```**:

  **Purpose**: Specifies the number of Monte Carlo (MC) samples drawn from the Bayesian Neural Network's posterior distribution during training. More samples give a more robust estimate of predictive uncertainty.

  **Customization**: For production, you might use 100 or more samples for better uncertainty estimation. For quicker testing or initial training, 5-10 samples are sufficient.

* **```--learning_rate_multimodal 0.001```**:

  **Purpose**: Sets the initial learning rate for the optimizer. This controls the step size at which the model's weights are updated during training.

  **Customization**: Experiment with different learning rates (e.g., 0.01, 0.0001) to find the optimal value for your dataset.

* **```--weight_decay_multimodal 1e-5```**:

  **Purpose**: Applies L2 regularization (weight decay) to prevent overfitting by penalizing large weights.

  **Customization**: Adjust this value to control the strength of the regularization; a higher value means stronger regularization.

* **```--bathy_patch_base 30```**:

  **Purpose**: Defines the base patch size for bathymetry data processing.

  **Customization**: This parameter affects how bathymetry data is chunked and processed. Adjust as needed based on your data characteristics.

* **```--sss_patch_base 30```**:

  **Purpose**: Defines the base patch size for side-scan sonar (SSS) data processing.

  **Customization**: Similar to bathy_patch_base, this affects how SSS data is chunked and processed.
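The Python example above also exposes scheduler parameters (```scheduler_multimodal_step_size``` and ```scheduler_multimodal_gamma```). These follow the usual step-decay pattern, in which the learning rate is multiplied by gamma every `step_size` epochs. The sketch below uses PyTorch's standard ```StepLR``` scheduler purely to illustrate that decay with the documented defaults; it is not the project's training loop.

```python
import torch

# A throwaway parameter and optimizer, used only to illustrate the schedule.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-5)  # lr_multimodal
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.752)

for epoch in range(20):  # epochs_multimodal
    # ... a real training pass would go here ...
    optimizer.step()
    scheduler.step()
    if epoch % 7 == 6:
        print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.2e}")

# The learning rate drops to lr * 0.752 after epoch 7, lr * 0.752**2 after epoch 14, and so on.
```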
## 4. Train a New Multimodal Model from Scratch 🧠

This example outlines how to train a new Multimodal AUV Bayesian Neural Network entirely from scratch using your own dataset. This is suitable when you have a large, diverse dataset and want to build a model specifically tailored to your data's unique characteristics, without relying on pre-trained weights.

### Prerequisites:

* Ensure you have cloned this repository and installed all dependencies as described in the Installation Guide.

* Your input data (images and sonar files) should be organized as expected by the CustomImageDataset (refer to ```Multimodal_AUV/data/datasets.py``` or the data-preparation example above (1) for details). The ```--root_dir``` argument should point to the root of this organized dataset.

Training Command Example:

```bash
multimodal-auv-train-scratch --root_dir "/home/tommorgan/Documents/data/representative_sediment_sample/" --batch_size_multimodal 4 --epochs_multimodal 5 --num_mc 5 --lr_multimodal 1e-5
```

To run this from a Python script:

```python
import torch
from Multimodal_AUV import run_AUV_training_from_scratch

training_devices = [torch.device("cuda:0")] if torch.cuda.is_available() else [torch.device("cpu")]

const_bnn_prior_parameters = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,
    "moped_delta": 0.1,
}

# Call the training function, passing only the core and dynamic parameters
run_AUV_training_from_scratch(
    const_bnn_prior_parameters=const_bnn_prior_parameters,
    # Dynamic parameters (all optimised for the pretrained dataset):
    lr_multimodal_model=1e-5,
    num_epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base_raw=30.0,
    sss_patch_base_raw=30.0,
    batch_size_multimodal=1,
    # General pipeline parameters
    root_dir="D:/Your/dataset/",
    devices=training_devices,
    num_classes=7,
)
print("Training function called.")
```
### Understanding the Arguments:

* **```multimodal-auv-train-scratch```**: The console entry point for training from scratch. Alternatively, ```python -m Multimodal_AUV.Examples.Example_training_from_scratch``` executes the ```Example_training_from_scratch.py``` script as a Python module, which is the recommended way to run scripts within a package structure.

* **```--root_dir``` ```"/path/to/your/input_data/dataset"```**:

  **Purpose**: Specifies the absolute path to the root directory containing your multimodal input data for training (e.g., GeoTIFFs, corresponding CSVs, etc.).

  **Action Required**: You MUST replace ```"/path/to/your/input_data/dataset"``` with the actual absolute path to your dataset on your local machine.

* **```--epochs_multimodal 20```**:

  **Purpose**: Defines the total number of training epochs (complete passes through the entire dataset).

  **Customization**: Increase this value for more thorough training, especially with larger datasets. Training from scratch typically requires more epochs than retraining.

* **```--num_mc 20```**:

  **Purpose**: Specifies the number of Monte Carlo (MC) samples drawn from the Bayesian Neural Network's posterior distribution during training. More samples give a more robust estimate of predictive uncertainty.

  **Customization**: For production, you might use 100 or more samples for better uncertainty estimation. For quicker testing or initial training, 5-10 samples are sufficient.

* **```--batch_size_multimodal 20```**:

  **Purpose**: Sets the number of samples processed at once by the model during training.

  **Customization**: Adjust this value based on your available GPU memory. Larger batch sizes can speed up training but require more VRAM.

* **```--lr_multimodal 0.001```**:

  **Purpose**: Sets the initial learning rate for the optimizer. This controls the step size at which the model's weights are updated during training.

  **Customization**: Experiment with different learning rates (e.g., 0.01, 0.0001) to find the optimal value for your dataset. Training from scratch might require more careful tuning of the learning rate.
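Each of the workflows above uses Monte Carlo sampling (```num_mc``` / ```num_mc_samples```) to turn the BNN's weight distributions into an uncertainty estimate. The generic sketch below shows the idea — average the softmax outputs of several stochastic forward passes and use the spread of the averaged distribution as an uncertainty measure — and is not the package's own predictor code.

```python
import torch

def mc_predict(model, x, num_mc=20):
    """Generic Monte Carlo prediction for a Bayesian classifier.

    Runs `num_mc` stochastic forward passes (each pass samples new weights
    in a BNN), then returns the mean class prediction and a simple
    predictive-uncertainty measure (entropy of the mean distribution).
    """
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(num_mc)]
        )                                   # shape: (num_mc, batch, num_classes)
    mean_probs = probs.mean(dim=0)          # averaged prediction per sample
    predictive_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs.argmax(dim=-1), predictive_entropy

# Usage with any classifier that maps a batch of inputs to logits:
# pred_class, uncertainty = mc_predict(trained_bnn, batch, num_mc=20)
```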
# Running tests ✅

To ensure the integrity and correctness of the codebase, you can run the provided unit tests. Navigate to the root directory of the repository and execute:

```bash
pytest unittests/
```

# Full working Python script

The script below chains the main entry points — retraining, inference, preprocessing, and training from scratch — in one place:

```python
from Multimodal_AUV import run_auv_retraining, run_auv_inference, run_auv_preprocessing, run_AUV_training_from_scratch
import torch
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Parameters you want to control from outside the functions:
training_devices = [torch.device("cuda:0")] if torch.cuda.is_available() else [torch.device("cpu")]

const_bnn_prior_parameters = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,
    "moped_delta": 0.1,
}

# Retrain the pre-trained model on your dataset:
run_auv_retraining(
    root_dir="D:/Your/dataset/",
    devices=training_devices,
    const_bnn_prior_parameters=const_bnn_prior_parameters,
    num_classes=7,  # Change this to the number of classes in your dataset

    # Optimizer/training parameters (all optimised for the pretrained dataset):
    lr_multimodal=1e-5,
    multimodal_weight_decay=1e-5,
    epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base=30,
    sss_patch_base=30,
    batch_size_multimodal=1,

    # Scheduler parameters:
    scheduler_multimodal_step_size=7,
    scheduler_multimodal_gamma=0.752,
)
print("Retraining process initiated.")

# Run inference with the (re)trained model:
run_auv_inference(
    data_directory="D:/dataset/",
    batch_size=4,
    output_csv="D:/csvs/inference_results.csv",
    num_mc_samples=5,
    num_classes=7,
)
# Prepare raw optical images and GeoTIFFs into the expected dataset structure:
run_auv_preprocessing(
    raw_optical_images_folder="D:/raw dataset/",
    geotiff_folder="D:/raw dataset/sonar/",
    output_folder="D:/output/",
    # Must point at the actual exiftool executable (.exe on Windows), or e.g. "/usr/bin/exiftool" on Linux
    exiftool_path=r"C:\exiftool-13.32_64\exiftool-13.32_64\exiftool(-k).exe",
    window_size_meters=30.0,
    image_enhancement_method="AverageSubtraction",
)

# Train a new multimodal model from scratch, reusing the BNN prior parameters defined above:
run_AUV_training_from_scratch(
    const_bnn_prior_parameters=const_bnn_prior_parameters,
    # Dynamic parameters (all optimised for the pretrained dataset):
    lr_multimodal_model=1e-5,
    num_epochs_multimodal=20,
    num_mc=5,
    bathy_patch_base_raw=30.0,
    sss_patch_base_raw=30.0,
    batch_size_multimodal=1,
    # General pipeline parameters
    root_dir="D:/Your/dataset/",
    devices=training_devices,
    num_classes=7,
)
print("Training function called.")
```
# ⚙️ Configuration ⚙️

All core parameters for data processing, model training, and inference are controlled via **YAML configuration files**. This approach ensures reproducibility 🔁, simplifies experimentation 🧪, and facilitates seamless collaboration 🤝.

**Key Configuration Areas**:
The configuration is organized to cover the main stages of the AUV data processing and model lifecycle:

### Data Management: 📊

Input/Output Paths: Define locations for raw data (e.g., optical images 📸, GeoTIFFs 🗺️), processed outputs, and inference results.

Data Preparation Parameters: Specify settings such as patch sizes for bathymetry 📏 and SSS, image dimensions 🖼️, and relevant GeoTIFF channels.

### Model Training & Retraining: 🧠

Core Training Parameters: Control fundamental aspects such as learning rate 📉, batch size 📦, number of epochs ⏳, and optimization algorithms.

Model Architecture: Configure choices such as model type (e.g., multimodal_bnn, unimodal_bnn), number of output classes, and specific layer dimensions.

Bayesian Neural Network (BNN) Settings: Parameters for BNN priors, if applicable.

### Inference: 🔮

Prediction Control: Define thresholds for classification and output formats for results.
### Configuration Examples and Usage:
Below are examples reflecting the arguments used by the various scripts within the project. These can be integrated into a single, comprehensive config.yaml file, or broken down into separate files for specific tasks.

```yaml
# Configuration File

# General project settings (can be shared across scripts)
global_settings:
  data_root_dir: "/path/to/your/input_data/dataset"
  output_base_dir: "/path/to/your/project_outputs"
  num_mc_samples: 20          # Common for BNN inference/evaluation
  multimodal_batch_size: 20   # Common batch size for multimodal models

# --- Individual script configurations ---

# Configuration for Example_training_from_scratch
training_from_scratch:
  epochs_multimodal: 20
  lr_multimodal: 0.001
  # root_dir and batch_size_multimodal can inherit from global_settings or be overridden here

# Configuration for Example_Retraining_model
retraining_model:
  num_epochs_multimodal: 20        # Renamed from 'epochs_multimodal' in the training-from-scratch script
  learning_rate_multimodal: 0.001  # Renamed from 'lr_multimodal'
  weight_decay_multimodal: 1e-5
  bathy_patch_base: 30
  sss_patch_base: 30
  # data_dir, batch_size_multimodal, num_mc_samples can inherit from global_settings or be overridden

# Configuration for Example_Inference_model
inference_model:
  output_csv: "%(output_base_dir)s/inference_results/inference.csv"  # Example placeholder; plain YAML does not expand this
  batch_size: 4  # Specific batch size for inference

# Configuration for the raw data preparation script
raw_data_processing:
  raw_optical_images_folder: "%(data_root_dir)s/raw_auv_images"
  geotiff_folder: "%(data_root_dir)s/auv_geotiffs"
  output_folder: "%(output_base_dir)s/processed_auv_data"
  exiftool_path: "C:/exiftool/exiftool.exe"  # Note: OS-specific; point at the ExifTool executable
  window_size_meters: 30.0
  image_enhancement_method: "AverageSubtraction"
```
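How these values reach the entry points is up to you; a common pattern is to load the YAML with PyYAML and forward the resulting dictionary to the functions shown earlier. The sketch below assumes the config.yaml layout above and a PyYAML installation; note that the `%(...)s` placeholders shown above are not expanded by YAML itself, so resolve them in your own code or write absolute paths.

```python
import yaml

# Load the combined configuration file sketched above.
with open("config.yaml", "r") as fh:
    cfg = yaml.safe_load(fh)

globals_cfg = cfg["global_settings"]
retrain_cfg = cfg["retraining_model"]

# Values can then be forwarded to the entry points, e.g. run_auv_retraining:
print("Dataset root:", globals_cfg["data_root_dir"])
print("Retraining epochs:", retrain_cfg["num_epochs_multimodal"])
print("Learning rate:", retrain_cfg["learning_rate_multimodal"])
```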
# 🧠 Model Architecture 🏗️

This project leverages **Multimodal Bayesian Neural Network (BNN)** architectures designed for robust data fusion and uncertainty quantification in underwater environments. The core design principles are **modularity** and **adaptability**, allowing for both unimodal and multimodal processing. ✨

## **1. Multimodal Fusion Architecture:** 🤝
The primary model (used in 2. Predict Benthic Habitat Class using a Pre-trained Model 🐠, 3. Retrain a Pre-trained Model on a New Dataset 🔄, and 4. Train a New Multimodal Model from Scratch 🧠) integrates information from different sensor modalities:
* **Image Encoder:** A Convolutional Neural Network (CNN) backbone (e.g., a pre-trained ResNet, adapted to be Bayesian) processes the optical imagery from AUVs. 📸
* **Bathymetric Sonar Encoder:** A CNN backbone (e.g., a pre-trained ResNet, adapted to be Bayesian) processes the bathymetric sonar from AUVs. 📏
* **Side-Scan Sonar Encoder:** A CNN backbone (e.g., a pre-trained ResNet, adapted to be Bayesian) processes the side-scan sonar from AUVs. 📡
* **Fusion Layer:** Features extracted from each modality's encoder are concatenated or combined using a dedicated fusion layer (e.g., a fully connected network or attention mechanism). This layer learns the optimal way to combine visual and acoustic information. 🔗
* **Prediction Head:** A final set of layers (often fully connected) takes the fused features and outputs predictions for the target task (e.g., benthic habitat classification 🐠), with the Bayesian nature providing a distribution over these predictions.

### Diagram of the Multimodal Network: 🖼️

## **2. Bayesian Neural Network Implementation:** 💡
The "Bayesian" aspect is achieved by converting deterministic layers (e.g., Linear, Conv2D) into their probabilistic counterparts using `bayesian-torch`. This means:

* **Weight Distributions:** Instead of learning fixed weights, the model learns **distributions over its weights**, allowing it to output a distribution of predictions for a given input. 📊
* **Uncertainty Quantification:** The variance in these output predictions provides a direct measure of the model's confidence and **epistemic uncertainty**, which is vital for decision-making in ambiguous underwater settings. 🌊

## **3. Foundation Model Concept:** 🚀
In addition, this project aims to provide a **retrainable foundation model**:
* The architecture is general enough to be applicable across various underwater mapping tasks. 🌐
* It is pre-trained on a diverse dataset (e.g., Northern Britain benthic habitat data), providing strong initial feature representations. 💪
* Users can then **fine-tune** this pre-trained model (see 3. Retrain a Pre-trained Model on a New Dataset 🔄) on their own smaller, specific datasets to adapt it to new areas or different classification schemes, significantly reducing training time and data requirements. ⏱️

## **4. Unimodal Models:** 🎯
The project also includes components (`unitmodal.py` in `train/` and potentially `base_models.py`) to train and evaluate models based on **single modalities** (e.g., image-only 📸 or sonar-only 📡). This allows for ablation studies and comparison against the performance benefits of multimodal fusion.

### Diagram of the Unimodal Networks: 🖼️

---
# Contact
Have questions about the project, found a bug, or want to contribute? Here are a few ways to reach out:

* **GitHub Issues:** For any code-related questions, bug reports, or feature requests, please open an [Issue on this repository](https://github.com/sams-tom/multimodal-auv-bnn-project/issues). This is the preferred method for transparency and tracking.

* **Email:** For more direct or confidential inquiries, you can reach me at [phd01tm@sams.ac.uk](mailto:phd01tm@sams.ac.uk).

* **LinkedIn:** Connect with the project lead/team on LinkedIn:
  * [Tom Morgan](https://www.linkedin.com/in/tom-morgan-8a73b129b/)

# Citations

* **GitHub Repository (Code & Documentation):** [https://github.com/sams-tom/multimodal-auv-bnn-project](https://github.com/sams-tom/multimodal-auv-bnn-project)
* **Hugging Face Models:** [https://huggingface.co/sams-tom/multimodal-auv-bnn-models](https://huggingface.co/sams-tom/multimodal-auv-bnn-models)
* **Research Paper:** [In development]
"bugtrack_url": null,
"license": "MIT",
"summary": "Multimodal AUV Bayesian Neural Networks for Underwater Environmental Understanding",
"version": "0.0.5",
"project_urls": {
"Documentation": "https://github.com/sams-tom/Multimodal-AUV/blob/master/README.md",
"Homepage": "https://github.com/sams-tom/Multimodal-AUV",
"Repository": "https://github.com/sams-tom/Multimodal-AUV"
},
"split_keywords": [
"auv",
" bayesian neural networks",
" underwater mapping",
" habitat classification",
" multimodal data",
" oceanography",
" geospatial-data",
" environmental-monitoring",
" uncertainty-quantification",
" computer-vision",
" remote-sensing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c9c19e0c91890e06e127fc4ce0400e29edf69c2952348cbe4630fd46b642bc87",
"md5": "54ee3b6ba7fc837b610b7bced7f2f808",
"sha256": "700c1a0d5a0c2bccb6cdbc3b70db0904ef7838e1438da880685d55898aca5092"
},
"downloads": -1,
"filename": "multimodal_auv-0.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "54ee3b6ba7fc837b610b7bced7f2f808",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.10,>=3.9",
"size": 93492,
"upload_time": "2025-08-06T10:31:08",
"upload_time_iso_8601": "2025-08-06T10:31:08.705931Z",
"url": "https://files.pythonhosted.org/packages/c9/c1/9e0c91890e06e127fc4ce0400e29edf69c2952348cbe4630fd46b642bc87/multimodal_auv-0.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a77eb37d030531dff14217705ce770ebf6320580b8ba24e17462628c4880f9c3",
"md5": "c1d9523d9bfea994ee520f0377a19363",
"sha256": "e56bafb6973653906cedf15a1335f2c8a8410d504ddd852743a9748e48818f30"
},
"downloads": -1,
"filename": "multimodal_auv-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "c1d9523d9bfea994ee520f0377a19363",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.10,>=3.9",
"size": 83553,
"upload_time": "2025-08-06T10:31:10",
"upload_time_iso_8601": "2025-08-06T10:31:10.448396Z",
"url": "https://files.pythonhosted.org/packages/a7/7e/b37d030531dff14217705ce770ebf6320580b8ba24e17462628c4880f9c3/multimodal_auv-0.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-06 10:31:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sams-tom",
"github_project": "Multimodal-AUV",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "multimodal-auv"
}