# sclassifier
This Python module performs radio source classification analysis using different ML methods in a supervised, self-supervised, or unsupervised way:
* convolutional neural networks (CNNs)
* convolutional autoencoders (CAEs)
* decision trees & LightGBM
* HDBSCAN clustering algorithm
* UMAP dimensionality reduction
* SimCLR & BYOL self-supervised frameworks
## **Status**
This software is under development. It requires Python 3 and TensorFlow 2.x.
## **Credit**
This software is distributed under the GPLv3 license. If you use it for your research, please add the repository link or acknowledge the authors in your papers.
## **Installation**
To build and install the package:
* Clone this repository into a local directory (e.g. $SRC_DIR):
```git clone https://github.com/SKA-INAF/sclassifier.git```
* Create a virtual environment with your preferred Python version (e.g. python3.6) in a local install directory (e.g. $INSTALL_DIR):
``` python3.6 -m venv $INSTALL_DIR```
* Activate your virtual environment:
```source $INSTALL_DIR/bin/activate```
* Install module dependencies listed in ```requirements.txt```:
``` pip install -r requirements.txt```
* Build and install package:
``` python setup.py build```
``` python setup.py install```
* If required (e.g. when running outside the virtual environment), add the installation path to your ```PYTHONPATH``` environment variable:
``` export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python3.6/site-packages ```
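The site-packages path above is specific to Python 3.6. As a small sketch using only the standard library `sysconfig` module (the helper name `site_packages_dir` is ours, not part of sclassifier), you can print the matching path for whichever interpreter runs it:

```python
import sysconfig


def site_packages_dir() -> str:
    """Return the directory where pure-Python packages are installed
    for the interpreter running this code (inside a virtual environment,
    this is the environment's own site-packages directory)."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    # Print a ready-to-use export line for the current interpreter.
    print(f"export PYTHONPATH=$PYTHONPATH:{site_packages_dir()}")
```

Running it with `$INSTALL_DIR/bin/python` should print the export line for the virtual environment's site-packages directory, whatever Python version it was created with.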
## **Usage**
Several Python scripts are provided in the ```scripts``` directory to run the tasks described below.
### **Image supervised classification with CNNs**
The script `run_classifier_nn.py` performs binary and multi-class (single- or multi-label) classification of radio images (single- or multi-channel, FITS format) using customized or predefined CNN architectures (resnet18/resnet34/resnet50/resnet101). Custom networks can be built through input options by stacking Conv2D/MaxPool/BatchNorm/Dropout layers, each of which can be enabled or disabled as desired. Several user options are provided to customize the network architecture, data pre-processing, and augmentation. A full list is available with: ```python run_classifier_nn.py --help```.
Input data (train/validation) must be given in JSON format with the following structure:
```
{
"data": [
{
"filepaths": [
"G340.743+00.313_ch1.fits",
"G340.743+00.313_ch2.fits",
"G340.743+00.313_ch3.fits"
],
"sname": "G340.743+00.313",
"id": 6,
"label": "HII"
},
...
...
]
}
```
For multi-label classification, the values of the ```id``` and ```label``` keys must be lists.
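As an illustration, the sketch below builds a datalist entry with the layout shown above and runs a few sanity checks before training. The helpers `make_entry`, `validate_datalist` and `save_datalist` are hypothetical and not part of sclassifier; they only check the keys documented here, and the script itself may enforce more (or different) constraints.

```python
import json

# Keys each entry carries in the datalist layout documented above.
REQUIRED_KEYS = ("filepaths", "sname", "id", "label")


def make_entry(sname, channel_files, class_id, label):
    """Build one datalist entry (hypothetical helper)."""
    return {
        "filepaths": list(channel_files),  # one FITS file per channel
        "sname": sname,
        "id": class_id,   # int, or list of ints for multi-label
        "label": label,   # str, or list of str for multi-label
    }


def validate_datalist(doc, multilabel=False):
    """Check a datalist dict against the documented layout; return the entry count."""
    entries = doc["data"]
    for i, entry in enumerate(entries):
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"entry {i}: missing keys {missing}")
        if not entry["filepaths"]:
            raise ValueError(f"entry {i}: 'filepaths' is empty")
        # Multi-label entries must use lists for both 'id' and 'label'.
        for key in ("id", "label"):
            if isinstance(entry[key], list) != multilabel:
                raise ValueError(
                    f"entry {i}: unexpected type for '{key}' (multilabel={multilabel})")
    return len(entries)


def save_datalist(doc, path):
    """Write the datalist to a JSON file (intended for --datalist/--datalist_cv)."""
    with open(path, "w") as f:
        json.dump(doc, f, indent=2)


if __name__ == "__main__":
    sname = "G340.743+00.313"
    doc = {"data": [make_entry(sname, [f"{sname}_ch{c}.fits" for c in (1, 2, 3)], 6, "HII")]}
    print(validate_datalist(doc))  # 1
```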
Two run modes are supported: training and inference. To perform inference, specify the ```--predict``` option. To perform binary or multi-label classification, specify the ```--binary_class``` or ```--multilabel``` option, respectively.
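The classification settings differ in how class ids become training targets. A minimal sketch of the usual convention follows: one sigmoid output for binary, a one-hot vector for multi-class (softmax), and a multi-hot vector for multi-label. This is a generic assumption, not code taken from `run_classifier_nn.py`, and `encode_target` is a hypothetical name.

```python
def encode_target(class_ids, nclasses, mode):
    """Encode class id(s) as a target vector (hypothetical helper).

    mode="binary":     class_ids is 0 or 1   -> [0.0] or [1.0] (one sigmoid output)
    mode="multiclass": class_ids is one int  -> one-hot vector of length nclasses
    mode="multilabel": class_ids is a list   -> multi-hot vector of length nclasses
    """
    if mode == "binary":
        if class_ids not in (0, 1):
            raise ValueError("binary targets must be 0 or 1")
        return [float(class_ids)]
    if mode not in ("multiclass", "multilabel"):
        raise ValueError(f"unknown mode: {mode}")
    ids = [class_ids] if mode == "multiclass" else list(class_ids)
    vec = [0.0] * nclasses
    for cid in ids:
        # Reject negative ids such as -1 (UNKNOWN), which Python indexing
        # would otherwise silently treat as "last element".
        if not 0 <= cid < nclasses:
            raise ValueError(f"class id {cid} outside [0, {nclasses})")
        vec[cid] = 1.0
    return vec
```

For example, with 4 classes, id `2` becomes `[0.0, 0.0, 1.0, 0.0]` in multi-class mode, while the multi-label id list `[0, 3]` becomes `[1.0, 0.0, 0.0, 1.0]`.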
To customize the class ids/label names and the corresponding targets, optionally remapping them with respect to the values given in the input data list, specify the following options:
```
--nclasses=$NCLASSES
--classid_remap=$CLASSID_REMAP
--target_label_map=$TARGET_LABEL_MAP
--classid_label_map=$CLASSID_LABEL_MAP
--target_names=$TARGET_NAMES
```
For example:
```
NCLASSES=4
CLASS_PROBS='{"BACKGROUND":1.0,"COMPACT":0.1,"EXTENDED":1.0,"DIFFUSE":1.0}'
CLASSID_REMAP='{0:-1,1:0,2:1,3:2,4:3}'
TARGET_LABEL_MAP='{-1:"UNKNOWN",0:"BACKGROUND",1:"COMPACT",2:"EXTENDED",3:"DIFFUSE"}'
CLASSID_LABEL_MAP='{0:"UNKNOWN",1:"BACKGROUND",2:"COMPACT",3:"EXTENDED",4:"DIFFUSE"}'
TARGET_NAMES="BACKGROUND,COMPACT,EXTENDED,DIFFUSE"
```
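In the example above, the maps are consistent with each other. Each input id in `CLASSID_REMAP` is sent to a target id whose name in `TARGET_LABEL_MAP` matches that input id's name in `CLASSID_LABEL_MAP`. `TARGET_NAMES` lists the target names for ids 0 to `NCLASSES-1`; the -1 target is presumably reserved for unknown entries. The sketch below checks this consistency. Because the map strings use integer keys, they are Python dict literals rather than JSON, so the sketch parses them with `ast.literal_eval`. How the script itself parses these options is an assumption here, and `parse_maps`, `to_target` and `check_maps` are hypothetical helpers.

```python
import ast

# Option values copied from the example above.
NCLASSES = 4
CLASSID_REMAP = '{0:-1,1:0,2:1,3:2,4:3}'
TARGET_LABEL_MAP = '{-1:"UNKNOWN",0:"BACKGROUND",1:"COMPACT",2:"EXTENDED",3:"DIFFUSE"}'
CLASSID_LABEL_MAP = '{0:"UNKNOWN",1:"BACKGROUND",2:"COMPACT",3:"EXTENDED",4:"DIFFUSE"}'
TARGET_NAMES = "BACKGROUND,COMPACT,EXTENDED,DIFFUSE"


def parse_maps():
    """Parse the option strings; integer keys make them Python literals, not JSON."""
    remap = ast.literal_eval(CLASSID_REMAP)
    target_labels = ast.literal_eval(TARGET_LABEL_MAP)
    id_labels = ast.literal_eval(CLASSID_LABEL_MAP)
    names = TARGET_NAMES.split(",")
    return remap, target_labels, id_labels, names


def to_target(class_id, remap, target_labels):
    """Map an input datalist id to its (target id, target label)."""
    target = remap[class_id]
    return target, target_labels[target]


def check_maps(nclasses=NCLASSES):
    """Raise ValueError if the example maps disagree with each other."""
    remap, target_labels, id_labels, names = parse_maps()
    for class_id, label in id_labels.items():
        _, target_label = to_target(class_id, remap, target_labels)
        if target_label != label:
            raise ValueError(f"id {class_id}: {label!r} remapped to {target_label!r}")
    # TARGET_NAMES should follow target ids 0..nclasses-1 in order.
    expected = [target_labels[t] for t in range(nclasses)]
    if names != expected:
        raise ValueError(f"TARGET_NAMES {names} != {expected}")
    return True
```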
Below we report some run examples:
* To train a custom model (2 conv layers + 1 dense layer) from scratch:
```
python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV --nepochs=10 \
--nfilters_cnn=16,32 --kernsizes_cnn=3,3 --strides_cnn=1,1 --add_maxpooling_layer \
--add_dense_layer --dense_layer_sizes=16 \
--add_dropout --dropout_rate=0.4 --add_conv_dropout --conv_dropout_rate=0.2 \
--batch_size=64 --optimizer=adam --learning_rate=1e-4 \
--augment --augmenter=cnn --augment_scale_factor=5 \
--resize_size=64 --scale_to_abs_max
```
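The per-layer options (`--nfilters_cnn`, `--kernsizes_cnn`, `--strides_cnn`) take one comma-separated value per conv layer. The sketch below estimates the feature-map size after each conv stage of the custom model above. It assumes square inputs of `--resize_size` pixels, Keras-style `same`/`valid` conv padding, and a 2x2, stride-2 max-pooling after every conv layer when `--add_maxpooling_layer` is set. These are our assumptions, not documented behavior of the script, and the helper names are hypothetical.

```python
import math


def parse_csv_ints(value):
    """Split a comma-separated option value such as "16,32" into ints."""
    return [int(v) for v in value.split(",")]


def conv_stack_shapes(resize_size, nfilters, kernsizes, strides,
                      add_maxpooling=True, padding="same"):
    """Return (height, width, channels) after each conv (+ optional max-pool) stage.

    Assumptions (not taken from sclassifier): square inputs, Keras-style
    'same'/'valid' conv padding, 2x2 max-pool with stride 2 after each conv.
    """
    if not (len(nfilters) == len(kernsizes) == len(strides)):
        raise ValueError("per-layer options must have the same number of entries")
    size = resize_size
    shapes = []
    for nf, k, s in zip(nfilters, kernsizes, strides):
        if padding == "same":
            size = math.ceil(size / s)      # 'same': ceil(n / stride)
        elif padding == "valid":
            size = (size - k) // s + 1      # 'valid': floor((n - k) / stride) + 1
        else:
            raise ValueError(f"unknown padding: {padding}")
        if add_maxpooling:
            size //= 2                      # 2x2 pool, stride 2, no padding
        shapes.append((size, size, nf))
    return shapes
```

Under these assumptions, the example above gives 32x32x16 and then 16x16x32 feature maps (8192 values if flattened before the 16-unit dense layer). With `padding="valid"`, the second stage would instead be 14x14x32.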
* To train a predefined model (resnet18) using pre-trained backbone .h5 weights (e.g. $WEIGHTFILE):
```
python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV [OPTIONS] \
--use_predefined_arch --predefined_arch=resnet18 --weightfile_backbone=$WEIGHTFILE
```
* To perform inference with a saved .h5 model (e.g. $MODELFILE) and weights (e.g. $WEIGHTFILE):
```
python run_classifier_nn.py --datalist=$DATALIST_TEST [OPTIONS] \
--modelfile=$MODELFILE --weightfile=$WEIGHTFILE [OPTIONS] \
--predict
```
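At inference time, the network outputs one score per target class. The sketch below turns those scores into label names. It assumes softmax outputs for multi-class (take the argmax) and independent sigmoid outputs for multi-label (apply a 0.5 threshold). The helper `decode_predictions` and the threshold are illustrative assumptions and do not reflect the format in which the script writes its predictions.

```python
def decode_predictions(scores, target_names, multilabel=False, threshold=0.5):
    """Turn one image's per-class scores into label name(s) (hypothetical helper).

    multilabel=False: softmax-style scores, return the single highest-scoring name.
    multilabel=True:  independent sigmoid scores, return every name with score >= threshold.
    """
    if len(scores) != len(target_names):
        raise ValueError("expected one score per target name")
    if multilabel:
        return [name for name, s in zip(target_names, scores) if s >= threshold]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return target_names[best]


if __name__ == "__main__":
    names = "BACKGROUND,COMPACT,EXTENDED,DIFFUSE".split(",")  # TARGET_NAMES example
    print(decode_predictions([0.10, 0.70, 0.15, 0.05], names))  # COMPACT
```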
### **Image feature extraction with CAE**
WRITE ME
### **Image feature extraction with SimCLR**
WRITE ME
### **Feature reduction with UMAP**
WRITE ME
### **Clustering feature data with HDBSCAN**
WRITE ME
Raw data
{
"_id": null,
"home_page": "https://github.com/SKA-INAF/sclassifier",
"name": "sclassifier",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "radio,source,classification",
"author": "Simone Riggi",
"author_email": "simone.riggi@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/86/89/11f8969b6abea144461304f0fdf7df9502e66a9fe43dd6cf2d9a42484ef8/sclassifier-1.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL3",
"summary": "Source classification using supervised and self-supervised learning",
"version": "1.0.6",
"project_urls": {
"Download": "https://github.com/SKA-INAF/sclassifier/archive/refs/tags/v1.0.6.tar.gz",
"Homepage": "https://github.com/SKA-INAF/sclassifier"
},
"split_keywords": [
"radio",
"source",
"classification"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "868911f8969b6abea144461304f0fdf7df9502e66a9fe43dd6cf2d9a42484ef8",
"md5": "54bf85546d38f6e888ac36dbf2bdd77a",
"sha256": "f486051d1c4305f0ab2f5c92d086e4d6e15863da83a958a5bd9163ee4e0b95cb"
},
"downloads": -1,
"filename": "sclassifier-1.0.6.tar.gz",
"has_sig": false,
"md5_digest": "54bf85546d38f6e888ac36dbf2bdd77a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 254915,
"upload_time": "2024-03-18T16:18:43",
"upload_time_iso_8601": "2024-03-18T16:18:43.619484Z",
"url": "https://files.pythonhosted.org/packages/86/89/11f8969b6abea144461304f0fdf7df9502e66a9fe43dd6cf2d9a42484ef8/sclassifier-1.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-18 16:18:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SKA-INAF",
"github_project": "sclassifier",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "sclassifier"
}