sclassifier

Name	sclassifier
Version	1.0.6
home_page	https://github.com/SKA-INAF/sclassifier
Summary	Source classification using supervised and self-supervised learning
upload_time	2024-03-18 16:18:43
maintainer
docs_url	None
author	Simone Riggi
requires_python
license	GPL3
keywords	radio source classification
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# sclassifier
This Python module performs radio source classification using different ML methods in a supervised, self-supervised, or unsupervised way: 
* convolutional neural networks (CNNs)    
* convolutional autoencoders (CAEs)   
* decision trees & LightGBM  
* HDBSCAN clustering algorithm   
* UMAP dimensionality reduction   
* SimCLR & BYOL self-supervised frameworks   

## **Status**
This software is under development. It requires Python 3 and TensorFlow 2.x. 

## **Credit**
This software is distributed under the GPLv3 license. If you use it for your research, please add a link to the repository or acknowledge the authors in your papers.   

## **Installation**  

To build and install the package:    

* Clone this repository in a local directory (e.g. $SRC_DIR):   
  ```git clone https://github.com/SKA-INAF/sclassifier.git```
* Create a virtual environment with your preferred Python version (e.g. python3.6) in a local install directory (e.g. $INSTALL_DIR):   
  ``` python3.6 -m venv $INSTALL_DIR```   
* Activate your virtual environment:   
  ```source $INSTALL_DIR/bin/activate```
* Install module dependencies listed in ```requirements.txt```:    
  ``` pip install -r requirements.txt```  
* Build and install the package:   
  ``` python setup.py build```   
  ``` python setup.py install```   
* If required (e.g. outside a virtual environment), add the installation path to your ```PYTHONPATH``` environment variable:   
  ``` export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python3.6/site-packages ```
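
After installation, it can be useful to confirm that the environment matches the stated requirement (Python 3 and TensorFlow 2.x). The snippet below is only a sketch: `meets_requirements` is a hypothetical helper that is not part of sclassifier, and it works on version values passed in by the caller, so it can be tried without importing TensorFlow.

```python
def meets_requirements(python_version, tensorflow_version):
    """Return True if the versions satisfy 'Python 3 + TensorFlow 2.x'.

    Hypothetical helper, not part of sclassifier.
    python_version: tuple such as sys.version_info[:3]
    tensorflow_version: string such as tensorflow.__version__
    """
    # Only major versions are compared, mirroring the requirement as stated
    python_ok = python_version[0] == 3
    tensorflow_ok = tensorflow_version.split(".")[0] == "2"
    return python_ok and tensorflow_ok


print(meets_requirements((3, 6, 9), "2.4.1"))
```

Inside the activated virtual environment, the same function could be called with `sys.version_info[:3]` and `tensorflow.__version__`.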

## **Usage**
Several Python scripts are provided in the ```scripts``` directory to run the tasks described below.  

### **Image supervised classification with CNNs**
The script `run_classifier_nn.py` performs binary and multi-class (single- or multi-label) classification of radio images (single- or multi-channel, FITS format) using customized or predefined CNN architectures (resnet18/resnet34/resnet50/resnet101). Users can build customized networks through input options by stacking Conv2D/MaxPool/BatchNorm/Dropout layers, each of which can be enabled or disabled as needed. Several options are provided to customize the network architecture, data pre-processing, and augmentation. The full list is available with: ```python run_classifier_nn.py --help```.    

Input data (train/validation) must be given in JSON format with the following structure:    

```
{  
  "data": [    
    {    
      "filepaths": [     
        "G340.743+00.313_ch1.fits",    
        "G340.743+00.313_ch2.fits",    
        "G340.743+00.313_ch3.fits"   
      ],    
      "sname": "G340.743+00.313",   
      "id": 6,   
      "label": "HII"    
    },    
    ...
    ...
  ]   
}   
```    
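
A data list with the structure above could be generated with a few lines of Python. This is a minimal sketch, not code from sclassifier: `make_entry` and `to_datalist_json` are hypothetical helper names, and the file names, id, and label are simply the values from the example entry.

```python
import json


def make_entry(sname, channel_files, class_id, label):
    """Build one entry with the keys shown in the structure above."""
    return {
        "filepaths": list(channel_files),
        "sname": sname,
        "id": class_id,
        "label": label,
    }


def to_datalist_json(entries):
    """Serialize the entries under the top-level "data" key."""
    return json.dumps({"data": entries}, indent=2)


# One source with three channel images, as in the example entry
entries = [
    make_entry(
        "G340.743+00.313",
        ["G340.743+00.313_ch{}.fits".format(i) for i in (1, 2, 3)],
        6,
        "HII",
    )
]
print(to_datalist_json(entries))
```

The returned string can be written to a file and passed to the script through the `--datalist` option.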

For multilabel classification, the values of the ```id``` and ```label``` keys must be lists.    
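
As an illustration of the multilabel requirement, the sketch below checks whether an entry already stores lists and converts a single-label entry otherwise. Both functions are hypothetical helpers written for this example, not part of sclassifier.

```python
def is_multilabel_entry(entry):
    """True if both "id" and "label" hold lists."""
    return isinstance(entry["id"], list) and isinstance(entry["label"], list)


def to_multilabel(entry):
    """Return a copy of the entry with "id" and "label" wrapped in lists."""
    converted = dict(entry)  # shallow copy, the input entry is left untouched
    for key in ("id", "label"):
        if not isinstance(converted[key], list):
            converted[key] = [converted[key]]
    return converted


single = {
    "filepaths": ["G340.743+00.313_ch1.fits"],
    "sname": "G340.743+00.313",
    "id": 6,
    "label": "HII",
}
multi = to_multilabel(single)
print(multi["id"], multi["label"])
```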

Two run modes are supported: training and inference. To perform inference, specify the ```--predict``` option. To perform binary or multilabel classification, specify the ```--binary_class``` or ```--multilabel``` option, respectively. 

To customize the class id/label names and the corresponding targets, optionally remapping them with respect to the values given in the input data list, specify the following options:   

```
--nclasses=$NCLASSES     
--classid_remap=$CLASSID_REMAP    
--target_label_map=$TARGET_LABEL_MAP      
--classid_label_map=$CLASSID_LABEL_MAP     
--target_names=$TARGET_NAMES     
```

For example:   
     
```
NCLASSES=4     
CLASS_PROBS='{"BACKGROUND":1.0,"COMPACT":0.1,"EXTENDED":1.0,"DIFFUSE":1.0}'    
CLASSID_REMAP='{0:-1,1:0,2:1,3:2,4:3}'    
TARGET_LABEL_MAP='{-1:"UNKNOWN",0:"BACKGROUND",1:"COMPACT",2:"EXTENDED",3:"DIFFUSE"}'    
CLASSID_LABEL_MAP='{0:"UNKNOWN",1:"BACKGROUND",2:"COMPACT",3:"EXTENDED",4:"DIFFUSE"}'    
TARGET_NAMES="BACKGROUND,COMPACT,EXTENDED,DIFFUSE"
```    
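
The example values above chain together: each class id in the input data is remapped to a target, and each target has a label. The sketch below checks that the three maps are mutually consistent. It only illustrates the relationship between the values; how sclassifier parses and applies these options internally is not shown in this document, and the `remap` helper is hypothetical.

```python
import ast

# Example values copied from the block above
CLASSID_REMAP = '{0:-1,1:0,2:1,3:2,4:3}'
TARGET_LABEL_MAP = '{-1:"UNKNOWN",0:"BACKGROUND",1:"COMPACT",2:"EXTENDED",3:"DIFFUSE"}'
CLASSID_LABEL_MAP = '{0:"UNKNOWN",1:"BACKGROUND",2:"COMPACT",3:"EXTENDED",4:"DIFFUSE"}'

# The values use integer keys, so they are Python dict literals rather than valid JSON
classid_remap = ast.literal_eval(CLASSID_REMAP)
target_label_map = ast.literal_eval(TARGET_LABEL_MAP)
classid_label_map = ast.literal_eval(CLASSID_LABEL_MAP)


def remap(class_id):
    """Map an input class id to its (target, target label) pair."""
    target = classid_remap[class_id]
    return target, target_label_map[target]


for class_id in sorted(classid_remap):
    target, target_label = remap(class_id)
    # The label of the original class id should match the label of its target
    if classid_label_map[class_id] != target_label:
        raise ValueError("inconsistent maps for class id {}".format(class_id))
    print("class id {} ({}) -> target {} ({})".format(
        class_id, classid_label_map[class_id], target, target_label))

# Labels of the non-negative targets, in target order
target_names = [target_label_map[t] for t in sorted(target_label_map) if t >= 0]
print(",".join(target_names))
```

With these values, the labels of the non-negative targets reproduce `TARGET_NAMES`, and their count equals `NCLASSES`.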

Below we report some run examples:

* To train a custom model (2 conv layers + 1 dense layer) from scratch:   
  ```
  python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV --nepochs=10 \    
    --nfilters_cnn=16,32 --kernsizes_cnn=3,3 --strides_cnn=1,1 --add_maxpooling_layer \    
    --add_dense_layer --dense_layer_sizes=16 \    
    --add_dropout --dropout_rate=0.4 --add_conv_dropout --conv_dropout_rate=0.2 \  
    --batch_size=64 --optimizer=adam --learning_rate=1e-4 \    
    --augment --augmenter=cnn --augment_scale_factor=5 \    
    --resize_size=64 --scale_to_abs_max
  ```   
  
* To train a predefined model (resnet18) using pre-trained backbone .h5 weights (e.g. $WEIGHTFILE):    
  ```
  python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV [OPTIONS] \    
    --use_predefined_arch --predefined_arch=resnet18 --weightfile_backbone=$WEIGHTFILE 
  ```    

* To perform inference with a saved .h5 model (e.g. $MODELFILE) and weights (e.g. $WEIGHTFILE):     
  ```
  python run_classifier_nn.py --datalist=$DATALIST_TEST [OPTIONS] \    
    --modelfile=$MODELFILE --weightfile=$WEIGHTFILE [OPTIONS] \    
    --predict
  ```    
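
When launching many runs, the training command can be assembled from Python. The sketch below is a hypothetical wrapper, not part of sclassifier, and uses only flags that appear in the run examples above. It assumes that the i-th value of each comma-separated option describes the i-th convolutional layer, so the three lists must have the same length.

```python
import shlex


def build_train_command(datalist, datalist_cv, nfilters, kernsizes, strides, nepochs=10):
    """Assemble the argument list for a custom-CNN training run (hypothetical helper)."""
    # Assumption: one value per conv layer in each comma-separated option
    if not (len(nfilters) == len(kernsizes) == len(strides)):
        raise ValueError("nfilters, kernsizes and strides need one value per conv layer")

    def join(values):
        return ",".join(str(v) for v in values)

    return [
        "python", "run_classifier_nn.py",
        "--datalist={}".format(datalist),
        "--datalist_cv={}".format(datalist_cv),
        "--nepochs={}".format(nepochs),
        "--nfilters_cnn={}".format(join(nfilters)),
        "--kernsizes_cnn={}".format(join(kernsizes)),
        "--strides_cnn={}".format(join(strides)),
        "--add_maxpooling_layer",
    ]


# File names are placeholders for $DATALIST_TRAIN and $DATALIST_CV
cmd = build_train_command("train.json", "cv.json", [16, 32], [3, 3], [1, 1])
print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory containing the script.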

### **Image feature extraction with CAE**
WRITE ME

### **Image feature extraction with SimCLR**
WRITE ME

### **Feature reduction with UMAP**
WRITE ME

### **Clustering feature data with HDBSCAN**
WRITE ME

Raw data

{
    "_id": null,
    "home_page": "https://github.com/SKA-INAF/sclassifier",
    "name": "sclassifier",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "radio,source,classification",
    "author": "Simone Riggi",
    "author_email": "simone.riggi@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/86/89/11f8969b6abea144461304f0fdf7df9502e66a9fe43dd6cf2d9a42484ef8/sclassifier-1.0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL3",
    "summary": "Source classification using supervised and self-supervised learning",
    "version": "1.0.6",
    "project_urls": {
        "Download": "https://github.com/SKA-INAF/sclassifier/archive/refs/tags/v1.0.6.tar.gz",
        "Homepage": "https://github.com/SKA-INAF/sclassifier"
    },
    "split_keywords": [
        "radio",
        "source",
        "classification"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "868911f8969b6abea144461304f0fdf7df9502e66a9fe43dd6cf2d9a42484ef8",
                "md5": "54bf85546d38f6e888ac36dbf2bdd77a",
                "sha256": "f486051d1c4305f0ab2f5c92d086e4d6e15863da83a958a5bd9163ee4e0b95cb"
            },
            "downloads": -1,
            "filename": "sclassifier-1.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "54bf85546d38f6e888ac36dbf2bdd77a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 254915,
            "upload_time": "2024-03-18T16:18:43",
            "upload_time_iso_8601": "2024-03-18T16:18:43.619484Z",
            "url": "https://files.pythonhosted.org/packages/86/89/11f8969b6abea144461304f0fdf7df9502e66a9fe43dd6cf2d9a42484ef8/sclassifier-1.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-18 16:18:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SKA-INAF",
    "github_project": "sclassifier",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "sclassifier"
}
