Name: Hive-ML
Version: 1.0.1
Home page: https://github.com/MAIA-KTH/Hive_ML.git
Summary: Python package to run Machine Learning Experiments, within the Hive Framework.
Upload time: 2023-07-25 14:59:38
Author: Bendazzoli Simone
Requires Python: >=3.8
License: GPLv3
Keywords: machine learning, image classification, PCR, medical image analysis, DCE-MRI, radiomics, feature selection, radiodynamics
# Hive-ML
[![Documentation Status](https://readthedocs.org/projects/hive-ml/badge/?version=latest)](https://hive-ml.readthedocs.io/en/latest/?badge=latest)

**Hive-ML** is a Python package collecting the tools and scripts to run Machine Learning experiments on radiological
medical imaging.

## Install

To install Hive-ML:

```shell
pip install hive-ml
```

or from GitHub:

```shell
git clone https://github.com/MAIA-KTH/Hive_ML.git
pip install -e Hive_ML
```

## Description

The **Hive-ML** workflow consists of several sequential steps, including *Radiomics Extraction*,
*Sequential Forward Feature Selection*, and *Model Fitting*. It reports the classifier performance (*ROC-AUC*,
*Sensitivity*, *Specificity*, *Accuracy*) in tabular format and tracks all the steps on an **MLFlow** server.

In addition, **Hive-ML** provides a *Docker Image*, a *Kubernetes Deployment*, and a *Slurm Job*,
with the corresponding instructions to easily reproduce the experiments.

Finally, **Hive-ML** also supports model serving through **MLFlow**, providing easy access to the trained classifiers
for future use in model prediction.

In the tutorial below, Hive-ML is used to predict the Pathological Complete Response (pCR) after Neo-Adjuvant
Chemotherapy, from DCE-MRI.

## Usage

![Hive-ML Pipeline](images/Radiodynamics_pipeline.png "Hive-ML Pipeline")
The Hive-ML workflow is controlled by a JSON configuration file, which the user can customize for each experiment run.

Example:

```json
    {
      "image_suffix": "_image.nii.gz",  # File suffix (or list of File suffixes) of the files containing the image volume.
      "mask_suffix": "_mask.nii.gz",    # File suffix (including file extension) of the files containing the segmentation mask of the ROI.
      "label_dict": {                   # Dictionary describing the classes. The key-value pair contains the label value as key (starting from 0) and the class description as value.
        "0": "non-pCR",
        "1": "pCR"
      },
      "models": {                       # Dictionary for all the classifiers to evaluate. Each element includes the classifier class name and an additional dictionary with the kwargs to pass to the classifier object.
        "rf": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "adab": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "knn": {},
        "lda": {},
        "qda": {},
        "logistic_regression": {},
        "svm": {
          "kernel": "rbf"
        },
        "naive": {}
      },
      "perfusion_maps": {                # Dictionary describing the perfusion maps to extract. Each element includes the perfusion map name and the file suffix used to save the perfusion map.
        "distance_map": "_distance_map.nii.gz",
        "distance_map_depth": {
          "suffix": "_distance_map_depth.nii.gz",
          "kwargs": [
            2
          ]
        },
        "ttp": "_ttp_map.nii.gz",
        "cbv": "_cbv_map.nii.gz",
        "cbf": "_cbf_map.nii.gz",
        "mtt": "_mtt_map.nii.gz"
      },
      "feature_selection": "SFFS",       # Type of Feature Selection to perform. Supported values are SFFS and PCA.
      "n_features": 30,                  # Number of features to preserve when performing Feature Selection.
      "n_folds": 5,                      # Number of folds to run cross-validation.
      "random_seed": 12345,              # Random seed number used when randomizing events and actions.
      "feature_aggregator": "SD"         # Aggregation strategy used when extracting features from the 4D volume.
                                         # Supported values are: ``Flat`` (no aggregation, all features are preserved),
                                         #                       ``Mean`` (average over the 4th dimension),
                                         #                       ``SD`` (standard deviation over the 4th dimension),
                                         #                       ``Mean_Norm`` (independent channel normalization, followed by average over the 4th dimension),
                                         #                       ``SD_Norm`` (independent channel normalization, followed by SD over the 4th dimension)
      "k_ensemble": [1,5],               # List of k values to select top-k best models in ensembling.
      "metric_best_model": "roc_auc",    # Classification Metric to consider when determining the best models from CV results.
      "reduction_best_model": "mean"     # Reduction to perform on CV scores to determine the best models.
    }
```
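The `feature_aggregator` strategies can be sketched in NumPy as follows. This is an illustrative sketch, not Hive-ML's actual implementation: the `aggregate` helper name is hypothetical, and the min-max normalization used for the `*_Norm` variants is an assumption about what "independent channel normalization" means.

```python
import numpy as np

def aggregate(features: np.ndarray, strategy: str) -> np.ndarray:
    """Collapse a (timepoints, n_features) array of per-timepoint radiomics
    features into a single feature vector, per the feature_aggregator setting.

    Hypothetical helper: the normalization scheme (min-max per feature
    channel over time) is an assumption, not Hive-ML's documented formula.
    """
    if strategy == "Flat":
        return features.ravel()                      # keep every timepoint's features
    if strategy in ("Mean_Norm", "SD_Norm"):
        lo, hi = features.min(axis=0), features.max(axis=0)
        features = (features - lo) / (hi - lo + 1e-8)  # assumed per-channel min-max scaling
    if strategy in ("Mean", "Mean_Norm"):
        return features.mean(axis=0)                 # average over the 4th dimension
    if strategy in ("SD", "SD_Norm"):
        return features.std(axis=0)                  # standard deviation over the 4th dimension
    raise ValueError(f"Unknown strategy: {strategy}")

x = np.random.rand(10, 30)                           # 10 timepoints, 30 features
print(aggregate(x, "Flat").shape)                    # (300,)
print(aggregate(x, "SD").shape)                      # (30,)
```

With `Flat`, the feature count scales with the number of timepoints; the other strategies always return one value per feature.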

### Perfusion Maps Generation

Given a 4D Volume, to extract the perfusion maps (``TTP``, ``CBV``, ``CBF``, ``MTT``) run:

```shell
 Hive_ML_generate_perfusion_maps -i </path/to/data_folder> --config-file <config_file.json>
```

For more details, follow the Jupyter Notebook Tutorial: [Generate Perfusion Maps](tutorials/0-Generate_Perfusion_Maps.ipynb)

![Perfusion Curve](images/Perfusion_curve.png "Perfusion Curve")
![Perfusion Maps](images/PMaps.png "Perfusion Maps")
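Conceptually, each map summarizes a voxel's time-intensity curve. The sketch below uses textbook-style approximations (time of peak enhancement for ``TTP``, area under the curve for ``CBV``, maximum wash-in slope for ``CBF``, and the central volume theorem for ``MTT``); Hive-ML's exact formulas may differ.

```python
import numpy as np

def perfusion_maps(series: np.ndarray, dt: float = 1.0):
    """Derive simple perfusion maps from a (t, z, y, x) DCE-MRI series.

    Assumed textbook-style definitions, for illustration only:
      TTP: time-to-peak of the enhancement curve,
      CBV: area under the time-intensity curve,
      CBF: maximum upslope of the curve,
      MTT: CBV / CBF (central volume theorem).
    """
    ttp = series.argmax(axis=0) * dt                 # time of peak enhancement
    cbv = series.sum(axis=0) * dt                    # rectangle-rule area under the curve
    cbf = np.diff(series, axis=0).max(axis=0) / dt   # steepest wash-in slope
    mtt = cbv / (cbf + 1e-8)                         # central volume theorem
    return ttp, cbv, cbf, mtt

series = np.random.rand(20, 4, 32, 32)               # 20 timepoints, 4 slices
ttp, cbv, cbf, mtt = perfusion_maps(series, dt=2.5)
print(ttp.shape)                                     # (4, 32, 32)
```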

### Feature Extraction

To extract Radiomics/Radiodynamics from the 4D Volume, run:

```shell
 Hive_ML_extract_radiomics --data-folder </path/to/data_folder> --config-file <config_file.json> --feature-param-file </path/to/radiomics_config.yaml> --output-file </path/to/feature_file>
```

![Feature Extraction](images/Feature_Extraction.png "Feature Extraction")

For more details, follow the Jupyter Notebook Tutorial: [Extract Features](tutorials/1-Extract_Features.ipynb)
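The `--feature-param-file` is a radiomics parameter file. A minimal hypothetical example is shown below, following PyRadiomics parameter-file conventions (the specific settings are illustrative assumptions; adapt them to your data):

```yaml
# Hypothetical radiomics_config.yaml -- keys follow PyRadiomics conventions.
setting:
  binWidth: 25          # gray-level discretization bin width
  normalize: true       # normalize image intensities before extraction
  label: 1              # label value of the ROI in the segmentation mask
imageType:
  Original: {}          # extract from the unfiltered image
featureClass:
  shape:                # empty value enables all features of the class
  firstorder:
  glcm:
```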

### Feature Selection

To run Feature Selection:

```shell
 Hive_ML_feature_selection --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>
```

The Feature Selection report (in JSON format, including the selected features and validation scores for each classifier)
will be available at the following path:

```
$ROOT_FOLDER/<EXPERIMENT_ID>/SFFS
```

![Feature Selection](images/FS_MF.png "Feature Selection")

For more details, follow the Jupyter Notebook Tutorial: [Feature Selection](tutorials/2-Feature_Selection.ipynb)
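SFFS greedily adds the feature that most improves the cross-validated score and, in the floating variant, also retries removing previously selected features. A rough sketch of the forward step, using scikit-learn's (non-floating) `SequentialFeatureSelector` as a stand-in for Hive-ML's selector, with the `roc_auc` metric and 5 folds from the example configuration (only 8 features are kept here to keep the example fast):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for an extracted radiomics feature matrix.
X, y = make_classification(n_samples=120, n_features=40, n_informative=5,
                           random_state=12345)

# Forward selection scored by ROC-AUC over 5 CV folds, mirroring
# "feature_selection": "SFFS", "n_folds": 5 in the example config.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=8,
    direction="forward",
    scoring="roc_auc",
    cv=5,
)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())     # indices of the kept features
print(len(selected))                             # 8
```

Note that scikit-learn's selector omits the floating (conditional removal) step, so it is an approximation of SFFS, not an exact replacement.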

### Model Fitting

To perform Model Fitting on the selected features:

```shell
 Hive_ML_model_fitting --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>
```

The experiment validation reports, plots, and summaries will be available at the following path:

```
$ROOT_FOLDER/<EXPERIMENT_ID>
```
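The `k_ensemble`, `metric_best_model`, and `reduction_best_model` settings describe a top-k ensemble: rank the classifiers by the chosen reduction of their CV metric and combine the k best. A minimal sketch of that idea (the `top_k_ensemble` helper is hypothetical, not Hive-ML's API):

```python
import numpy as np

def top_k_ensemble(cv_scores: dict, proba: dict, k: int, reduction=np.mean):
    """Rank models by the reduced CV score (e.g. mean roc_auc across folds)
    and average the predicted probabilities of the top-k models."""
    ranked = sorted(cv_scores, key=lambda m: reduction(cv_scores[m]), reverse=True)
    best = ranked[:k]
    return best, np.mean([proba[m] for m in best], axis=0)

# Per-fold CV scores and per-sample predicted probabilities (illustrative).
cv_scores = {"rf": [0.82, 0.85], "svm": [0.78, 0.80], "knn": [0.88, 0.86]}
proba = {m: np.random.rand(10) for m in cv_scores}

best, p = top_k_ensemble(cv_scores, proba, k=2)
print(best)                                      # ['knn', 'rf']
```

With `"reduction_best_model": "mean"` and `"metric_best_model": "roc_auc"`, `cv_scores` would hold each classifier's per-fold ROC-AUC values.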

![Validation Plot Example](images/Validation_Plot.png "Validation Plot Example")

![CV](images/CV.png "CV")

For more details, follow the Jupyter Notebook Tutorial: [Model Fitting](tutorials/3-Model_Fitting.ipynb)

            
