# Hive-ML
[![Documentation Status](https://readthedocs.org/projects/hive-ml/badge/?version=latest)](https://hive-ml.readthedocs.io/en/latest/?badge=latest)
**Hive-ML** is a Python package that collects tools and scripts for running Machine Learning experiments on
radiological medical imaging.
## Install
To install Hive-ML:
```shell
pip install hive-ml
```
or from GitHub:
```shell
git clone
pip install -e Hive_ML
```
## Description
The **Hive-ML** workflow consists of several sequential steps, including *Radiomics extraction*,
*Sequential Forward Feature Selection*, and *Model Fitting*. It reports classifier performance (*ROC-AUC*,
*Sensitivity*, *Specificity*, *Accuracy*) in tabular format and tracks every step on an **MLFlow** server.
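
As a reminder of what the threshold-based metrics measure, here is a minimal sketch in plain Python. The function name `binary_metrics` and the toy labels are illustrative assumptions and are not part of the Hive-ML API; ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        # Fraction of true positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Fraction of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up labels for illustration only: 1 = pCR, 0 = non-pCR.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```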
In addition, **Hive-ML** provides a *Docker Image*, *Kubernetes Deployment* and *Slurm Job*,
with the corresponding set of instructions to easily reproduce the experiments.
Finally, **Hive-ML** also supports model serving through **MLFlow**, giving easy access to the trained classifier
for later predictions.
In the tutorial below, Hive-ML is used to predict Pathological Complete Response (pCR) after neoadjuvant
chemotherapy from DCE-MRI.
## Usage
![Hive-ML Pipeline](images/Radiodynamics_pipeline.png "Hive-ML Pipeline")
The Hive-ML workflow is controlled from a JSON configuration file, which the user can customize for each experiment run.
Example:
```json
{
    "image_suffix": "_image.nii.gz",  # File suffix (or list of file suffixes) of the files containing the image volume.
    "mask_suffix": "_mask.nii.gz",  # File suffix (including file extension) of the files containing the segmentation mask of the ROI.
    "label_dict": {  # Dictionary describing the classes. Each key is a label value (starting from 0) and each value is the class description.
        "0": "non-pCR",
        "1": "pCR"
    },
    "models": {  # Dictionary of all the classifiers to evaluate. Each element maps the classifier name to a dictionary of kwargs passed to the classifier object.
        "rf": {
            "criterion": "gini",
            "n_estimators": 100,
            "max_depth": 10
        },
        "adab": {
            "criterion": "gini",
            "n_estimators": 100,
            "max_depth": 10
        },
        "knn": {},
        "lda": {},
        "qda": {},
        "logistic_regression": {},
        "svm": {
            "kernel": "rbf"
        },
        "naive": {}
    },
    "perfusion_maps": {  # Dictionary describing the perfusion maps to extract. Each element maps the perfusion map name to the file suffix used to save it.
        "distance_map": "_distance_map.nii.gz",
        "distance_map_depth": {
            "suffix": "_distance_map_depth.nii.gz",
            "kwargs": [
                2
            ]
        },
        "ttp": "_ttp_map.nii.gz",
        "cbv": "_cbv_map.nii.gz",
        "cbf": "_cbf_map.nii.gz",
        "mtt": "_mtt_map.nii.gz"
    },
    "feature_selection": "SFFS",  # Type of Feature Selection to perform. Supported values are SFFS and PCA.
    "n_features": 30,  # Number of features to preserve when performing Feature Selection.
    "n_folds": 5,  # Number of folds to run cross-validation.
    "random_seed": 12345,  # Random seed number used when randomizing events and actions.
    "feature_aggregator": "SD",  # Aggregation strategy used when extracting features from the 4D volume.
    # Supported values are: ``Flat`` (no aggregation, all features are preserved),
    # ``Mean`` (Average over the 4-th dimension),
    # ``SD`` (Standard Deviation over the 4-th dimension),
    # ``Mean_Norm`` (Independent channel-normalization, followed by average over the 4-th dimension),
    # ``SD_Norm`` (Independent channel-normalization, followed by SD over the 4-th dimension)
    "k_ensemble": [1,5],  # List of k values to select top-k best models in ensembling.
    "metric_best_model": "roc_auc",  # Classification Metric to consider when determining the best models from CV results.
    "reduction_best_model": "mean"  # Reduction to perform on CV scores to determine the best models.
}
```
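
The `#` annotations in the example are explanatory only, and a strict JSON parser such as Python's `json` module rejects them. The sketch below shows one way to strip such comments before loading an annotated file. The helper names `strip_hash_comments` and `load_config`, and the choice of which keys to treat as required, are assumptions made for illustration and are not part of Hive-ML.

```python
import json

# Assumed for illustration: a handful of keys that any run would need.
REQUIRED_KEYS = {"image_suffix", "mask_suffix", "label_dict", "models"}


def strip_hash_comments(text: str) -> str:
    """Remove '#' comments that sit outside JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False          # character after a backslash
            elif ch == "\\":
                escaped = in_string      # escapes only matter inside strings
            elif ch == '"':
                in_string = not in_string
            elif ch == "#" and not in_string:
                cut = i                  # comment starts here
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)


def load_config(text: str) -> dict:
    """Parse annotated JSON text and check the assumed required keys."""
    config = json.loads(strip_hash_comments(text))
    missing = REQUIRED_KEYS.difference(config)
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    return config


annotated = """
{
    "image_suffix": "_image.nii.gz",  # image volume suffix
    "mask_suffix": "_mask.nii.gz",    # ROI mask suffix
    "label_dict": {"0": "non-pCR", "1": "pCR"},
    "models": {"svm": {"kernel": "rbf"}, "knn": {}},
    "n_folds": 5  # cross-validation folds
}
"""
config = load_config(annotated)
print(config["models"]["svm"])
```

In practice it is simpler to keep the configuration file itself free of comments, so that it can be parsed directly.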
### Perfusion Maps Generation
Given a 4D Volume, to extract the perfusion maps (``TTP``, ``CBV``, ``CBF``, ``MTT``) run:
```shell
Hive_ML_generate_perfusion_maps -i </path/to/data_folder> --config-file <config_file.json>
```
For more details, follow the Jupyter Notebook
tutorial: [Generate Perfusion Maps](tutorials/0-Generate_Perfusion_Maps.ipynb)
![Perfusion Curve](images/Perfusion_curve.png "Perfusion Curve")
![Perfusion Maps](images/PMaps.png "Perfusion Maps")
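
To give an intuition for one of these maps, the sketch below computes a time-to-peak (TTP) map with NumPy as the index of the frame with maximum intensity in each voxel. It is not Hive-ML's implementation: the function name `time_to_peak`, the assumption that time is the last array axis, and the use of frame indices rather than seconds are all illustrative choices. The other maps (CBV, CBF, MTT) depend on modelling choices that are not described here, so they are not sketched.

```python
import numpy as np


def time_to_peak(volume_4d: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Per-voxel index of the frame with maximum intensity; 0 outside the mask.

    volume_4d has shape (x, y, z, time); mask has shape (x, y, z).
    """
    ttp = np.argmax(volume_4d, axis=-1)
    return np.where(mask > 0, ttp, 0)


# Synthetic data for illustration only.
rng = np.random.default_rng(12345)
volume = rng.random((4, 4, 2, 6))          # values in [0, 1)
volume[1, 1, 0, 3] = 10.0                  # force a peak at frame 3 in one voxel
mask = np.zeros((4, 4, 2), dtype=np.uint8)
mask[1:3, 1:3, :] = 1                      # small cubic ROI

ttp_map = time_to_peak(volume, mask)
print(ttp_map.shape, ttp_map[1, 1, 0])
```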
### Feature Extraction
To extract Radiomics/Radiodynamics from the 4D Volume, run:
```shell
Hive_ML_extract_radiomics --data-folder </path/to/data_folder> --config-file <config_file.json> --feature-param-file </path/to/radiomics_config.yaml> --output-file </path/to/feature_file>
```
![Feature Extraction](images/Feature_Extraction.png "Feature Extraction")
For more details, follow the Jupyter Notebook tutorial: [Extract Features](tutorials/1-Extract_Features.ipynb)
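
The `feature_aggregator` setting in the configuration describes how features extracted at each time point are combined. The sketch below illustrates the `Flat`, `Mean`, and `SD` strategies on a small array. It assumes the features are arranged as `(n_timepoints, n_features)` and uses the population standard deviation; both are assumptions, and `aggregate_features` is a hypothetical helper rather than a Hive-ML function. The `Mean_Norm` and `SD_Norm` variants are omitted because the exact normalization is not specified here.

```python
import numpy as np


def aggregate_features(per_timepoint: np.ndarray, strategy: str) -> np.ndarray:
    """Collapse a (n_timepoints, n_features) array into a single feature vector."""
    if strategy == "Flat":
        # No aggregation: keep every feature from every time point.
        return per_timepoint.reshape(-1)
    if strategy == "Mean":
        return per_timepoint.mean(axis=0)
    if strategy == "SD":
        # Population standard deviation (ddof=0) over the time axis.
        return per_timepoint.std(axis=0)
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")


# Three time points, two features (made-up values).
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

for name in ("Flat", "Mean", "SD"):
    print(name, aggregate_features(features, name))
```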
### Feature Selection
To run Feature Selection:
```shell
Hive_ML_feature_selection --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>
```
The Feature Selection report (in JSON format, including the selected features and validation scores for each classifier)
will be available at the following path:
```
$ROOT_FOLDER/<EXPERIMENT_ID>/SFFS
```
![Feature Selection](images/FS_MF.png "Feature Selection")
For more details, follow the Jupyter Notebook tutorial: [Feature Selection](tutorials/2-Feature_Selection.ipynb)
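
For readers unfamiliar with sequential forward selection, the sketch below shows the general idea using scikit-learn's `SequentialFeatureSelector` on synthetic data. Which library Hive-ML uses internally is not stated here, so this is an illustration of the technique rather than the package's own code; the dataset, the logistic-regression estimator, and the number of selected features are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem (stand-in for a radiomics feature table).
X, y = make_classification(
    n_samples=120,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    random_state=12345,
)

# Greedily add one feature at a time, keeping the one that most improves
# the cross-validated ROC-AUC, until three features are selected.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
    scoring="roc_auc",
    cv=5,
)
selector.fit(X, y)

selected = selector.get_support(indices=True)  # column indices of kept features
X_selected = selector.transform(X)
print("selected feature indices:", selected)
```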
### Model Fitting
To perform Model Fitting on the Selected features:
```shell
Hive_ML_model_fitting --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>
```
The experiment validation reports, plots, and summaries will be available at the following path:
```
$ROOT_FOLDER/<EXPERIMENT_ID>
```
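
The configuration keys `metric_best_model`, `reduction_best_model`, and `k_ensemble` describe how the best models are chosen from the cross-validation results. A minimal sketch of that ranking logic follows. The helper `top_k_models`, the `median` option, and the per-fold scores are made up for illustration and do not come from Hive-ML or from any real experiment.

```python
from statistics import mean, median

# "mean" appears in the example config; "median" is an assumed alternative.
REDUCTIONS = {"mean": mean, "median": median}


def top_k_models(cv_scores, k, reduction="mean"):
    """Rank models by their reduced cross-validation score and keep the best k."""
    reduce = REDUCTIONS[reduction]
    ranked = sorted(cv_scores, key=lambda name: reduce(cv_scores[name]), reverse=True)
    return ranked[:k]


# Made-up per-fold ROC-AUC values for four classifiers.
cv_scores = {
    "rf": [0.80, 0.78, 0.82, 0.79, 0.81],
    "svm": [0.75, 0.77, 0.74, 0.76, 0.78],
    "knn": [0.70, 0.72, 0.69, 0.71, 0.73],
    "lda": [0.83, 0.81, 0.84, 0.80, 0.82],
}

for k in (1, 3):
    print(k, top_k_models(cv_scores, k))
```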
![Validation Plot Example](images/Validation_Plot.png "Validation Plot Example")
![CV](images/CV.png "CV")
For more details, follow the Jupyter Notebook tutorial: [Model Fitting](tutorials/3-Model_Fitting.ipynb)
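
The four commands above can be chained from a script. The sketch below only assembles the argument lists, using the command names and flags shown in this README; the paths and the experiment name are placeholders, and `build_pipeline_commands` is a hypothetical helper. Actually running the commands (for example with `subprocess.run`) is left commented out.

```python
import shlex


def build_pipeline_commands(data_folder, config_file, radiomics_config,
                            feature_file, experiment_name):
    """Return the four Hive-ML CLI calls, in order, as argument lists."""
    return [
        ["Hive_ML_generate_perfusion_maps", "-i", data_folder,
         "--config-file", config_file],
        ["Hive_ML_extract_radiomics", "--data-folder", data_folder,
         "--config-file", config_file,
         "--feature-param-file", radiomics_config,
         "--output-file", feature_file],
        ["Hive_ML_feature_selection", "--feature-file", feature_file,
         "--config-file", config_file,
         "--experiment-name", experiment_name],
        ["Hive_ML_model_fitting", "--feature-file", feature_file,
         "--config-file", config_file,
         "--experiment-name", experiment_name],
    ]


# Placeholder values for illustration only.
commands = build_pipeline_commands(
    data_folder="/path/to/data_folder",
    config_file="config_file.json",
    radiomics_config="/path/to/radiomics_config.yaml",
    feature_file="/path/to/feature_file",
    experiment_name="EXPERIMENT_ID",
)

for command in commands:
    print(shlex.join(command))
    # import subprocess; subprocess.run(command, check=True)  # uncomment to execute
```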
Raw data
{
"_id": null,
"home_page": "https://github.com/MAIA-KTH/Hive_ML.git",
"name": "Hive-ML",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "machine learning,image classification,PCR,medical image analysis,DCE MRI,radiomics,feature selection,radiodynamics",
"author": "Bendazzoli Simone",
"author_email": "simben@kth.se",
"download_url": "https://files.pythonhosted.org/packages/c1/27/5f8eb46ad63911dd5bf092b16d1c80732d1f75a0d5be6c0c82b8cec5e573/Hive_ML-1.0.1.tar.gz",
"platform": "OS Independent",
"bugtrack_url": null,
"license": "GPLv3",
"summary": "Python package to run Machine Learning Experiments, within the Hive Framework.",
"version": "1.0.1",
"project_urls": {
"Bug Tracker": "https://github.com/MAIA-KTH/Hive_ML/issues",
"Documentation": "https://hive-ml.readthedocs.io",
"Homepage": "https://github.com/MAIA-KTH/Hive_ML.git",
"Source Code": "https://github.com/MAIA-KTH/Hive_ML"
},
"split_keywords": [
"machine learning",
"image classification",
"pcr",
"medical image analysis",
"dce mri",
"radiomics",
"feature selection",
"radiodynamics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7ea36e0b8fca61d58bc02cde31a1f806fc9a1c084cc600d880c80fb002e56590",
"md5": "eb5fe6a34cb1047e0682bc63e240dad3",
"sha256": "315deea6876a435f4e4ebe6238eb60dd33f087295eddee226c783f65147fd417"
},
"downloads": -1,
"filename": "Hive_ML-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eb5fe6a34cb1047e0682bc63e240dad3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 34443,
"upload_time": "2023-07-25T14:59:36",
"upload_time_iso_8601": "2023-07-25T14:59:36.748264Z",
"url": "https://files.pythonhosted.org/packages/7e/a3/6e0b8fca61d58bc02cde31a1f806fc9a1c084cc600d880c80fb002e56590/Hive_ML-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c1275f8eb46ad63911dd5bf092b16d1c80732d1f75a0d5be6c0c82b8cec5e573",
"md5": "67510ec5c6d9661fdd69a62d7cd07f83",
"sha256": "91756a8945dcaf5f8d3c60dc67bbb6300d7e1624270dabcccdab3d3b95fdd4f4"
},
"downloads": -1,
"filename": "Hive_ML-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "67510ec5c6d9661fdd69a62d7cd07f83",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 48064,
"upload_time": "2023-07-25T14:59:38",
"upload_time_iso_8601": "2023-07-25T14:59:38.235142Z",
"url": "https://files.pythonhosted.org/packages/c1/27/5f8eb46ad63911dd5bf092b16d1c80732d1f75a0d5be6c0c82b8cec5e573/Hive_ML-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-25 14:59:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MAIA-KTH",
"github_project": "Hive_ML",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "hive-ml"
}