delia


Namedelia JSON
Version 1.2.7 PyPI version JSON
download
home_pagehttps://github.com/MaxenceLarose/delia
SummaryDICOM Extraction for Large-scale Image Analysis (DELIA).
upload_time2023-12-27 18:58:53
maintainer
docs_urlNone
authorMaxence Larose
requires_python>=3.7
licenseApache License 2.0
keywords dicom hdf5 medical image segmentation pre-processing python3 radiomics deep-learning dicom-seg rt-struct
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            <p align="center" width="100%">
    <img width="100%" src="https://github.com/MaxenceLarose/delia/raw/main/images/logo/delia_banner.svg">
</p>
<p align="center"><i>DELIA facilitates data extraction from DICOM files to support large-scale image analysis workflows.</i></p>

# Notable features

- Bulk extraction of images and segmentations from multiple patients' DICOM files. Segmentations must be in [DICOM-SEG](https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.html) or [RTStruct](https://dicom.nema.org/dicom/2013/output/chtml/part03/sect_A.19.html) format.
- Creation of an [HDF5](https://github.com/h5py/h5py) database containing multiple patients' medical images as well as binary label maps (segmentations). The database is then easier to use than DICOMs to perform tasks on medical data, such as training deep neural networks. 
- Bulk extraction of radiomics features from multiple patients' DICOM files using [pyradiomics](https://github.com/AIM-Harvard/pyradiomics). Note that `pyradiomics` is not a dependency of `delia` and must be installed separately.

# Installation

## Latest stable version :

```
pip install delia
```

## Latest (possibly unstable) version :

```
pip install git+https://github.com/MaxenceLarose/delia
```

# Quick usage preview

## Extract images as `numpy` arrays

```python
from delia.extractors import PatientsDataExtractor

patients_data_extractor = PatientsDataExtractor(path_to_patients_folder="patients")

for patient in patients_data_extractor:
    for image_data in patient.data:
        array = image_data.image.numpy_array
        
        """Perform any tasks on images on-the-fly."""
        print(array)
```

## Create patients database (HDF5)

```python
from delia.databases import PatientsDatabase
from delia.extractors import PatientsDataExtractor

patients_data_extractor = PatientsDataExtractor(path_to_patients_folder="patients")

database = PatientsDatabase(path_to_database="patients_database.h5")

database.create(patients_data_extractor=patients_data_extractor)

```

## Create radiomics dataset (CSV)

```python
from delia.extractors import PatientsDataExtractor
from delia.radiomics import RadiomicsDataset, RadiomicsFeatureExtractor

patients_data_extractor = PatientsDataExtractor(path_to_patients_folder="patients")

radiomics_dataset = RadiomicsDataset(path_to_dataset="radiomics.csv")
radiomics_dataset.extractor = RadiomicsFeatureExtractor(path_to_params="features_extractor_params.yaml")

radiomics_dataset.create(patients_data_extractor=patients_data_extractor, organ="Heart", image_modality="CT")

```
*Note that `pyradiomics` is not a dependency of `delia` and must be installed separately.*

# Motivation

**Digital Imaging and Communications in Medicine** ([**DICOM**](https://www.dicomstandard.org/)) is *the* international standard for medical images and related information. The working group [DICOM WG-23](https://www.dicomstandard.org/activity/wgs/wg-23/) on Artificial Intelligence / Application Hosting is currently working to identify or develop the DICOM mechanisms to support AI workflows, concentrating on the clinical context. Moreover, their future *roadmap and objectives* includes working on the concern that current DICOM mechanisms might not be adequate to cover some use cases, particularly bulk analysis of large repository data, e.g. for training deep learning neural networks. **However, no tool has been developed to achieve this goal at present.**

The main **purpose** of this module is therefore to provide the necessary tools to facilitate the use of medical images in an AI workflow.  This goal is accomplished by using the [HDF file format](https://www.hdfgroup.org/) to create a database containing patients' medical images as well as binary label maps obtained from the segmentation of these images.

# How it works

## Main concepts

There are 4 main concepts :

1. `PatientDataModel` : It is the primary `delia` data structure. It is a named tuple gathering the image and segmentation data available in a patient record. 
2. `PatientsDataExtractor` : A Python [Generator](https://docs.python.org/3/library/collections.abc.html#collections.abc.Generator) that allows to iterate over several patients and create a `PatientDataModel` object for each of them. A sequence of `delia` and/or `monai` transformations to apply to specific images or segmentations can be specified (see [MONAI](https://docs.monai.io/en/stable/transforms.html)).
3. `PatientsDatabase` : An object that is used to create/interact with an HDF5 file (a database!) containing all patients information (images + label maps). The `PatientsDataExtractor` is used to populate this database. 
4. `RadiomicsDataset` : An object that is used to create/interact with a csv file (a dataset!) containing radiomics features extracted from images. The `PatientsDataExtractor` is used to populate this dataset. 

## Organize your data

Since this module requires the use of data, it is important to properly configure the data-related elements before using it.

### File format

Images files must be in standard [**DICOM**](https://www.dicomstandard.org/) format and segmentation files must be in [DICOM-SEG](https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.html) or [RTStruct](https://dicom.nema.org/dicom/2013/output/chtml/part03/sect_A.19.html) format.

If your segmentation files are in a research file format (`.nrrd`, `.nii`, etc.), you need to convert them into the standardized [DICOM-SEG](https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.html) or [RTStruct](https://dicom.nema.org/dicom/2013/output/chtml/part03/sect_A.19.html) format. You can use the [pydicom-seg](https://pypi.org/project/pydicom-seg/) library to create the DICOM-SEG files OR use the [itkimage2dicomSEG](https://github.com/MaxenceLarose/itkimage2dicomSEG) python module, which provide a complete pipeline to perform this conversion. Also, you can use the [RT-Utils](https://github.com/qurit/rt-utils) library to create the RTStruct files.

### Tag values (Optional)

*This dictionary is **not** mandatory for the code to work and therefore its default value is `None`. Note that if no `tag_values` dictionary is given, i.e. `tag_values = None`, then all images will be added to the database.*

The tag values are specified as a **dictionary** that contains the values for the tag specified when creating the PatientsDataExtractor object of the images that need to be extracted from the patients' files. Keys are arbitrary names given to the images we want to add and values are lists of values for the desired tag. The images associated with these tag values do not need to have a corresponding segmentation volume. If none of the descriptions match the series in a patient's files, a warning is raised and the patient is added to the list of patients for whom the pipeline has failed.

Note that the tag values can be specified as a python dictionary or as a path to a **json file** that contains the desired values. Both methods are presented below.

<details>
  <summary>Using a json file</summary>

Create a json file containing only the dictionary of the names given to the images we want to add (keys) and lists of tag values (values). Place this file in your data folder.

Here is an example of a json file configured as expected :

```json
{
    "PT": [
        "PT WB CORR (AC)",
        "PT WB XL CORR (AC)"
    ],
    "CT": [
        "CT 2.5 WB",
        "AC CT 2.5 WB"
    ]
}
```
</details>

<details>
  <summary>Using a Python dictionary</summary>

Create the organs dictionary in your main.py python file.

Here is an example of a python dictionary instanced as expected :

```python
tag_values = {
    "PT": [
        "PT WB CORR (AC)",
        "PT WB XL CORR (AC)"
    ],
    "CT": [
        "CT 2.5 WB",
        "AC CT 2.5 WB"
    ]
}
```
</details>

## Structure your patients directory

It is important to configure the directory structure correctly to ensure that the module interacts correctly with the data files. The `patients` folder, must be structured as follows. *Note that all DICOM files in the patients' folder will be read.*

```
|_πŸ“‚ Project directory/
  |_πŸ“„ main.py
  |_πŸ“‚ data/
    |_πŸ“„ tag_values.json
    |_πŸ“‚ patients/
      |_πŸ“‚ patient1/
       	|_πŸ“„ ...
       	|_πŸ“‚ ...
      |_πŸ“‚ patient2/
       	|_πŸ“„ ...
       	|_πŸ“‚ ...
      |_πŸ“‚ ...
```

# Import the package

The easiest way to import the package is to use :

```python
import delia
```

You can explicitly use the objects sub-modules :

```python
from delia.databases import PatientsDatabase
from delia.extractors import PatientsDataExtractor
from delia.radiomics import RadiomicsDataset, RadiomicsFeatureExtractor
```

# Use the package

## Example using the `PatientsDatabase` class

This file can then be executed to obtain an hdf5 database.

```python
from delia.databases import PatientsDatabase
from delia.extractors import PatientsDataExtractor
from delia.transforms import (
    PETtoSUVD,
    ResampleD
)
from monai.transforms import (
    CenterSpatialCropD,
    Compose,
    ScaleIntensityD,
    ThresholdIntensityD
)

patients_data_extractor = PatientsDataExtractor(
    path_to_patients_folder="data/patients",
    tag_values="data/tag_values.json",
    transforms=Compose(
        [
            ResampleD(keys=["CT_THORAX", "PT", "Heart"], out_spacing=(1.5, 1.5, 1.5)),
            CenterSpatialCropD(keys=["CT_THORAX", "PT", "Heart"], roi_size=(1000, 160, 160)),
            ThresholdIntensityD(keys=["CT_THORAX"], threshold=-250, above=True, cval=-250),
            ThresholdIntensityD(keys=["CT_THORAX"], threshold=500, above=False, cval=500),
            ScaleIntensityD(keys=["CT_THORAX"], minv=0, maxv=1),
            PETtoSUVD(keys=["PT"])
        ]
    )
)

database = PatientsDatabase(path_to_database="data/patients_database.h5")

database.create(
    patients_data_extractor=patients_data_extractor,
    tags_to_use_as_attributes=[(0x0008, 0x103E), (0x0020, 0x000E), (0x0008, 0x0060)],
    overwrite_database=True
)

```

The created HDF5 database will then look something like :

![patient_dataset](https://github.com/MaxenceLarose/delia/raw/main/images/patient_dataset.png)

## Example using the `PatientsDataExtractor`class

This file can then be executed to perform on-the-fly tasks on images.

```python
from delia.extractors import PatientsDataExtractor
from delia.transforms import Compose, CopySegmentationsD, PETtoSUVD, ResampleD
import SimpleITK as sitk

patients_data_extractor = PatientsDataExtractor(
    path_to_patients_folder="data/patients",
    tag_values="data/tag_values.json",
    transforms=Compose(
        [
            ResampleD(keys=["CT_THORAX", "Heart"], out_spacing=(1.5, 1.5, 1.5)),
            PETtoSUVD(keys=["PT"]),
            CopySegmentationsD(segmented_image_key="CT_THORAX", unsegmented_image_key="PT")
        ]
    )
)

for patient_dataset in patients_data_extractor:
    print(f"Patient ID: {patient_dataset.patient_id}")

    for patient_image_data in patient_dataset.data:
        dicom_header = patient_image_data.image.dicom_header
        simple_itk_image = patient_image_data.image.simple_itk_image
        numpy_array_image = sitk.GetArrayFromImage(simple_itk_image)

        """Perform any tasks on images on-the-fly."""
        print(numpy_array_image.shape)
```

## Need more examples?

You can find more in the [examples folder](https://github.com/MaxenceLarose/delia/tree/main/examples).

# A deeper look into the `PatientsDataExtractor` object

The `PatientsDataExtractor` has 4 important variables:  a `path_to_patients_folder` (which dictates the path to the folder that contains all patient records), a `tag` (which specifies which tag to use when choosing which images need to be extracted, if no value is given, this defaults to `SeriesDescription`), a `tag_values` (which dictates the images that needs to be extracted from the patient records) and `transforms` that defines a sequence of transformations to apply on images or segmentations. For each patient/folder available in the `path_to_patients_folder`, all DICOM files in their folder are read. If the specified tag's value of a certain volume match one of the descriptions present in the given `tag_values` dictionary, this volume and its segmentation (if available) are automatically added to the `PatientDataModel`. Note that if no `tag_values` dictionary is given (`tag_values = None`), then all images (and associated segmentations) will be added to the database. 

The `PatientsDataExtractor` can therefore be used to iteratively perform tasks on each of the patients, such as displaying certain images, transforming images into numpy arrays, or creating an HDF5 database using the `PatientsDatabase`. It is this last task that is highlighted in this package, but it must be understood that the data extraction is performed in a very general manner by the `PatientsDataExtractor` and is therefore not limited to this single application. For example, someone could easily develop a `Numpydatabase` whose creation would be ensured by the `PatientsDataExtractor`, similar to the current `PatientsDatabase` based on the HDF5 format.

# TODO

- [x] Generalize the use of arbitrary tags to choose images to extract. At the moment, the only tag available is `series_descriptions`.
- [ ] Find which DICOM tags act in unusual ways such as having data associated with .repval and .value which differ by more than just their type. For now, both are used to obtain the same value in delia/readers/patient_data/factories/patient_data_factories.py (152-155) and delia/readers/image/dicom_reader.py (113-116).
- [ ] Allow for more comprehensive use of tags such as using AND, OR and NOT between tags and their values. 
- [ ] Loosen monai version requirements (monai==1.0.1) to allow for more recent versions. This needs a redefinition of the way transforms are applied.


# License

This code is provided under the [Apache License 2.0](https://github.com/MaxenceLarose/delia/blob/main/LICENSE).

# Citation

```
@misc{delia,
  title={DELIA: DICOM Extraction for Large-scale Image Analysis},
  author={Maxence Larose},
  year={2022},
  publisher={UniversitΓ© Laval},
  url={https://github.com/MedPhysUL/delia},
}
```

# Contact

Maxence Larose, B. Ing., [maxence.larose.1@ulaval.ca](mailto:maxence.larose.1@ulaval.ca)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/MaxenceLarose/delia",
    "name": "delia",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "dicom hdf5 medical image segmentation pre-processing python3 radiomics deep-learning dicom-seg rt-struct",
    "author": "Maxence Larose",
    "author_email": "maxence.larose.1@ulaval.ca",
    "download_url": "https://files.pythonhosted.org/packages/05/ca/8d443299d1101f7d49d9685f476803207db76db404ef4167639fc334fd90/delia-1.2.7.tar.gz",
    "platform": null,
    "description": "<p align=\"center\" width=\"100%\">\r\n    <img width=\"100%\" src=\"https://github.com/MaxenceLarose/delia/raw/main/images/logo/delia_banner.svg\">\r\n</p>\r\n<p align=\"center\"><i>DELIA facilitates data extraction from DICOM files to support large-scale image analysis workflows.</i></p>\r\n\r\n# Notable features\r\n\r\n- Bulk extraction of images and segmentations from multiple patients' DICOM files. Segmentations must be in [DICOM-SEG](https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.html) or [RTStruct](https://dicom.nema.org/dicom/2013/output/chtml/part03/sect_A.19.html) format.\r\n- Creation of an [HDF5](https://github.com/h5py/h5py) database containing multiple patients' medical images as well as binary label maps (segmentations). The database is then easier to use than DICOMs to perform tasks on medical data, such as training deep neural networks. \r\n- Bulk extraction of radiomics features from multiple patients' DICOM files using [pyradiomics](https://github.com/AIM-Harvard/pyradiomics). Note that `pyradiomics` is not a dependency of `delia` and must be installed separately.\r\n\r\n# Installation\r\n\r\n## Latest stable version :\r\n\r\n```\r\npip install delia\r\n```\r\n\r\n## Latest (possibly unstable) version :\r\n\r\n```\r\npip install git+https://github.com/MaxenceLarose/delia\r\n```\r\n\r\n# Quick usage preview\r\n\r\n## Extract images as `numpy` arrays\r\n\r\n```python\r\nfrom delia.extractors import PatientsDataExtractor\r\n\r\npatients_data_extractor = PatientsDataExtractor(path_to_patients_folder=\"patients\")\r\n\r\nfor patient in patients_data_extractor:\r\n    for image_data in patient.data:\r\n        array = image_data.image.numpy_array\r\n        \r\n        \"\"\"Perform any tasks on images on-the-fly.\"\"\"\r\n        print(array)\r\n```\r\n\r\n## Create patients database (HDF5)\r\n\r\n```python\r\nfrom delia.databases import PatientsDatabase\r\nfrom delia.extractors import PatientsDataExtractor\r\n\r\npatients_data_extractor = PatientsDataExtractor(path_to_patients_folder=\"patients\")\r\n\r\ndatabase = PatientsDatabase(path_to_database=\"patients_database.h5\")\r\n\r\ndatabase.create(patients_data_extractor=patients_data_extractor)\r\n\r\n```\r\n\r\n## Create radiomics dataset (CSV)\r\n\r\n```python\r\nfrom delia.extractors import PatientsDataExtractor\r\nfrom delia.radiomics import RadiomicsDataset, RadiomicsFeatureExtractor\r\n\r\npatients_data_extractor = PatientsDataExtractor(path_to_patients_folder=\"patients\")\r\n\r\nradiomics_dataset = RadiomicsDataset(path_to_dataset=\"radiomics.csv\")\r\nradiomics_dataset.extractor = RadiomicsFeatureExtractor(path_to_params=\"features_extractor_params.yaml\")\r\n\r\nradiomics_dataset.create(patients_data_extractor=patients_data_extractor, organ=\"Heart\", image_modality=\"CT\")\r\n\r\n```\r\n*Note that `pyradiomics` is not a dependency of `delia` and must be installed separately.*\r\n\r\n# Motivation\r\n\r\n**Digital Imaging and Communications in Medicine** ([**DICOM**](https://www.dicomstandard.org/)) is *the* international standard for medical images and related information. The working group [DICOM WG-23](https://www.dicomstandard.org/activity/wgs/wg-23/) on Artificial Intelligence / Application Hosting is currently working to identify or develop the DICOM mechanisms to support AI workflows, concentrating on the clinical context. Moreover, their future *roadmap and objectives* includes working on the concern that current DICOM mechanisms might not be adequate to cover some use cases, particularly bulk analysis of large repository data, e.g. for training deep learning neural networks. **However, no tool has been developed to achieve this goal at present.**\r\n\r\nThe main **purpose** of this module is therefore to provide the necessary tools to facilitate the use of medical images in an AI workflow.  This goal is accomplished by using the [HDF file format](https://www.hdfgroup.org/) to create a database containing patients' medical images as well as binary label maps obtained from the segmentation of these images.\r\n\r\n# How it works\r\n\r\n## Main concepts\r\n\r\nThere are 4 main concepts :\r\n\r\n1. `PatientDataModel` : It is the primary `delia` data structure. It is a named tuple gathering the image and segmentation data available in a patient record. \r\n2. `PatientsDataExtractor` : A Python [Generator](https://docs.python.org/3/library/collections.abc.html#collections.abc.Generator) that allows to iterate over several patients and create a `PatientDataModel` object for each of them. A sequence of `delia` and/or `monai` transformations to apply to specific images or segmentations can be specified (see [MONAI](https://docs.monai.io/en/stable/transforms.html)).\r\n3. `PatientsDatabase` : An object that is used to create/interact with an HDF5 file (a database!) containing all patients information (images + label maps). The `PatientsDataExtractor` is used to populate this database. \r\n4. `RadiomicsDataset` : An object that is used to create/interact with a csv file (a dataset!) containing radiomics features extracted from images. The `PatientsDataExtractor` is used to populate this dataset. \r\n\r\n## Organize your data\r\n\r\nSince this module requires the use of data, it is important to properly configure the data-related elements before using it.\r\n\r\n### File format\r\n\r\nImages files must be in standard [**DICOM**](https://www.dicomstandard.org/) format and segmentation files must be in [DICOM-SEG](https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.html) or [RTStruct](https://dicom.nema.org/dicom/2013/output/chtml/part03/sect_A.19.html) format.\r\n\r\nIf your segmentation files are in a research file format (`.nrrd`, `.nii`, etc.), you need to convert them into the standardized [DICOM-SEG](https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.html) or [RTStruct](https://dicom.nema.org/dicom/2013/output/chtml/part03/sect_A.19.html) format. You can use the [pydicom-seg](https://pypi.org/project/pydicom-seg/) library to create the DICOM-SEG files OR use the [itkimage2dicomSEG](https://github.com/MaxenceLarose/itkimage2dicomSEG) python module, which provide a complete pipeline to perform this conversion. Also, you can use the [RT-Utils](https://github.com/qurit/rt-utils) library to create the RTStruct files.\r\n\r\n### Tag values (Optional)\r\n\r\n*This dictionary is **not** mandatory for the code to work and therefore its default value is `None`. Note that if no `tag_values` dictionary is given, i.e. `tag_values = None`, then all images will be added to the database.*\r\n\r\nThe tag values are specified as a **dictionary** that contains the values for the tag specified when creating the PatientsDataExtractor object of the images that need to be extracted from the patients' files. Keys are arbitrary names given to the images we want to add and values are lists of values for the desired tag. The images associated with these tag values do not need to have a corresponding segmentation volume. If none of the descriptions match the series in a patient's files, a warning is raised and the patient is added to the list of patients for whom the pipeline has failed.\r\n\r\nNote that the tag values can be specified as a python dictionary or as a path to a **json file** that contains the desired values. Both methods are presented below.\r\n\r\n<details>\r\n  <summary>Using a json file</summary>\r\n\r\nCreate a json file containing only the dictionary of the names given to the images we want to add (keys) and lists of tag values (values). Place this file in your data folder.\r\n\r\nHere is an example of a json file configured as expected :\r\n\r\n```json\r\n{\r\n    \"PT\": [\r\n        \"PT WB CORR (AC)\",\r\n        \"PT WB XL CORR (AC)\"\r\n    ],\r\n    \"CT\": [\r\n        \"CT 2.5 WB\",\r\n        \"AC CT 2.5 WB\"\r\n    ]\r\n}\r\n```\r\n</details>\r\n\r\n<details>\r\n  <summary>Using a Python dictionary</summary>\r\n\r\nCreate the organs dictionary in your main.py python file.\r\n\r\nHere is an example of a python dictionary instanced as expected :\r\n\r\n```python\r\ntag_values = {\r\n    \"PT\": [\r\n        \"PT WB CORR (AC)\",\r\n        \"PT WB XL CORR (AC)\"\r\n    ],\r\n    \"CT\": [\r\n        \"CT 2.5 WB\",\r\n        \"AC CT 2.5 WB\"\r\n    ]\r\n}\r\n```\r\n</details>\r\n\r\n## Structure your patients directory\r\n\r\nIt is important to configure the directory structure correctly to ensure that the module interacts correctly with the data files. The `patients` folder, must be structured as follows. *Note that all DICOM files in the patients' folder will be read.*\r\n\r\n```\r\n|_\ud83d\udcc2 Project directory/\r\n  |_\ud83d\udcc4 main.py\r\n  |_\ud83d\udcc2 data/\r\n    |_\ud83d\udcc4 tag_values.json\r\n    |_\ud83d\udcc2 patients/\r\n      |_\ud83d\udcc2 patient1/\r\n       \t|_\ud83d\udcc4 ...\r\n       \t|_\ud83d\udcc2 ...\r\n      |_\ud83d\udcc2 patient2/\r\n       \t|_\ud83d\udcc4 ...\r\n       \t|_\ud83d\udcc2 ...\r\n      |_\ud83d\udcc2 ...\r\n```\r\n\r\n# Import the package\r\n\r\nThe easiest way to import the package is to use :\r\n\r\n```python\r\nimport delia\r\n```\r\n\r\nYou can explicitly use the objects sub-modules :\r\n\r\n```python\r\nfrom delia.databases import PatientsDatabase\r\nfrom delia.extractors import PatientsDataExtractor\r\nfrom delia.radiomics import RadiomicsDataset, RadiomicsFeatureExtractor\r\n```\r\n\r\n# Use the package\r\n\r\n## Example using the `PatientsDatabase` class\r\n\r\nThis file can then be executed to obtain an hdf5 database.\r\n\r\n```python\r\nfrom delia.databases import PatientsDatabase\r\nfrom delia.extractors import PatientsDataExtractor\r\nfrom delia.transforms import (\r\n    PETtoSUVD,\r\n    ResampleD\r\n)\r\nfrom monai.transforms import (\r\n    CenterSpatialCropD,\r\n    Compose,\r\n    ScaleIntensityD,\r\n    ThresholdIntensityD\r\n)\r\n\r\npatients_data_extractor = PatientsDataExtractor(\r\n    path_to_patients_folder=\"data/patients\",\r\n    tag_values=\"data/tag_values.json\",\r\n    transforms=Compose(\r\n        [\r\n            ResampleD(keys=[\"CT_THORAX\", \"PT\", \"Heart\"], out_spacing=(1.5, 1.5, 1.5)),\r\n            CenterSpatialCropD(keys=[\"CT_THORAX\", \"PT\", \"Heart\"], roi_size=(1000, 160, 160)),\r\n            ThresholdIntensityD(keys=[\"CT_THORAX\"], threshold=-250, above=True, cval=-250),\r\n            ThresholdIntensityD(keys=[\"CT_THORAX\"], threshold=500, above=False, cval=500),\r\n            ScaleIntensityD(keys=[\"CT_THORAX\"], minv=0, maxv=1),\r\n            PETtoSUVD(keys=[\"PT\"])\r\n        ]\r\n    )\r\n)\r\n\r\ndatabase = PatientsDatabase(path_to_database=\"data/patients_database.h5\")\r\n\r\ndatabase.create(\r\n    patients_data_extractor=patients_data_extractor,\r\n    tags_to_use_as_attributes=[(0x0008, 0x103E), (0x0020, 0x000E), (0x0008, 0x0060)],\r\n    overwrite_database=True\r\n)\r\n\r\n```\r\n\r\nThe created HDF5 database will then look something like :\r\n\r\n![patient_dataset](https://github.com/MaxenceLarose/delia/raw/main/images/patient_dataset.png)\r\n\r\n## Example using the `PatientsDataExtractor`class\r\n\r\nThis file can then be executed to perform on-the-fly tasks on images.\r\n\r\n```python\r\nfrom delia.extractors import PatientsDataExtractor\r\nfrom delia.transforms import Compose, CopySegmentationsD, PETtoSUVD, ResampleD\r\nimport SimpleITK as sitk\r\n\r\npatients_data_extractor = PatientsDataExtractor(\r\n    path_to_patients_folder=\"data/patients\",\r\n    tag_values=\"data/tag_values.json\",\r\n    transforms=Compose(\r\n        [\r\n            ResampleD(keys=[\"CT_THORAX\", \"Heart\"], out_spacing=(1.5, 1.5, 1.5)),\r\n            PETtoSUVD(keys=[\"PT\"]),\r\n            CopySegmentationsD(segmented_image_key=\"CT_THORAX\", unsegmented_image_key=\"PT\")\r\n        ]\r\n    )\r\n)\r\n\r\nfor patient_dataset in patients_data_extractor:\r\n    print(f\"Patient ID: {patient_dataset.patient_id}\")\r\n\r\n    for patient_image_data in patient_dataset.data:\r\n        dicom_header = patient_image_data.image.dicom_header\r\n        simple_itk_image = patient_image_data.image.simple_itk_image\r\n        numpy_array_image = sitk.GetArrayFromImage(simple_itk_image)\r\n\r\n        \"\"\"Perform any tasks on images on-the-fly.\"\"\"\r\n        print(numpy_array_image.shape)\r\n```\r\n\r\n## Need more examples?\r\n\r\nYou can find more in the [examples folder](https://github.com/MaxenceLarose/delia/tree/main/examples).\r\n\r\n# A deeper look into the `PatientsDataExtractor` object\r\n\r\nThe `PatientsDataExtractor` has 4 important variables:  a `path_to_patients_folder` (which dictates the path to the folder that contains all patient records), a `tag` (which specifies which tag to use when choosing which images need to be extracted, if no value is given, this defaults to `SeriesDescription`), a `tag_values` (which dictates the images that needs to be extracted from the patient records) and `transforms` that defines a sequence of transformations to apply on images or segmentations. For each patient/folder available in the `path_to_patients_folder`, all DICOM files in their folder are read. If the specified tag's value of a certain volume match one of the descriptions present in the given `tag_values` dictionary, this volume and its segmentation (if available) are automatically added to the `PatientDataModel`. Note that if no `tag_values` dictionary is given (`tag_values = None`), then all images (and associated segmentations) will be added to the database. \r\n\r\nThe `PatientsDataExtractor` can therefore be used to iteratively perform tasks on each of the patients, such as displaying certain images, transforming images into numpy arrays, or creating an HDF5 database using the `PatientsDatabase`. It is this last task that is highlighted in this package, but it must be understood that the data extraction is performed in a very general manner by the `PatientsDataExtractor` and is therefore not limited to this single application. For example, someone could easily develop a `Numpydatabase` whose creation would be ensured by the `PatientsDataExtractor`, similar to the current `PatientsDatabase` based on the HDF5 format.\r\n\r\n# TODO\r\n\r\n- [x] Generalize the use of arbitrary tags to choose images to extract. At the moment, the only tag available is `series_descriptions`.\r\n- [ ] Find which DICOM tags act in unusual ways such as having data associated with .repval and .value which differ by more than just their type. For now, both are used to obtain the same value in delia/readers/patient_data/factories/patient_data_factories.py (152-155) and delia/readers/image/dicom_reader.py (113-116).\r\n- [ ] Allow for more comprehensive use of tags such as using AND, OR and NOT between tags and their values. \r\n- [ ] Loosen monai version requirements (monai==1.0.1) to allow for more recent versions. This needs a redefinition of the way transforms are applied.\r\n\r\n\r\n# License\r\n\r\nThis code is provided under the [Apache License 2.0](https://github.com/MaxenceLarose/delia/blob/main/LICENSE).\r\n\r\n# Citation\r\n\r\n```\r\n@misc{delia,\r\n  title={DELIA: DICOM Extraction for Large-scale Image Analysis},\r\n  author={Maxence Larose},\r\n  year={2022},\r\n  publisher={Universit\u00e9 Laval},\r\n  url={https://github.com/MedPhysUL/delia},\r\n}\r\n```\r\n\r\n# Contact\r\n\r\nMaxence Larose, B. Ing., [maxence.larose.1@ulaval.ca](mailto:maxence.larose.1@ulaval.ca)\r\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "DICOM Extraction for Large-scale Image Analysis (DELIA).",
    "version": "1.2.7",
    "project_urls": {
        "Homepage": "https://github.com/MaxenceLarose/delia"
    },
    "split_keywords": [
        "dicom",
        "hdf5",
        "medical",
        "image",
        "segmentation",
        "pre-processing",
        "python3",
        "radiomics",
        "deep-learning",
        "dicom-seg",
        "rt-struct"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b1a038d31549e384e7d89ec388374953413ebf264a90e6961643308bdd43d4b9",
                "md5": "ee084c227a522c2791741129076ae24f",
                "sha256": "0e5e2461bfb32f6d0f89d5ee7dc38c854d5c24e9a604d217794f43bef7c3fa9b"
            },
            "downloads": -1,
            "filename": "delia-1.2.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ee084c227a522c2791741129076ae24f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 68033,
            "upload_time": "2023-12-27T18:58:51",
            "upload_time_iso_8601": "2023-12-27T18:58:51.261950Z",
            "url": "https://files.pythonhosted.org/packages/b1/a0/38d31549e384e7d89ec388374953413ebf264a90e6961643308bdd43d4b9/delia-1.2.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "05ca8d443299d1101f7d49d9685f476803207db76db404ef4167639fc334fd90",
                "md5": "b9a684dcb47cc3358828e207b072098b",
                "sha256": "cfcf1f832c5c6a57474f768e77def9c017f5032a6f813b0cbc1a8fcd08d4adeb"
            },
            "downloads": -1,
            "filename": "delia-1.2.7.tar.gz",
            "has_sig": false,
            "md5_digest": "b9a684dcb47cc3358828e207b072098b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 48842,
            "upload_time": "2023-12-27T18:58:53",
            "upload_time_iso_8601": "2023-12-27T18:58:53.373046Z",
            "url": "https://files.pythonhosted.org/packages/05/ca/8d443299d1101f7d49d9685f476803207db76db404ef4167639fc334fd90/delia-1.2.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-27 18:58:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MaxenceLarose",
    "github_project": "delia",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "delia"
}
        
Elapsed time: 0.16970s