torchxrayvision

- Version: 1.3.2
- Home page: https://github.com/mlmed/torchxrayvision
- Summary: TorchXRayVision: A library of chest X-ray datasets and models
- Upload time: 2025-01-03 02:44:28
- Author: Joseph Paul Cohen (joseph@josephpcohen.com)
- Requires Python: >=3.6
- Requirements: torch, torchvision, scikit-image, tqdm, numpy, pandas, requests, pillow, imageio

🚨 Paper now online! [https://arxiv.org/abs/2111.00595](https://arxiv.org/abs/2111.00595)

🚨 Documentation now online! [https://mlmed.org/torchxrayvision/](https://mlmed.org/torchxrayvision/)

# TorchXRayVision 

| <img src="https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/torchxrayvision-logo.png" width="300px"/>  |  ([🎬 promo video](https://www.youtube.com/watch?v=Rl7xz0uULGQ)) <br>[<img src="http://img.youtube.com/vi/Rl7xz0uULGQ/0.jpg" width="400px"/>](http://www.youtube.com/watch?v=Rl7xz0uULGQ "TorchXRayVision promo video") |
|---|---|

# What is it?

A library for chest X-ray datasets and models, including pre-trained models.


TorchXRayVision is an open source software library for working with chest X-ray datasets and deep learning models. It provides a common interface and common pre-processing chain for a wide set of publicly available chest X-ray datasets. In addition, a number of classification and representation learning models with different architectures, trained on different data combinations, are available through the library to serve as baselines or feature extractors.

- For researchers addressing clinical questions, training models from scratch is often a waste of time. To address this, TorchXRayVision provides pre-trained models, trained on large cohorts of data, that enable 1) rapid analysis of large datasets and 2) feature reuse for few-shot learning.
- For researchers developing algorithms, it is important to robustly evaluate models on multiple external datasets. The metadata associated with each dataset can vary greatly, which makes it difficult to apply methods across datasets. TorchXRayVision provides access to many datasets in a uniform way, so they can be swapped out with a single line of code. These datasets can also be merged and filtered to construct specific distribution shifts for studying generalization.

Twitter: [@torchxrayvision](https://twitter.com/torchxrayvision)

## Getting started

```
$ pip install torchxrayvision
```

```python3
import torchxrayvision as xrv
import skimage.io
import torch, torchvision

# Prepare the image:
img = skimage.io.imread("16747_3_1.jpg")
img = xrv.datasets.normalize(img, 255) # convert 8-bit image to [-1024, 1024] range
if img.ndim == 3:
    img = img.mean(2) # collapse color channels into a single channel
img = img[None, ...] # add a channel dimension: (1, H, W)

transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),
                                            xrv.datasets.XRayResizer(224)])

img = transform(img)
img = torch.from_numpy(img)

# Load model and process image
model = xrv.models.DenseNet(weights="densenet121-res224-all")
with torch.no_grad():
    outputs = model(img[None,...]) # or model.features(img[None,...])

# Print results
dict(zip(model.pathologies,outputs[0].detach().numpy()))

# Example output:
{'Atelectasis': 0.32797316,
 'Consolidation': 0.42933336,
 'Infiltration': 0.5316924,
 'Pneumothorax': 0.28849724,
 'Edema': 0.024142697,
 'Emphysema': 0.5011832,
 'Fibrosis': 0.51887786,
 'Effusion': 0.27805611,
 'Pneumonia': 0.18569896,
 'Pleural_Thickening': 0.24489835,
 'Cardiomegaly': 0.3645515,
 'Nodule': 0.68982,
 'Mass': 0.6392845,
 'Hernia': 0.00993878,
 'Lung Lesion': 0.011150705,
 'Fracture': 0.51916164,
 'Lung Opacity': 0.59073937,
 'Enlarged Cardiomediastinum': 0.27218717}

```
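The `xrv.datasets.normalize(img, 255)` call above rescales 8-bit pixel values linearly from [0, 255] to roughly [-1024, 1024]. The sketch below is an illustrative NumPy re-implementation of that linear mapping; the helper name `normalize_sketch` and its error check are assumptions for illustration, so use the library function in real code.

```python3
import numpy as np

def normalize_sketch(img, maxval):
    """Linearly map pixel values in [0, maxval] to [-1024, 1024] (illustrative only)."""
    img = np.asarray(img, dtype=np.float32)
    if img.max() > maxval:
        # Values above maxval would land outside [-1024, 1024]
        raise ValueError(f"max value {img.max()} exceeds maxval={maxval}")
    return (2.0 * (img / maxval) - 1.0) * 1024.0

pixels = np.array([0, 127.5, 255])
print(normalize_sketch(pixels, 255))  # values: -1024.0, 0.0, 1024.0
```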

A sample script that processes images using pretrained models is [process_image.py](https://github.com/mlmed/torchxrayvision/blob/master/scripts/process_image.py):

```
$ python3 process_image.py ../tests/00000001_000.png
{'preds': {'Atelectasis': 0.50500506,
           'Cardiomegaly': 0.6600903,
           'Consolidation': 0.30575264,
           'Edema': 0.274184,
           'Effusion': 0.4026162,
           'Emphysema': 0.5036339,
           'Enlarged Cardiomediastinum': 0.40989172,
           'Fibrosis': 0.53293407,
           'Fracture': 0.32376793,
           'Hernia': 0.011924741,
           'Infiltration': 0.5154413,
           'Lung Lesion': 0.22231922,
           'Lung Opacity': 0.2772148,
           'Mass': 0.32237658,
           'Nodule': 0.5091847,
           'Pleural_Thickening': 0.5102617,
           'Pneumonia': 0.30947986,
           'Pneumothorax': 0.24847917}}

```

## Models ([demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_models.ipynb))

Specify weights for pretrained models (the 224x224 models below are DenseNet121; the 512x512 model is a ResNet50).

Note: Each pretrained model has 18 outputs. The `all` model has every output trained. For the other weights, however, some targets are not trained and will produce random predictions because those labels do not exist in the training dataset. The only valid outputs are those listed in the `{dataset}.pathologies` field of the dataset that corresponds to the weights.

```python3
import torchxrayvision as xrv

# 224x224 models
model = xrv.models.DenseNet(weights="densenet121-res224-all")
model = xrv.models.DenseNet(weights="densenet121-res224-rsna") # RSNA Pneumonia Challenge
model = xrv.models.DenseNet(weights="densenet121-res224-nih") # NIH chest X-ray8
model = xrv.models.DenseNet(weights="densenet121-res224-pc") # PadChest (University of Alicante)
model = xrv.models.DenseNet(weights="densenet121-res224-chex") # CheXpert (Stanford)
model = xrv.models.DenseNet(weights="densenet121-res224-mimic_nb") # MIMIC-CXR (MIT)
model = xrv.models.DenseNet(weights="densenet121-res224-mimic_ch") # MIMIC-CXR (MIT)

# 512x512 models
model = xrv.models.ResNet(weights="resnet50-res512-all")

# DenseNet121 from JF Healthcare for the CheXpert competition
model = xrv.baseline_models.jfhealthcare.DenseNet()

# Official Stanford CheXpert model
model = xrv.baseline_models.chexpert.DenseNet(weights_zip="chexpert_weights.zip")

# Emory HITI lab race prediction model
model = xrv.baseline_models.emory_hiti.RaceModel()
model.targets # ["Asian", "Black", "White"]

# Riken age prediction model
model = xrv.baseline_models.riken.AgeModel()

```
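Because only some of the 18 outputs are trained for the non-`all` weights, it can help to keep just the outputs that the matching dataset actually labels. This is a minimal pure-Python sketch, assuming you have the model's output names (e.g. `model.pathologies`), one score per name, and the dataset's `.pathologies` list; `keep_valid_outputs` is a hypothetical helper, not a library function, and the scores below are made up.

```python3
def keep_valid_outputs(output_names, scores, valid_pathologies):
    """Return {pathology: score} only for outputs the training dataset actually labels."""
    if len(output_names) != len(scores):
        raise ValueError("output_names and scores must have the same length")
    valid = set(valid_pathologies)
    return {name: float(score)
            for name, score in zip(output_names, scores)
            if name in valid}

# Toy example with made-up scores (not real model output)
names = ["Atelectasis", "Consolidation", "Hernia", "Pneumonia"]
scores = [0.3, 0.4, 0.01, 0.2]
print(keep_valid_outputs(names, scores, ["Pneumonia", "Atelectasis"]))
# {'Atelectasis': 0.3, 'Pneumonia': 0.2}
```

With the objects from the examples above, a call might look like `keep_valid_outputs(model.pathologies, outputs[0].detach().numpy(), d_nih.pathologies)` for a model loaded with the `-nih` weights.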

Benchmarks of the models are in [BENCHMARKS.md](https://github.com/mlmed/torchxrayvision/blob/master/BENCHMARKS.md), and the performance of some of the models is reported in this paper: [arxiv.org/abs/2002.02497](https://arxiv.org/abs/2002.02497).


## Autoencoders 
You can also load a pre-trained autoencoder that is trained on the PadChest, NIH, CheXpert, and MIMIC datasets.
```python3
# `image`: a normalized image batch tensor of shape [N, 1, H, W], prepared as in Getting started
ae = xrv.autoencoders.ResNetAE(weights="101-elastic")
z = ae.encode(image)
image2 = ae.decode(z)
```

## Segmentation

You can load pretrained anatomical segmentation models. [Demo Notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/segmentation.ipynb)

```python3
seg_model = xrv.baseline_models.chestx_det.PSPNet()
# `image`: a normalized image batch tensor of shape [N, 1, H, W], prepared as in Getting started
output = seg_model(image)
output.shape # [1, 14, 512, 512]
seg_model.targets # ['Left Clavicle', 'Right Clavicle', 'Left Scapula', 'Right Scapula',
                  #  'Left Lung', 'Right Lung', 'Left Hilus Pulmonis', 'Right Hilus Pulmonis',
                  #  'Heart', 'Aorta', 'Facies Diaphragmatica', 'Mediastinum',  'Weasand', 'Spine']
```

![](https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/segmentation-pspnet.png)
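To turn the `[1, 14, 512, 512]` output into a binary mask for one structure, you can select the channel by its name in `seg_model.targets` and threshold it. The sketch below uses a random NumPy array in place of the real output and assumes the output holds unnormalized scores (logits), so it applies a sigmoid and a 0.5 threshold; the helper `binary_mask` is hypothetical, and the demo notebook is the reference for the authors' own post-processing.

```python3
import numpy as np

targets = ['Left Clavicle', 'Right Clavicle', 'Left Scapula', 'Right Scapula',
           'Left Lung', 'Right Lung', 'Left Hilus Pulmonis', 'Right Hilus Pulmonis',
           'Heart', 'Aorta', 'Facies Diaphragmatica', 'Mediastinum', 'Weasand', 'Spine']

def binary_mask(output, targets, name, threshold=0.5):
    """Select one structure's channel from a [1, C, H, W] array and binarize it.

    Assumes the array contains logits, so a sigmoid is applied before thresholding.
    """
    channel = output[0, targets.index(name)]
    probs = 1.0 / (1.0 + np.exp(-channel))
    return probs > threshold

# Stand-in for seg_model(image) converted to NumPy
rng = np.random.default_rng(0)
fake_output = rng.normal(size=(1, len(targets), 512, 512)).astype(np.float32)
heart = binary_mask(fake_output, targets, "Heart")
print(heart.shape, heart.dtype)  # (512, 512) bool
```

With the real model, you would pass `output.detach().numpy()` and `seg_model.targets` instead of the stand-ins.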

## Datasets 
[View docstrings for more detail on each dataset](https://github.com/mlmed/torchxrayvision/blob/master/torchxrayvision/datasets.py), the [Demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_datasets.ipynb), and the [Example loading script](https://github.com/mlmed/torchxrayvision/blob/master/scripts/dataset_utils.py).

```python3
import torchvision
import torchxrayvision as xrv

transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),
                                            xrv.datasets.XRayResizer(224)])

# RSNA Pneumonia Detection Challenge. https://pubs.rsna.org/doi/full/10.1148/ryai.2019180041
d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="path to stage_2_train_images_jpg",
                                               transform=transform)

# CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. https://arxiv.org/abs/1901.07031
d_chex = xrv.datasets.CheX_Dataset(imgpath="path to CheXpert-v1.0-small",
                                   csvpath="path to CheXpert-v1.0-small/train.csv",
                                   transform=transform)

# National Institutes of Health ChestX-ray8 dataset. https://arxiv.org/abs/1705.02315
d_nih = xrv.datasets.NIH_Dataset(imgpath="path to NIH images")

# A relabelling of a subset of NIH images from: https://pubs.rsna.org/doi/10.1148/radiol.2019191293
d_nih2 = xrv.datasets.NIH_Google_Dataset(imgpath="path to NIH images")

# PadChest: A large chest x-ray image dataset with multi-label annotated reports. https://arxiv.org/abs/1901.07441
d_pc = xrv.datasets.PC_Dataset(imgpath="path to image folder")

# COVID-19 Image Data Collection. https://arxiv.org/abs/2006.11988
d_covid19 = xrv.datasets.COVID19_Dataset() # specify imgpath and csvpath for the dataset

# SIIM Pneumothorax Dataset. https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation
d_siim = xrv.datasets.SIIM_Pneumothorax_Dataset(imgpath="dicom-images-train/",
                                                csvpath="train-rle.csv")

# VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations. https://arxiv.org/abs/2012.15029
d_vin = xrv.datasets.VinBrain_Dataset(imgpath=".../train",
                                      csvpath=".../train.csv")

# National Library of Medicine Tuberculosis Datasets. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256233/
d_nlmtb = xrv.datasets.NLMTB_Dataset(imgpath="path to MontgomerySet or ChinaSet_AllFiles")
```
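Both `transform` pipelines in this README start with `xrv.datasets.XRayCenterCrop()`. Assuming it crops a `(1, H, W)` image to a centered square on its shorter side, as its name suggests, the operation amounts to the NumPy sketch below; `center_crop_sketch` is a hypothetical helper for illustration, not the library's implementation.

```python3
import numpy as np

def center_crop_sketch(img):
    """Crop a (C, H, W) array to a centered square whose side is min(H, W)."""
    _, h, w = img.shape
    size = min(h, w)
    top = h // 2 - size // 2
    left = w // 2 - size // 2
    return img[:, top:top + size, left:left + size]

img = np.arange(1 * 4 * 6).reshape(1, 4, 6)
print(center_crop_sketch(img).shape)  # (1, 4, 4)
```

Cropping to a square first means the subsequent resize to 224x224 does not distort the aspect ratio.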

## Dataset fields

Each dataset contains a number of fields. These fields are preserved when `xrv.datasets.Subset_Dataset` and `xrv.datasets.Merge_Dataset` are used.

 - `.pathologies` This field is a list of the pathologies contained in this dataset that will be contained in the `.labels` field.

 - `.labels` This field contains a 1, 0, or NaN for each label defined in `.pathologies`.

 - `.csv` This field is a pandas DataFrame of the metadata CSV file that comes with the data. Each row aligns with the elements of the dataset, so indexing using `.iloc` will work.

If possible, each dataset's `.csv` will include the following common fields:

- `csv.patientid` A unique ID that identifies the patient of each sample in this dataset.

- `csv.offset_day_int` An integer time offset for the image in units of days. It is intended for relative times and has no absolute meaning, although for some datasets it is the epoch time.

- `csv.age_years` The age of the patient in years.

- `csv.sex_male` Whether the patient is male.

- `csv.sex_female` Whether the patient is female.
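Because `.labels` has one column per entry in `.pathologies`, you can look up a pathology's column by name and drop the NaN (unlabeled) entries. A minimal sketch with toy arrays standing in for a real dataset's fields, assuming `.labels` is a 2-D NumPy array with one row per sample; `labeled_column` is a hypothetical helper.

```python3
import numpy as np

pathologies = ["Atelectasis", "Effusion", "Pneumonia"]   # stand-in for d.pathologies
labels = np.array([[1.0, 0.0, np.nan],                  # stand-in for d.labels
                   [0.0, np.nan, 1.0],
                   [np.nan, 1.0, 0.0]])

def labeled_column(labels, pathologies, name):
    """Return (sample_indices, values) for samples that have a 0/1 label for `name`."""
    col = labels[:, list(pathologies).index(name)]
    keep = ~np.isnan(col)
    return np.flatnonzero(keep), col[keep]

idx, vals = labeled_column(labels, pathologies, "Pneumonia")
print(idx, vals)  # [1 2] [1. 0.]
```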


## Dataset tools

`relabel_dataset` aligns a dataset's labels to follow the same order as the `pathologies` argument.
```python3
xrv.datasets.relabel_dataset(xrv.datasets.default_pathologies, d_nih) # modifies d_nih in place
```
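Conceptually, aligning labels to a target pathology list means reordering the `.labels` columns to follow that list and adding columns for pathologies the dataset does not have. The sketch below assumes those missing pathologies are filled with NaN (unlabeled); `relabel_sketch` is a hypothetical stand-alone helper that returns a new array, whereas `relabel_dataset` itself modifies the dataset in place.

```python3
import numpy as np

def relabel_sketch(labels, pathologies, new_pathologies):
    """Reorder label columns to match new_pathologies; missing pathologies become NaN."""
    pathologies = list(pathologies)
    out = np.full((labels.shape[0], len(new_pathologies)), np.nan)
    for j, name in enumerate(new_pathologies):
        if name in pathologies:
            out[:, j] = labels[:, pathologies.index(name)]
    return out

labels = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
print(relabel_sketch(labels, ["Effusion", "Mass"], ["Mass", "Hernia", "Effusion"]))
```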

Specify a subset of views ([demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_datasets_views.ipynb)):
```python3
d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="...",
                                               views=["PA","AP","AP Supine"])
```

Keep only one image per patient:
```python3
d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="...",
                                               unique_patients=True)
```
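Keeping one image per patient can be sketched as keeping the first row seen for each patient ID. This illustrative pure-Python version works over hypothetical `(patient_id, image_id)` pairs; which image the library keeps for each patient is an assumption not covered here.

```python3
def first_image_per_patient(rows):
    """Keep the first (patient_id, image_id) pair seen for each patient_id."""
    seen = set()
    kept = []
    for patient_id, image_id in rows:
        if patient_id not in seen:
            seen.add(patient_id)
            kept.append((patient_id, image_id))
    return kept

rows = [("p1", "img_a"), ("p2", "img_b"), ("p1", "img_c"), ("p3", "img_d")]
print(first_image_per_patient(rows))
# [('p1', 'img_a'), ('p2', 'img_b'), ('p3', 'img_d')]
```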

Obtain summary statistics per dataset:
```python3
d_chex = xrv.datasets.CheX_Dataset(imgpath="CheXpert-v1.0-small",
                                   csvpath="CheXpert-v1.0-small/train.csv",
                                   views=["PA","AP"], unique_patients=False)

# Example summary statistics:
# CheX_Dataset num_samples=191010 views=['PA', 'AP']
# {'Atelectasis': {0.0: 17621, 1.0: 29718},
#  'Cardiomegaly': {0.0: 22645, 1.0: 23384},
#  'Consolidation': {0.0: 30463, 1.0: 12982},
#  'Edema': {0.0: 29449, 1.0: 49674},
#  'Effusion': {0.0: 34376, 1.0: 76894},
#  'Enlarged Cardiomediastinum': {0.0: 26527, 1.0: 9186},
#  'Fracture': {0.0: 18111, 1.0: 7434},
#  'Lung Lesion': {0.0: 17523, 1.0: 7040},
#  'Lung Opacity': {0.0: 20165, 1.0: 94207},
#  'Pleural Other': {0.0: 17166, 1.0: 2503},
#  'Pneumonia': {0.0: 18105, 1.0: 4674},
#  'Pneumothorax': {0.0: 54165, 1.0: 17693},
#  'Support Devices': {0.0: 21757, 1.0: 99747}}
```
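Counts like the ones above can be derived from the documented `.pathologies` and `.labels` fields by counting the 0 and 1 values in each column and ignoring NaN. A sketch with a toy labels array; `label_counts` is a hypothetical helper, not necessarily how the library computes its summary.

```python3
import collections
import numpy as np

def label_counts(labels, pathologies):
    """Return {pathology: {label_value: count}} per column, ignoring NaN entries."""
    counts = {}
    for name, column in zip(pathologies, labels.T):
        values = column[~np.isnan(column)]
        counts[name] = dict(sorted(collections.Counter(values.tolist()).items()))
    return counts

labels = np.array([[1.0, 0.0],
                   [0.0, np.nan],
                   [1.0, 1.0]])
print(label_counts(labels, ["Effusion", "Edema"]))
# {'Effusion': {0.0: 1, 1.0: 2}, 'Edema': {0.0: 1, 1.0: 1}}
```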

## Pathology masks ([demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_masks.ipynb))

Masks are available in the following datasets:
```python3
xrv.datasets.RSNA_Pneumonia_Dataset() # for Lung Opacity
xrv.datasets.SIIM_Pneumothorax_Dataset() # for Pneumothorax
xrv.datasets.NIH_Dataset() # for Cardiomegaly, Mass, Effusion, ...
```

Example usage:

```python3
import numpy as np

d_rsna = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="stage_2_train_images_jpg",
                                             views=["PA","AP"],
                                             pathology_masks=True)

# The has_masks column will let you know if any masks exist for that sample
d_rsna.csv.has_masks.value_counts()
# False    20672
# True      6012

# Each sample has a pathology_masks dictionary keyed by the index of each
# pathology; the value is the mask for that pathology (if it exists).
# A sample may have masks for several pathologies, but at most one per pathology.
sample = d_rsna[int(np.flatnonzero(d_rsna.csv.has_masks.values)[0])] # first sample with masks
sample["pathology_masks"][d_rsna.pathologies.index("Lung Opacity")]
```
![](https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/pathology-mask-rsna2.png)
![](https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/pathology-mask-rsna3.png)

It also works with data augmentation if you pass `data_aug=data_transforms` to the dataset constructor. The random seed is matched so that the same random transformation is applied to the image and its mask.

![](https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/pathology-mask-rsna614-da.png)
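The idea of matching the random seed can be illustrated with a random horizontal flip: seeding the generator with the same value before transforming the image and the mask makes both receive the same random decision. This NumPy sketch only shows the principle; the `random_flip` helper and the seed value are hypothetical and do not reflect how the library implements it.

```python3
import numpy as np

def random_flip(array, seed):
    """Flip a (C, H, W) array horizontally with probability 0.5, using a fixed seed."""
    rng = np.random.default_rng(seed)
    if rng.random() < 0.5:
        return array[..., ::-1]
    return array

image = np.arange(6).reshape(1, 2, 3)
mask = (image % 2 == 0).astype(np.uint8)  # toy mask derived from the image

seed = 1234  # one seed per sample, reused for the image and its mask
aug_image = random_flip(image, seed)
aug_mask = random_flip(mask, seed)

# The mask still lines up with the image after augmentation
assert np.array_equal(aug_mask, (aug_image % 2 == 0).astype(np.uint8))
```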

## Distribution shift tools ([demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_datasets-CovariateShift.ipynb))

The class `xrv.datasets.CovariateDataset` takes two datasets and two arrays representing the labels. The samples will be returned with the desired ratio of images from each site. The goal is to simulate a covariate shift that makes a model focus on an incorrect feature. The shift can then be reversed in the validation data, causing a catastrophic failure in generalization performance.

- `ratio=0.0` means images from `d1` will have a positive label.
- `ratio=0.5` means images from `d1` will have half of the positive labels.
- `ratio=1.0` means images from `d1` will have no positive label.

With any ratio, the number of samples returned will be the same.

```python3
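# d1, d2: datasets; d1_target, d2_target: label arrays with one entry per sample
d = xrv.datasets.CovariateDataset(d1=d1,                # dataset1 with a specific condition
                                  d1_target=d1_target,  # target label to predict for d1
                                  d2=d2,                # dataset2 with a specific condition
                                  d2_target=d2_target,  # target label to predict for d2
                                  mode="train",         # "train", "valid", or "test"
                                  ratio=0.9)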

```
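Reading the `ratio` semantics above literally, a fraction `1 - ratio` of the positive samples come from `d1` and the rest from `d2`, while the total stays fixed. The arithmetic below is a hypothetical illustration of that reading; the helper `positives_per_site` and its rounding rule are assumptions, not the class's actual sampling code.

```python3
def positives_per_site(n_positive, ratio):
    """Split n_positive positive samples between d1 and d2 for a ratio in [0, 1]."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    from_d1 = round((1.0 - ratio) * n_positive)
    return from_d1, n_positive - from_d1

for r in (0.0, 0.5, 0.9, 1.0):
    print(r, positives_per_site(1000, r))
```

Under this reading, `ratio=0.9` in training puts most positives in `d2`, so a model can learn site features as a shortcut; evaluating with a reversed ratio exposes that shortcut.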

## Citation

Primary TorchXRayVision paper: [https://arxiv.org/abs/2111.00595](https://arxiv.org/abs/2111.00595)

```
Joseph Paul Cohen, Joseph D. Viviano, Paul Bertin, Paul Morrison, Parsa Torabian, Matteo Guarrera, Matthew P Lungren, Akshay Chaudhari, Rupert Brooks, Mohammad Hashir, Hadrien Bertrand
TorchXRayVision: A library of chest X-ray datasets and models. 
Medical Imaging with Deep Learning
https://github.com/mlmed/torchxrayvision, 2020


@inproceedings{Cohen2022xrv,
title = {{TorchXRayVision: A library of chest X-ray datasets and models}},
author = {Cohen, Joseph Paul and Viviano, Joseph D. and Bertin, Paul and Morrison, Paul and Torabian, Parsa and Guarrera, Matteo and Lungren, Matthew P and Chaudhari, Akshay and Brooks, Rupert and Hashir, Mohammad and Bertrand, Hadrien},
booktitle = {Medical Imaging with Deep Learning},
url = {https://github.com/mlmed/torchxrayvision},
arxivId = {2111.00595},
year = {2022}
}

```
And this paper, which initiated development of the library: [https://arxiv.org/abs/2002.02497](https://arxiv.org/abs/2002.02497)
```
Joseph Paul Cohen and Mohammad Hashir and Rupert Brooks and Hadrien Bertrand
On the limits of cross-domain generalization in automated X-ray prediction. 
Medical Imaging with Deep Learning 2020 (Online: https://arxiv.org/abs/2002.02497)

@inproceedings{cohen2020limits,
  title={On the limits of cross-domain generalization in automated X-ray prediction},
  author={Cohen, Joseph Paul and Hashir, Mohammad and Brooks, Rupert and Bertrand, Hadrien},
  booktitle={Medical Imaging with Deep Learning},
  year={2020},
  url={https://arxiv.org/abs/2002.02497}
}
```

## Supporters/Sponsors


| <a href="https://cifar.ca/"><img width="300px" src="https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/cifar-logo.png" /></a><br> CIFAR (Canadian Institute for Advanced Research)  |  <a href="https://mila.quebec/"><img width="300px" src="https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/mila-logo.png" /></a><br> Mila, Quebec AI Institute, University of Montreal |
|:---:|:---:|
| <a href="http://aimi.stanford.edu/"><img width="300px" src="https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/AIMI-stanford.jpg" /></a> <br><b>Stanford University's Center for <br>Artificial Intelligence in Medicine & Imaging</b>  | <a href="http://www.carestream.com/"><img width="300px" src="https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/carestream-logo.png" /></a> <br><b>Carestream Health</b>  |

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mlmed/torchxrayvision",
    "name": "torchxrayvision",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Joseph Paul Cohen",
    "author_email": "joseph@josephpcohen.com",
    "download_url": "https://files.pythonhosted.org/packages/5e/70/7e6e97d4b899aa5a3b0576ced9da3870f460aad0819cdcba9cf1612b1bb3/torchxrayvision-1.3.2.tar.gz",
    "platform": null,
    "description": "\ud83d\udea8 Paper now online! [https://arxiv.org/abs/2111.00595](https://arxiv.org/abs/2111.00595)\n\n\ud83d\udea8 Documentation now online! [https://mlmed.org/torchxrayvision/](https://mlmed.org/torchxrayvision/)\n\n# TorchXRayVision \n\n| <img src=\"https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/torchxrayvision-logo.png\" width=\"300px\"/>  |  ([\ud83c\udfac promo video](https://www.youtube.com/watch?v=Rl7xz0uULGQ)) <br>[<img src=\"http://img.youtube.com/vi/Rl7xz0uULGQ/0.jpg\" width=\"400px\"/>)](http://www.youtube.com/watch?v=Rl7xz0uULGQ \"Video Title\") |\n|---|---|\n\n# What is it?\n\nA library for chest X-ray datasets and models. Including pre-trained models.\n\n\nTorchXRayVision is an open source software library for working with chest X-ray datasets and deep learning models. It provides a common interface and common pre-processing chain for a wide set of publicly available chest X-ray datasets. In addition, a number of classification and representation learning models with different architectures, trained on different data combinations, are available through the library to serve as baselines or feature extractors.\n\n- In the case of researchers addressing clinical questions it is a waste of time for them to train models from scratch. To address this, TorchXRayVision provides pre-trained models which are trained on large cohorts of data and enables 1) rapid analysis of large datasets 2) feature reuse for few-shot learning.\n- In the case of researchers developing algorithms it is important to robustly evaluate models using multiple external datasets. Metadata associated with each dataset can vary greatly which makes it difficult to apply methods to multiple datasets. TorchXRayVision provides access to many datasets in a uniform way so that they can be swapped out with a single line of code. 
These datasets can also be merged and filtered to construct specific distributional shifts for studying generalization.\n\nTwitter: [@torchxrayvision](https://twitter.com/torchxrayvision)\n\n## Getting started\n\n```\n$ pip install torchxrayvision\n```\n\n```python3\nimport torchxrayvision as xrv\nimport skimage, torch, torchvision\n\n# Prepare the image:\nimg = skimage.io.imread(\"16747_3_1.jpg\")\nimg = xrv.datasets.normalize(img, 255) # convert 8-bit image to [-1024, 1024] range\nimg = img.mean(2)[None, ...] # Make single color channel\n\ntransform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),xrv.datasets.XRayResizer(224)])\n\nimg = transform(img)\nimg = torch.from_numpy(img)\n\n# Load model and process image\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-all\")\noutputs = model(img[None,...]) # or model.features(img[None,...]) \n\n# Print results\ndict(zip(model.pathologies,outputs[0].detach().numpy()))\n\n{'Atelectasis': 0.32797316,\n 'Consolidation': 0.42933336,\n 'Infiltration': 0.5316924,\n 'Pneumothorax': 0.28849724,\n 'Edema': 0.024142697,\n 'Emphysema': 0.5011832,\n 'Fibrosis': 0.51887786,\n 'Effusion': 0.27805611,\n 'Pneumonia': 0.18569896,\n 'Pleural_Thickening': 0.24489835,\n 'Cardiomegaly': 0.3645515,\n 'Nodule': 0.68982,\n 'Mass': 0.6392845,\n 'Hernia': 0.00993878,\n 'Lung Lesion': 0.011150705,\n 'Fracture': 0.51916164,\n 'Lung Opacity': 0.59073937,\n 'Enlarged Cardiomediastinum': 0.27218717}\n\n```\n\nA sample script to process images usings pretrained models is [process_image.py](https://github.com/mlmed/torchxrayvision/blob/master/scripts/process_image.py)\n\n```\n$ python3 process_image.py ../tests/00000001_000.png\n{'preds': {'Atelectasis': 0.50500506,\n           'Cardiomegaly': 0.6600903,\n           'Consolidation': 0.30575264,\n           'Edema': 0.274184,\n           'Effusion': 0.4026162,\n           'Emphysema': 0.5036339,\n           'Enlarged Cardiomediastinum': 0.40989172,\n           'Fibrosis': 
0.53293407,\n           'Fracture': 0.32376793,\n           'Hernia': 0.011924741,\n           'Infiltration': 0.5154413,\n           'Lung Lesion': 0.22231922,\n           'Lung Opacity': 0.2772148,\n           'Mass': 0.32237658,\n           'Nodule': 0.5091847,\n           'Pleural_Thickening': 0.5102617,\n           'Pneumonia': 0.30947986,\n           'Pneumothorax': 0.24847917}}\n\n```\n\n## Models ([demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_models.ipynb))\n\nSpecify weights for pretrained models (currently all DenseNet121)\nNote: Each pretrained model has 18 outputs. The `all` model has every output trained. However, for the other weights some targets are not trained and will predict randomly becuase they do not exist in the training dataset. The only valid outputs are listed in the field `{dataset}.pathologies` on the dataset that corresponds to the weights. \n\n```python3\n\n## 224x224 models\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-all\")\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-rsna\") # RSNA Pneumonia Challenge\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-nih\") # NIH chest X-ray8\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-pc\") # PadChest (University of Alicante)\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-chex\") # CheXpert (Stanford)\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-mimic_nb\") # MIMIC-CXR (MIT)\nmodel = xrv.models.DenseNet(weights=\"densenet121-res224-mimic_ch\") # MIMIC-CXR (MIT)\n\n# 512x512 models\nmodel = xrv.models.ResNet(weights=\"resnet50-res512-all\")\n\n# DenseNet121 from JF Healthcare for the CheXpert competition\nmodel = xrv.baseline_models.jfhealthcare.DenseNet() \n\n# Official Stanford CheXpert model\nmodel = xrv.baseline_models.chexpert.DenseNet(weights_zip=\"chexpert_weights.zip\")\n\n# Emory HITI lab race prediction model\nmodel = xrv.baseline_models.emory_hiti.RaceModel()\nmodel.targets -> 
[\"Asian\", \"Black\", \"White\"]\n\n# Riken age prediction model\nmodel = xrv.baseline_models.riken.AgeModel()\n\n```\n\nBenchmarks of the modes are here: [BENCHMARKS.md](BENCHMARKS.md) and the performance of some of the models can be seen in this paper [arxiv.org/abs/2002.02497](https://arxiv.org/abs/2002.02497). \n\n\n## Autoencoders \nYou can also load a pre-trained autoencoder that is trained on the PadChest, NIH, CheXpert, and MIMIC datasets.\n```python3\nae = xrv.autoencoders.ResNetAE(weights=\"101-elastic\")\nz = ae.encode(image)\nimage2 = ae.decode(z)\n```\n\n## Segmentation\n\nYou can load pretrained anatomical segmentation models. [Demo Notebook](scripts/segmentation.ipynb)\n\n```python3\nseg_model = xrv.baseline_models.chestx_det.PSPNet()\noutput = seg_model(image)\noutput.shape # [1, 14, 512, 512]\nseg_model.targets # ['Left Clavicle', 'Right Clavicle', 'Left Scapula', 'Right Scapula',\n                  #  'Left Lung', 'Right Lung', 'Left Hilus Pulmonis', 'Right Hilus Pulmonis',\n                  #  'Heart', 'Aorta', 'Facies Diaphragmatica', 'Mediastinum',  'Weasand', 'Spine']\n```\n\n![](docs/segmentation-pspnet.png)\n\n## Datasets \n[View docstrings for more detail on each dataset](https://github.com/mlmed/torchxrayvision/blob/master/torchxrayvision/datasets.py) and [Demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_datasets.ipynb) and [Example loading script](https://github.com/mlmed/torchxrayvision/blob/master/scripts/dataset_utils.py)\n\n```python3\ntransform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),\n                                            xrv.datasets.XRayResizer(224)])\n\n# RSNA Pneumonia Detection Challenge. 
https://pubs.rsna.org/doi/full/10.1148/ryai.2019180041\nd_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath=\"path to stage_2_train_images_jpg\",\n                                       transform=transform)\n                \n# CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. https://arxiv.org/abs/1901.07031             \nd_chex = xrv.datasets.CheX_Dataset(imgpath=\"path to CheXpert-v1.0-small\",\n                                   csvpath=\"path to CheXpert-v1.0-small/train.csv\",\n                                   transform=transform)\n\n# National Institutes of Health ChestX-ray8 dataset. https://arxiv.org/abs/1705.02315\nd_nih = xrv.datasets.NIH_Dataset(imgpath=\"path to NIH images\")\n\n# A relabelling of a subset of NIH images from: https://pubs.rsna.org/doi/10.1148/radiol.2019191293\nd_nih2 = xrv.datasets.NIH_Google_Dataset(imgpath=\"path to NIH images\")\n\n# PadChest: A large chest x-ray image dataset with multi-label annotated reports. https://arxiv.org/abs/1901.07441\nd_pc = xrv.datasets.PC_Dataset(imgpath=\"path to image folder\")\n\n# COVID-19 Image Data Collection. https://arxiv.org/abs/2006.11988\nd_covid19 = xrv.datasets.COVID19_Dataset() # specify imgpath and csvpath for the dataset\n\n# SIIM Pneumothorax Dataset. https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation\nd_siim = xrv.datasets.SIIM_Pneumothorax_Dataset(imgpath=\"dicom-images-train/\",\n                                                csvpath=\"train-rle.csv\")\n\n# VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations. https://arxiv.org/abs/2012.15029\nd_vin = xrv.datasets.VinBrain_Dataset(imgpath=\".../train\",\n                                      csvpath=\".../train.csv\")\n\n# National Library of Medicine Tuberculosis Datasets. 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256233/\nd_nlmtb = xrv.datasets.NLMTB_Dataset(imgpath=\"path to MontgomerySet or ChinaSet_AllFiles\")\n```\n\n## Dataset fields\n\nEach dataset contains a number of fields. These fields are maintained when xrv.datasets.Subset_Dataset and xrv.datasets.Merge_Dataset are used.\n\n - `.pathologies` This field is a list of the pathologies contained in this dataset that will be contained in the `.labels` field ].\n\n - `.labels` This field contains a 1,0, or NaN for each label defined in `.pathologies`. \n\n - `.csv` This field is a pandas DataFrame of the metadata csv file that comes with the data. Each row aligns with the elements of the dataset so indexing using `.iloc` will work. \n\nIf possible, each dataset's `.csv` will have some common fields of the csv. These will be aligned when The list is as follows:\n\n- `csv.patientid` A unique id that will uniqely identify samples in this dataset\n\n- `csv.offset_day_int` An integer time offset for the image in the unit of days. 
This is expected to be for relative times and has no absolute meaning although for some datasets it is the epoch time.\n\n- `csv.age_years` The age of the patient in years.\n\n- `csv.sex_male` If the patient is male\n\n- `csv.sex_female` If the patient is female\n\n\n## Dataset tools\n\nrelabel_dataset will align labels to have the same order as the pathologies argument.\n```python3\nxrv.datasets.relabel_dataset(xrv.datasets.default_pathologies , d_nih) # has side effects\n```\n\nspecify a subset of views ([demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_datasets_views.ipynb))\n```python3\nd_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath=\"...\",\n                                               views=[\"PA\",\"AP\",\"AP Supine\"])\n```\n\nspecify only 1 image per patient\n```python3\nd_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath=\"...\",\n                                               unique_patients=True)\n```\n\nobtain summary statistics per dataset\n```python3\nd_chex = xrv.datasets.CheX_Dataset(imgpath=\"CheXpert-v1.0-small\",\n                                   csvpath=\"CheXpert-v1.0-small/train.csv\",\n                                 views=[\"PA\",\"AP\"], unique_patients=False)\n\nCheX_Dataset num_samples=191010 views=['PA', 'AP']\n{'Atelectasis': {0.0: 17621, 1.0: 29718},\n 'Cardiomegaly': {0.0: 22645, 1.0: 23384},\n 'Consolidation': {0.0: 30463, 1.0: 12982},\n 'Edema': {0.0: 29449, 1.0: 49674},\n 'Effusion': {0.0: 34376, 1.0: 76894},\n 'Enlarged Cardiomediastinum': {0.0: 26527, 1.0: 9186},\n 'Fracture': {0.0: 18111, 1.0: 7434},\n 'Lung Lesion': {0.0: 17523, 1.0: 7040},\n 'Lung Opacity': {0.0: 20165, 1.0: 94207},\n 'Pleural Other': {0.0: 17166, 1.0: 2503},\n 'Pneumonia': {0.0: 18105, 1.0: 4674},\n 'Pneumothorax': {0.0: 54165, 1.0: 17693},\n 'Support Devices': {0.0: 21757, 1.0: 99747}}\n```\n\n## Pathology masks ([demo 
notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_masks.ipynb))\n\nMasks are available in the following datasets:\n```python3\nxrv.datasets.RSNA_Pneumonia_Dataset() # for Lung Opacity\nxrv.datasets.SIIM_Pneumothorax_Dataset() # for Pneumothorax\nxrv.datasets.NIH_Dataset() # for Cardiomegaly, Mass, Effusion, ...\n```\n\nExample usage:\n\n```python3\nd_rsna = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath=\"stage_2_train_images_jpg\", \n                                            views=[\"PA\",\"AP\"],\n                                            pathology_masks=True)\n                                            \n# The has_masks column will let you know if any masks exist for that sample\nd_rsna.csv.has_masks.value_counts()\nFalse    20672\nTrue      6012       \n\n# Each sample will have a pathology_masks dictionary where the index \n# of each pathology will correspond to a mask of that pathology (if it exists).\n# There may be more than one mask per sample. But only one per pathology.\nsample[\"pathology_masks\"][d_rsna.pathologies.index(\"Lung Opacity\")]\n```\n![](https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/pathology-mask-rsna2.png)\n![](https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/pathology-mask-rsna3.png)\n\nit also works with data_augmentation if you pass in `data_aug=data_transforms` to the dataloader. The random seed is matched to align calls for the image and the mask.\n\n![](https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/pathology-mask-rsna614-da.png)\n\n## Distribution shift tools ([demo notebook](https://github.com/mlmed/torchxrayvision/blob/master/scripts/xray_datasets-CovariateShift.ipynb))\n\nThe class `xrv.datasets.CovariateDataset` takes two datasets and two \narrays representing the labels. The samples will be returned with the \ndesired ratio of images from each site. 
The goal is to simulate \na covariate shift that makes a model focus on an incorrect feature. The \nshift can then be reversed in the validation data, causing a catastrophic\nfailure in generalization performance.\n\n- `ratio=0.0` means images from d1 will have a positive label\n- `ratio=0.5` means images from d1 will have half of the positive labels\n- `ratio=1.0` means images from d1 will have no positive label\n\nFor any ratio, the number of samples returned is the same.\n\n```python3\n# d1, d1_target, d2 and d2_target must be defined beforehand\nd = xrv.datasets.CovariateDataset(d1=d1,                # dataset1 with a specific condition\n                                  d1_target=d1_target,  # target label to predict\n                                  d2=d2,                # dataset2 with a specific condition\n                                  d2_target=d2_target,  # target label to predict\n                                  mode=\"train\",         # train, valid, or test\n                                  ratio=0.9)\n```\n\n## Citation\n\nPrimary TorchXRayVision paper: [https://arxiv.org/abs/2111.00595](https://arxiv.org/abs/2111.00595)\n\n```\nJoseph Paul Cohen, Joseph D. Viviano, Paul Bertin, Paul Morrison, Parsa Torabian, Matteo Guarrera, Matthew P Lungren, Akshay Chaudhari, Rupert Brooks, Mohammad Hashir, Hadrien Bertrand\nTorchXRayVision: A library of chest X-ray datasets and models. \nMedical Imaging with Deep Learning\nhttps://github.com/mlmed/torchxrayvision, 2020\n\n\n@inproceedings{Cohen2022xrv,\ntitle = {{TorchXRayVision: A library of chest X-ray datasets and models}},\nauthor = {Cohen, Joseph Paul and Viviano, Joseph D. 
and Bertin, Paul and Morrison, Paul and Torabian, Parsa and Guarrera, Matteo and Lungren, Matthew P and Chaudhari, Akshay and Brooks, Rupert and Hashir, Mohammad and Bertrand, Hadrien},\nbooktitle = {Medical Imaging with Deep Learning},\nurl = {https://github.com/mlmed/torchxrayvision},\narxivId = {2111.00595},\nyear = {2022}\n}\n\n```\nand this paper, which initiated development of the library: [https://arxiv.org/abs/2002.02497](https://arxiv.org/abs/2002.02497)\n```\nJoseph Paul Cohen and Mohammad Hashir and Rupert Brooks and Hadrien Bertrand\nOn the limits of cross-domain generalization in automated X-ray prediction. \nMedical Imaging with Deep Learning 2020 (Online: https://arxiv.org/abs/2002.02497)\n\n@inproceedings{cohen2020limits,\n  title={On the limits of cross-domain generalization in automated X-ray prediction},\n  author={Cohen, Joseph Paul and Hashir, Mohammad and Brooks, Rupert and Bertrand, Hadrien},\n  booktitle={Medical Imaging with Deep Learning},\n  year={2020},\n  url={https://arxiv.org/abs/2002.02497}\n}\n```\n\n## Supporters/Sponsors\n\n\n| <a href=\"https://cifar.ca/\"><img width=\"300px\" src=\"https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/cifar-logo.png\" /></a><br> CIFAR (Canadian Institute for Advanced Research)  |  <a href=\"https://mila.quebec/\"><img width=\"300px\" src=\"https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/mila-logo.png\" /></a><br> Mila, Quebec AI Institute, University of Montreal |\n|:---:|:---:|\n| <a href=\"http://aimi.stanford.edu/\"><img width=\"300px\" src=\"https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/AIMI-stanford.jpg\" /></a> <br><b>Stanford University's Center for <br>Artificial Intelligence in Medicine & Imaging</b>  | <a href=\"http://www.carestream.com/\"><img width=\"300px\" src=\"https://raw.githubusercontent.com/mlmed/torchxrayvision/master/docs/carestream-logo.png\" /></a> <br><b>Carestream Health</b>  |\n\n\n\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "TorchXRayVision: A library of chest X-ray datasets and models",
    "version": "1.3.2",
    "project_urls": {
        "Homepage": "https://github.com/mlmed/torchxrayvision"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a98086cc53d16b96d48b28299099a93becc05f57aa70ecf9d441a82d9243c5c8",
                "md5": "262f61196b2c50a44686d844071defe8",
                "sha256": "eb98b2654d38583d8ea02088917ac721f0cda7a945bfdc3498ab5e7e1d5b4cca"
            },
            "downloads": -1,
            "filename": "torchxrayvision-1.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "262f61196b2c50a44686d844071defe8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 28983447,
            "upload_time": "2025-01-03T02:44:21",
            "upload_time_iso_8601": "2025-01-03T02:44:21.520096Z",
            "url": "https://files.pythonhosted.org/packages/a9/80/86cc53d16b96d48b28299099a93becc05f57aa70ecf9d441a82d9243c5c8/torchxrayvision-1.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5e707e6e97d4b899aa5a3b0576ced9da3870f460aad0819cdcba9cf1612b1bb3",
                "md5": "542277d452892e8246f241bf7f2d72a2",
                "sha256": "e5e6139e9d1aa7cb3598a8ffdc3ed15b0fbe700d98707d9b28b3faa167e75ec5"
            },
            "downloads": -1,
            "filename": "torchxrayvision-1.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "542277d452892e8246f241bf7f2d72a2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 28984107,
            "upload_time": "2025-01-03T02:44:28",
            "upload_time_iso_8601": "2025-01-03T02:44:28.697803Z",
            "url": "https://files.pythonhosted.org/packages/5e/70/7e6e97d4b899aa5a3b0576ced9da3870f460aad0819cdcba9cf1612b1bb3/torchxrayvision-1.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-03 02:44:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mlmed",
    "github_project": "torchxrayvision",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "1"
                ]
            ]
        },
        {
            "name": "torchvision",
            "specs": [
                [
                    ">=",
                    "0.5"
                ]
            ]
        },
        {
            "name": "scikit-image",
            "specs": [
                [
                    ">=",
                    "0.16"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "1"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    ">=",
                    "5.3.0"
                ]
            ]
        },
        {
            "name": "imageio",
            "specs": []
        }
    ],
    "lcname": "torchxrayvision"
}
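
The `relabel_dataset` step in the README's Dataset tools section reorders a dataset's label columns to follow a target `pathologies` list. The standalone NumPy sketch below illustrates that reordering. `relabel_labels` is a hypothetical helper, not a torchxrayvision function. It assumes labels are stored as an `(n_samples, n_pathologies)` array. As a further assumption, it fills pathologies the source dataset lacks with NaN and drops source columns that are not requested.

```python
import numpy as np


def relabel_labels(labels, source_pathologies, target_pathologies):
    """Hypothetical helper: reorder label columns to follow target_pathologies.

    Assumption: pathologies missing from source_pathologies become NaN columns,
    and source columns not listed in target_pathologies are dropped.
    """
    labels = np.asarray(labels, dtype=float)
    new_columns = []
    for pathology in target_pathologies:
        if pathology in source_pathologies:
            # Copy the existing column into its new position.
            new_columns.append(labels[:, source_pathologies.index(pathology)])
        else:
            # This dataset has no label for the pathology: mark it as missing.
            new_columns.append(np.full(labels.shape[0], np.nan))
    return np.stack(new_columns, axis=1)


source = ["Effusion", "Atelectasis"]
target = ["Atelectasis", "Cardiomegaly", "Effusion"]
labels = np.array([[1, 0], [0, 1]])
print(relabel_labels(labels, source, target))
```

With these inputs the first row becomes `[0, nan, 1]`. Atelectasis moves to the front, Cardiomegaly is unknown for this dataset, and Effusion moves to the end. Using NaN instead of 0 keeps "not labelled" distinct from "negative", so a loss function can mask those entries.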

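The summary statistics shown in the README count how often each label value occurs per pathology. The sketch below shows a minimal version of that tally. It assumes the labels are an `(n_samples, n_pathologies)` float array in which NaN marks a missing label. `label_counts` is a hypothetical helper, and the library's own summary code may work differently.

```python
import numpy as np


def label_counts(labels, pathologies):
    """Hypothetical helper: count each label value per pathology, skipping NaN."""
    labels = np.asarray(labels, dtype=float)
    counts = {}
    for i, pathology in enumerate(pathologies):
        column = labels[:, i]
        column = column[~np.isnan(column)]  # missing labels are not counted
        values, freq = np.unique(column, return_counts=True)
        counts[pathology] = {float(v): int(c) for v, c in zip(values, freq)}
    return counts


labels = np.array([[1, 0], [0, np.nan], [1, 1]])
print(label_counts(labels, ["Atelectasis", "Effusion"]))
# {'Atelectasis': {0.0: 1, 1.0: 2}, 'Effusion': {0.0: 1, 1.0: 1}}
```

NaN entries are dropped before counting. As a result, the 0.0 and 1.0 counts for a pathology need not add up to the number of samples.
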
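
The README states that when `data_aug` is set, the random seed is matched so that an image and its mask receive the same random transform. The NumPy sketch below illustrates this idea by driving a random flip and shift from an explicit seed. `random_flip_shift` and `augment_pair` are hypothetical helpers, not torchxrayvision or torchvision APIs. The library's actual seeding mechanism may differ.

```python
import numpy as np


def random_flip_shift(array, seed, max_shift=4):
    """Hypothetical transform: random horizontal flip plus random shift, driven by `seed`."""
    rng = np.random.default_rng(seed)
    if rng.random() < 0.5:
        array = array[:, ::-1]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(array, shift=(dy, dx), axis=(0, 1))


def augment_pair(image, mask, seed):
    """Apply the same random transform to an image and its mask."""
    return random_flip_shift(image, seed), random_flip_shift(mask, seed)


image = np.arange(64, dtype=float).reshape(8, 8)
mask = (image % 3 == 0).astype(np.uint8)  # toy mask derived from the image
aug_image, aug_mask = augment_pair(image, mask, seed=7)
# The mask still marks exactly the pixels whose value is divisible by 3.
print(np.array_equal(aug_mask, (aug_image % 3 == 0).astype(np.uint8)))
```

Each call uses its own `numpy.random.Generator` instead of reseeding the global RNG. This keeps the image/mask pairing independent of any other random draws in the program.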