# fw-file
Unified interface for reading medical file types, exposing parsed fields as both
dict keys and attributes, and for saving any modifications to disk or a buffer.
DICOM support, built on top of `pydicom`, is the primary goal of the library.
`fw-file` also provides helpers for parsing DICOMs containing non-standard tags
and utilities for organizing datasets and extracting metadata.
Additional file types supported:
- NIfTI1 and NIfTI2 (.nii.gz)
- Bruker ParaVision (subject/acqp/method)
- GE MR RAW / PFile (P_NNNNN_.7)
- Philips MR PAR/REC header (.par)
- Philips MR PAR/REC zipfile (.parrec.zip) (read-only)
- Siemens MR RAW (.dat)
- Siemens MR Spectroscopy (.rda)
- Siemens PET RAW (.ptd)
- PNG (.png)
- JPEG/JPG (.jpeg/.jpg)
- BrainVision EEG (.vhdr/.vmrk/.eeg)
- EEGLAB EEG (.set/.fdt)
- European Data Format EEG (.edf)
- BioSemi Data Format EEG (.bdf)
## Installation
To install the package with all the optional dependencies:
```bash
pip install "fw-file[all]"
```
Alternatively, add it as a `poetry` dependency to your project:
```bash
poetry add fw-file --extras all
```
## Usage
### Opening
```python
from fw_file.dicom import DICOM
dcm = DICOM("dataset.dcm") # also works with any readable file-like object
```
### Fields
**Attribute access** on DICOMs works similarly to that in `pydicom`:
```python
dcm.PatientAge == "060Y"
dcm.patientage == "060Y" # attrs are case-insensitive
dcm.patient_age == "060Y" # and snake_case compatible
```
**Key access** also returns values instead of `pydicom.DataElement`:
```python
dcm["PatientAge"] == "060Y"
dcm["patientage"] == "060Y" # keys are case-insensitive too
dcm["patient_age"] == "060Y" # and snake_case compatible
dcm["00101010"] == "060Y"
dcm["0010", "1010"] == "060Y"
dcm[0x00101010] == "060Y"
dcm[0x0010, 0x1010] == "060Y"
```
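The key spellings above all address the same element. The following is a minimal sketch of how such spellings could be normalized to one integer tag. It is not `fw-file`'s implementation: `KEYWORDS` and `normalize_key` are hypothetical names, the two-entry table stands in for a full DICOM dictionary, and private-tag keys (creator plus name) are deliberately out of scope.
```python
import re

# Tiny stand-in for a DICOM dictionary: keyword -> 32-bit tag number.
KEYWORDS = {"PatientAge": 0x00101010, "PatientID": 0x00100020}
# Index keywords by their lowercase form so lookups ignore case.
_LOOKUP = {keyword.lower(): tag for keyword, tag in KEYWORDS.items()}


def normalize_key(key) -> int:
    """Map a keyword, hex string, integer or (group, element) pair to a tag."""
    if isinstance(key, tuple):  # ("0010", "1010") or (0x0010, 0x1010)
        group, elem = (int(p, 16) if isinstance(p, str) else p for p in key)
        return (group << 16) | elem
    if isinstance(key, int):  # 0x00101010
        return key
    if re.fullmatch(r"[0-9A-Fa-f]{8}", key):  # "00101010"
        return int(key, 16)
    # "PatientAge", "patientage" and "patient_age" collapse to one form.
    return _LOOKUP[key.lower().replace("_", "")]


spellings = ["PatientAge", "patient_age", "00101010", ("0010", "1010"), 0x00101010]
tags = {normalize_key(key) for key in spellings}  # a single tag remains
```
Reducing every spelling to one canonical tag before the lookup is one way to keep attribute and key access consistent with each other.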
**Private tags** can be accessed as keys when including the creator:
```python
dcm["AGFA", "Zoom factor"] == 2
dcm["AGFA", "0019xx82"] == 2
```
**Assignment and deletion** work with attributes and keys alike:
```python
dcm.PatientAge = "065Y"
del dcm["PatientAge"]
```
### Metadata
Flywheel metadata can be extracted using the `get_meta()` method:
```python
from fw_file.dicom import DICOM
dcm = DICOM("dataset.dcm")
dcm.get_meta() == {
    "subject.label": "PatientID",
    "session.label": "StudyDescription",
    "session.uid": "1.2.3",  # StudyInstanceUID
    "acquisition.label": "SeriesDescription",
    "acquisition.uid": "4.5.6",  # SeriesInstanceUID
    # and much, much more...
}
```
### Saving
```python
import io
dcm.save() # save to the original location
dcm.save("edited.dcm") # save to a given filepath
dcm.save(io.BytesIO()) # save to any writable object
```
### Collections and series
Handling multiple DICOM files together is a common use case: the tags of more
than one file often need to be inspected in tandem for QA/validation, or
modified for de-identification. `DICOMCollection` facilitates this and provides
convenience constructors for loading from a list of files, a directory or a zip
archive.
```python
from fw_file.dicom import DICOMCollection
coll_dcm = DICOMCollection("001.dcm", "002.dcm") # from a list of files
coll_dir = DICOMCollection.from_dir(".") # from a directory
coll_zip = DICOMCollection.from_zip("dicom.zip") # from a zip archive
coll = DICOMCollection() # or start from scratch
coll.append("001.dcm") # and add files later
```
To interact with the underlying DICOMs:
```python
# access individual instances through list indexes
coll[0].SOPInstanceUID == "1.2.3"
# get tag value of all instances as a list, allowing different values
coll.bulk_get("SOPInstanceUID") == ["1.2.3", "1.2.4"]
# get a unique tag value, raising when encountering multiple values
coll.get("SeriesInstanceUID") == "1.2"
coll.get("SOPInstanceUID") # raises ValueError
# set a tag value uniformly on all instances
coll.set("PatientAge", "060Y")
# delete a tag across all instances
coll.delete("PatientID")
```
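The difference between `bulk_get` and `get` comes down to whether differing values are tolerated. The sketch below mimics that behavior on plain dictionaries. `unique_value` is a hypothetical helper written for illustration, not part of `fw-file`, and it also raises on an empty input, which may differ from the library's behavior.
```python
def unique_value(values):
    """Return the single distinct value in values, or raise ValueError."""
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
    if len(distinct) != 1:
        raise ValueError(f"expected one unique value, found {len(distinct)}")
    return distinct[0]


# Two stand-in instances from the same series.
instances = [
    {"SeriesInstanceUID": "1.2", "SOPInstanceUID": "1.2.3"},
    {"SeriesInstanceUID": "1.2", "SOPInstanceUID": "1.2.4"},
]

# Analogous to bulk_get: one value per instance, differences allowed.
sop_uids = [inst["SOPInstanceUID"] for inst in instances]
# Analogous to get: succeeds only because both instances agree.
series_uid = unique_value(inst["SeriesInstanceUID"] for inst in instances)
```
Passing `sop_uids` to `unique_value` raises `ValueError`, mirroring the `coll.get("SOPInstanceUID")` case above.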
Finally, a `DICOMCollection` can be saved in place, exported to a directory or
packed as a zip archive:
```python
coll.save()
coll.to_dir("/tmp/dicom")
coll.to_zip("/tmp/dicom.zip")
```
`DICOMSeries` is a subclass of `DICOMCollection`, intended to be used on files
that belong to the same DICOM series. The instances normally have the same
`SeriesInstanceUID` attribute and are uploaded together (zipped) into a Flywheel
acquisition. In addition to the collection methods, `DICOMSeries` can pack the
instances into an appropriately named ZIP archive and extract Flywheel metadata
from multiple files, validating the values and checking for discrepancies among
the instances along the way.
```python
from fw_file.dicom import DICOMSeries
series = DICOMSeries("001.dcm", "002.dcm")
filepath, metadata = series.to_upload()
```
### DICOM Standard Editions
As the DICOM Standard is typically revised several times a year, `fw-file` lets
you choose which edition to use via an environment variable. The default is
`"2023c"`, which uses the locally saved 2023c edition. Other options are
`"current"` and any valid 5-character edition (e.g. `"2022d"`). Specifying
`"current"` fetches the most recent edition at runtime.
```bash
FW_DCM_STANDARD_REV=current
FW_DCM_STANDARD_REV=2022d
```
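When the variable is set from Python rather than the shell, a small guard can catch typos early. The sketch below rests on two assumptions that the text above does not confirm: that edition names are a four-digit year followed by a letter from `a` to `e`, and that the variable has to be set before `fw_file` is imported. `set_standard_rev` is a hypothetical helper.
```python
import os
import re


def set_standard_rev(rev: str) -> None:
    """Check the edition string and export it as FW_DCM_STANDARD_REV."""
    # Assumed naming scheme: four-digit year plus a letter a-e, or "current".
    if rev != "current" and not re.fullmatch(r"\d{4}[a-e]", rev):
        raise ValueError(f"not a recognized edition: {rev!r}")
    os.environ["FW_DCM_STANDARD_REV"] = rev


# Presumably this has to run before fw_file is imported to take effect.
set_standard_rev("2022d")
```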
### Private dictionary
In addition to the private tags included in
[`pydicom`](https://github.com/pydicom/pydicom/blob/v2.1.2/pydicom/_private_dict.py),
`fw-file` ships with an [extended dictionary](fw_file/dicom/dcmdict.py) to
make accessing even more private tags that much simpler.
The private dictionary can be further extended by creating a DCMTK-style
[data dict](https://github.com/DCMTK/dcmtk/blob/master/dcmdata/data/private.dic)
file and setting the
[`DCMDICTPATH`](https://support.dcmtk.org/docs/file_envvars.html)
environment variable to its path.
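As a rough sketch, the steps could look like the following. The entry layout (tag with private creator, VR, name, VM and a version label, separated by tabs) is modeled on the linked DCMTK `private.dic` and should be checked against that file. The creator `ACME 1.0` and the name `AcmeScanMode` are made up for illustration, and when `fw-file` reads the variable is an assumption.
```python
import os
import tempfile
from pathlib import Path

# One tab-separated entry: (group,"creator",element), VR, name, VM, version.
# The creator and name are hypothetical; the layout follows DCMTK's private.dic.
ENTRY = '(0019,"ACME 1.0",10)\tLO\tAcmeScanMode\t1\tPrivateTag\n'

# Write the dictionary file to a scratch directory.
dict_path = Path(tempfile.mkdtemp()) / "private.dic"
dict_path.write_text(ENTRY)

# Point DCMDICTPATH at the file, presumably before fw_file is imported.
os.environ["DCMDICTPATH"] = str(dict_path)
```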
### `DataElement` decoding
DICOMs are often saved with non-standard and/or corrupt data elements. To enable
loading these datasets, `fw-file` provides fixes for some common problems:
- Fix `VM=1` strings that contain `\` by replacing with `_` (default: enabled)
- Fix `VR` for known data elements encoded as explicit `UN` (default: enabled)
- Extend/improve handling of data elements with a `VR` mismatch (default: disabled)
Each fix can be enabled or disabled via an environment variable:
```bash
FW_DCM_REPLACE_UN_WITH_KNOWN_VR=false
FW_DCM_FIX_VM1_STRINGS=false
FW_DCM_FIX_VR_MISMATCH=true
```
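To illustrate why the first fix in the list above matters: DICOM uses `\` as the delimiter between multiple values, so a single-valued string that contains one can be mistaken for several values. The snippet below is a simplified sketch of the replacement on a plain Python string. `fix_vm1_string` is a hypothetical name and the sample value is invented; `fw-file` applies its fix while decoding data elements.
```python
def fix_vm1_string(value: str) -> str:
    """Replace DICOM's multi-value delimiter with an underscore."""
    return value.replace("\\", "_")


raw = "CHEST\\ABDOMEN"  # one backslash inside a single description string
parts = raw.split("\\")  # a naive multi-value split yields two values
fixed = fix_vm1_string(raw)  # the repaired string stays a single value
```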
To extract as much information from a DICOM as possible, `fw-file` can be run in
read-only mode. When enabled, invalid values are retained and the VR is set to OB.
As it is not safe to write the DICOM back in this state, saving is disabled. This
mode can be enabled via an environment variable (default: disabled):
```bash
FW_DCM_READ_ONLY=true
```
Additionally, the validation mode for reading and writing can be set via
environment variables. The default is 1 (WARN); the other options are 2 (RAISE)
and 0 (IGNORE).
```bash
FW_DCM_READING_VALIDATION_MODE=1
FW_DCM_WRITING_VALIDATION_MODE=1
```
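One way to apply these settings to a single job without changing the current shell is to pass them to a child process. This is a sketch only: `ValidationMode` and `validation_env` are hypothetical names that mirror the numeric values listed above, and the one-line child script stands in for a script that would actually import `fw_file`.
```python
import os
import subprocess
import sys
from enum import IntEnum


class ValidationMode(IntEnum):
    """Numeric modes as listed above."""
    IGNORE = 0
    WARN = 1
    RAISE = 2


def validation_env(reading=ValidationMode.WARN, writing=ValidationMode.WARN):
    """Build the two environment variables for the given modes."""
    return {
        "FW_DCM_READING_VALIDATION_MODE": str(int(reading)),
        "FW_DCM_WRITING_VALIDATION_MODE": str(int(writing)),
    }


# Strict on read, default on write; only the child process sees the change.
env = {**os.environ, **validation_env(reading=ValidationMode.RAISE)}
script = "import os; print(os.environ['FW_DCM_READING_VALIDATION_MODE'])"
child = subprocess.run(
    [sys.executable, "-c", script],
    env=env, capture_output=True, text=True, check=True,
)
```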
## EEG
Multiple EEG filetypes are supported including BrainVision, EEGLAB, EDF, and BDF files.
These files are parsed using the MNE-Python library.
BrainVision data must contain both the header file (.vhdr) and the marker file (.vmrk)
in the same directory.
If EEGLAB data is made up of two files (.set and .fdt), these files must be
in the same directory.
A zip archive can also be used to instantiate a `fw-file` BrainVision or EEGLAB object.
```python
from fw_file.eeg import BrainVision, EEGLAB
bv = BrainVision.from_zip("brainvision.zip")
e = EEGLAB.from_zip("eeglab.zip")
```
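Since the companion files must sit in the same directory, it can help to check an archive before handing it to `from_zip`. The following pre-flight check is a sketch using only the standard library and is not part of `fw-file`. `dirs_with_suffixes` is a hypothetical helper, and the archive is built in memory with empty placeholder files.
```python
import io
import zipfile
from pathlib import PurePosixPath


def dirs_with_suffixes(names, required):
    """Return directories holding at least one file per required suffix."""
    found = {}
    for name in names:
        path = PurePosixPath(name)
        found.setdefault(str(path.parent), set()).add(path.suffix.lower())
    return sorted(d for d, suffixes in found.items() if required <= suffixes)


# Build a small archive: "rec" is complete, "other" lacks a marker file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("rec/a.vhdr", "rec/a.vmrk", "rec/a.eeg", "other/b.vhdr"):
        zf.writestr(name, b"")

with zipfile.ZipFile(buffer) as zf:
    complete = dirs_with_suffixes(zf.namelist(), {".vhdr", ".vmrk"})
```
The same helper can be called with `{".set", ".fdt"}` for two-file EEGLAB data.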
## Development
Install the project using `poetry` and enable `pre-commit`:
```bash
poetry install --extras "all"
pre-commit install
```
## License
[![MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/flywheel-io/tools/lib/fw-file",
"name": "fw-file",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "Flywheel, parse, medical, file, metadata, extract, DICOM, RAW, MR, CT, PET, NIfTI, JPG, JPEG, PNG, Bruker, ParaVision, GE, PFile, Philips, PARREC, Siemens, PTD, EEG",
"author": "Flywheel",
"author_email": "support@flywheel.io",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Unified data-file interface",
"version": "3.6.2",
"project_urls": {
"Documentation": "https://gitlab.com/flywheel-io/tools/lib/fw-file",
"Homepage": "https://gitlab.com/flywheel-io/tools/lib/fw-file",
"Repository": "https://gitlab.com/flywheel-io/tools/lib/fw-file"
},
"split_keywords": [
"flywheel",
" parse",
" medical",
" file",
" metadata",
" extract",
" dicom",
" raw",
" mr",
" ct",
" pet",
" nifti",
" jpg",
" jpeg",
" png",
" bruker",
" paravision",
" ge",
" pfile",
" philips",
" parrec",
" siemens",
" ptd",
" eeg"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "113ab6c5e7734be6de04fc081a30723c274ce9c54d2c6686c1cfbb8308e26d11",
"md5": "c1f39c922763bb516b63fb0a3befbbff",
"sha256": "6ce7a3466d3433625f5c405bf1d592fa9c1eafbbce8e3ec4c4923de6a15d8b02"
},
"downloads": -1,
"filename": "fw_file-3.6.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c1f39c922763bb516b63fb0a3befbbff",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 363227,
"upload_time": "2024-12-23T20:04:24",
"upload_time_iso_8601": "2024-12-23T20:04:24.176583Z",
"url": "https://files.pythonhosted.org/packages/11/3a/b6c5e7734be6de04fc081a30723c274ce9c54d2c6686c1cfbb8308e26d11/fw_file-3.6.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-23 20:04:24",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "flywheel-io",
"gitlab_project": "tools",
"lcname": "fw-file"
}