# Anonymized/De-identified In Place
## Overview
### Summary
Profile-based anonymization of a file in Flywheel.
Files will be anonymized according to a de-id YAML profile and will overwrite
or create a new version of the source file.
Currently supported files are:
* DICOM
* JPG
* PNG
* TIFF
* XML
* JSON
* Text files defining key/value pairs (e.g. MHD)
* CSV
* TSV
Currently supported field transformations are:
* ``remove``: Removes the field from the metadata.
* ``replace-with``: Replaces the contents of the field with the value provided.
* ``increment-date``: Offsets the date by the number of days set in ``date-increment``.
* ``increment-datetime``: Offsets the datetime by the number of days set in ``date-increment``.
* ``hash``: Replaces the contents of the field with a one-way cryptographic hash.
* ``hashuid``: Replaces a UID field with a hashed version of that field.
* ``jitter``: Shifts the value by a random number.
* ``encrypt`` (non-DICOM): Encrypts the field in place with AES-EAX encryption.
* ``encrypt`` (DICOM): Removes the field from the DICOM file and stores the original value
  in EncryptedAttributesSequence with CMS encryption.
* ``decrypt`` (non-DICOM): Decrypts the field in place with AES-EAX decryption.
* ``decrypt`` (DICOM): Replaces the contents of the field with the value stored in
  EncryptedAttributesSequence, using CMS decryption.
* ``regex-sub``: Replaces the contents of the field with a value built from other fields
  and/or groups extracted from the field value.
* ``keep``: Leaves the field unchanged.
Additionally, for DICOM, pixel data masking is supported based on pre-defined
pixel coordinates ([doc](https://flywheel-io.gitlab.io/public/migration-toolkit/pages/pixels.html)).
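To make two of the field transformations listed above concrete, here is a minimal Python sketch of what ``increment-date`` and ``hash`` do conceptually. The function names, the `YYYYMMDD` date format, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration; this is not the migration toolkit's implementation.

```python
import hashlib
from datetime import datetime, timedelta


def increment_date(value: str, days: int) -> str:
    """Shift a date string in YYYYMMDD form by a number of days (hypothetical helper)."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
    return shifted.strftime("%Y%m%d")


def one_way_hash(value: str, length: int = 16) -> str:
    """Replace a value with a truncated SHA-256 hex digest (illustrative choice only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


# A date-increment of -17 moves 18 January 2020 back to 1 January 2020.
print(increment_date("20200118", -17))  # 20200101
# The same input always yields the same digest, but the input cannot be read back from it.
print(one_way_hash("ACC12345"))
```

Because the offset is fixed per profile (or per subject), intervals between dates in the same file are preserved after shifting.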
The YAML profile extends the
[flywheel-migration-toolkit](https://gitlab.com/flywheel-io/public/migration-toolkit)
de-id profile to Flywheel metadata containers. Documentation on how to write the YAML
configuration for each supported file type can be found in the flywheel-migration-toolkit
[doc](https://flywheel-io.gitlab.io/public/migration-toolkit/).
_NOTE:_ Metadata extraction must be rerun on the output file, as
the gear itself does not propagate/modify DICOM metadata.
### License
MIT
### Classification
utility
*Gear Level:*
* [x] Project
* [x] Subject
* [x] Session
* [x] Acquisition
* [ ] Analysis
----
[[_TOC_]]
----
### Inputs
* _deid-profile_
  * __Name__: deid-profile
  * __Type__: file
  * __Optional__: false
  * __Description__: A Flywheel de-identification profile specifying the
    de-identification actions to perform
* _subject-csv_
  * __Name__: subject-csv
  * __Type__: file
  * __Optional__: true
  * __Description__: A CSV file that contains mapping values to apply for subjects
    during de-identification.
* _input-file_
  * __Name__: input-file
  * __Type__: file
  * __Optional__: false
  * __Description__: An input file to be de-identified
#### deid-profile (required)
This is a YAML file that describes the protocol for de-identifying
input-file. It covers the same functionality as Flywheel
CLI de-identification. A simple example deid_profile.yaml looks like this:
``` yaml
# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17
  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true
  # filenames block to manipulate output filename based on input filename
  filenames:
    # input regular expression that matches the source filename
    - input-regex: '.*'
      # formatter of the output filename
      output: '{SOPInstanceUID}.dcm'
  fields:
    # Remove a DICOM field value (e.g. remove "StationName")
    - name: StationName
      remove: true
    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true
    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true
    # One-way hash a DICOM field to a unique string
    - name: AccessionNumber
      hash: true
    # One-way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true
# Zip profile to handle e.g. .dcm.zip archives. All member files will be de-identified
# according to that same profile.
zip:
  fields:
    - name: comment
      replace-with: FLYWHEEL
  filenames:
    - input-regex: (?P<used>.*).dicom.zip$
      output: '{used}.dcm.zip'
  hash-subdirectories: true
  validate-zip-members: true
# The flywheel configuration to handle Flywheel metadata de-id.
flywheel:
  # subject container
  subject:
    # If set to true, export all source container metadata to destination container.
    all: true
  # session container
  session:
    # If set to false, only export to destination container the metadata defined
    # in the fields key.
    all: false
    date-increment: -17
    fields:
      - name: operator
        replace-with: REDACTED
      - name: info.sessiondate
        increment-date: true
      - name: tags
        replace-with:
          - deid-exported
  acquisition:
    all: true
  file:
    all: true
    # If set to true, export the file info header to the destination container.
    # If set to false or missing, the file info header will be removed from the
    # destination container.
    include-info-header: true
```
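The `filenames` entries in the `zip` block above pair an `input-regex` containing a named group with an `output` template that refers to that group. The following Python sketch shows one plausible reading of that pairing, assuming the named groups are passed to the template as formatting variables; `rename` is a hypothetical helper, not a function from the toolkit. The DICOM example (`'{SOPInstanceUID}.dcm'`) additionally draws on header fields, which this sketch does not cover.

```python
import re
from typing import Optional


def rename(filename: str, input_regex: str, output: str) -> Optional[str]:
    """Return the output filename, or None when input_regex does not match (hypothetical helper)."""
    match = re.match(input_regex, filename)
    if match is None:
        return None
    # Named groups such as (?P<used>...) become variables of the output template.
    return output.format(**match.groupdict())


print(rename("scan01.dicom.zip", r"(?P<used>.*).dicom.zip$", "{used}.dcm.zip"))
# scan01.dcm.zip
```

Note that the dots in the example regex are unescaped, so they match any character; escaping them (`\.`) would make the pattern stricter.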
#### subject-csv (optional)
The subject_csv facilitates subject-specific configuration of
de-identification profiles. This is a CSV file that contains the column
`subject.label` with unique values corresponding to the `subject.label`
values in the project to be exported. If a subject in the project to be
exported is not listed in `subject.label` in the provided subject_csv,
that subject will not be exported.
##### Subject-level customization with subject-csv and deid-profile
Requirements:
* To update subject fields, each field must be represented both in the
  subject_csv as a column header and in the deid_profile as a Jinja variable
  (e.g. `"{{ var_name }}"`).
* If a field is represented in both the deid_profile and the
  subject_csv, the value in the deid_profile will be replaced with the
  value listed in the corresponding column of the subject_csv for each
  subject that has a label listed in the `subject.label` column.
* Fields represented in the deid_profile but not in the subject_csv will
  be the same for all subjects.
Let's walk through an example pairing of subject_csv and deid_profile
to illustrate.
The following table represents subject_csv (../tests/data/example-csv-mapping.csv):
| subject.label | DATE_INCREMENT | SUBJECT_ID | PATIENT_BD_BOOL |
|---------------|----------------|-------------|-----------------|
| 001 | -15 | Patient_IDA | false |
| 002 | -20 | Patient_IDB | true |
| 003 | -30 | Patient_IDC | true |
The deid_profile:
``` yaml
dicom:
  # date-increment can be any integer value since dicom.date-increment is defined in
  # example-csv-mapping.csv
  date-increment: "{{ DATE_INCREMENT }}"
  # since example-csv-mapping.csv doesn't define dicom.remove-private-tags,
  # all subjects will have private tags removed
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      # remove can be any boolean since dicom.fields.PatientBirthDate.remove is defined
      # in example-csv-mapping.csv
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      # replace-with can be any string value since dicom.fields.PatientID.replace-with
      # is defined in example-csv-mapping.csv
      replace-with: "{{ SUBJECT_ID }}"
```
The resulting profile for subject 003 given the above would be:
``` yaml
dicom:
  # DATE_INCREMENT for subject 003 in example-csv-mapping.csv
  date-increment: -30
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: true
    - name: PatientID
      replace-with: Patient_IDC
```
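The per-subject substitution walked through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gear uses Jinja variables, whereas this sketch substitutes a plain regular expression for the template engine so that it runs with the standard library alone. `render_profile`, `PLACEHOLDER`, and the inline CSV and template strings are hypothetical names introduced here, and dropping the quotes around each placeholder is an assumption made so the output resembles the resulting profile shown above.

```python
import csv
import io
import re
from typing import Optional

# Contents of the example subject_csv table.
SUBJECT_CSV = """subject.label,DATE_INCREMENT,SUBJECT_ID,PATIENT_BD_BOOL
001,-15,Patient_IDA,false
002,-20,Patient_IDB,true
003,-30,Patient_IDC,true
"""

# The example deid_profile, without its comments.
PROFILE_TEMPLATE = """dicom:
  date-increment: "{{ DATE_INCREMENT }}"
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      replace-with: "{{ SUBJECT_ID }}"
"""

# Matches "{{ NAME }}" together with optional surrounding double quotes.
PLACEHOLDER = re.compile(r'"?\{\{\s*(\w+)\s*\}\}"?')


def render_profile(template: str, subject_csv: str, label: str) -> Optional[str]:
    """Fill the template from the CSV row whose subject.label equals label."""
    for row in csv.DictReader(io.StringIO(subject_csv)):
        if row["subject.label"] == label:
            return PLACEHOLDER.sub(lambda m: row[m.group(1)], template)
    # A subject missing from the CSV gets no profile, mirroring "will not be exported".
    return None


print(render_profile(PROFILE_TEMPLATE, SUBJECT_CSV, "003"))
```

Fields without a placeholder, such as `remove-private-tags`, pass through unchanged and are therefore identical for every subject.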
### Config
* _debug_
  * __Name__: debug
  * __Type__: boolean
  * __Default__: false
  * __Description__: If true, the gear will print debug information to the log.
* _tag_
  * __Name__: tag
  * __Type__: string
  * __Default__: "deid-inplace"
  * __Description__: The prefix of the tag added to the file after the gear runs.
    The tag will be `<prefix>-PASS` or `<prefix>-FAIL`, depending on the gear run status.
* _delete-original_
  * __Name__: delete-original
  * __Type__: boolean
  * __Default__: true
  * __Description__: If true, the original file is deleted and replaced with the
    de-identified file, rendering the original file unrecoverable. If false, the
    de-identified file overwrites the original, resulting in a file version
    increment that can be reversed.
## Usage
1. User uploads or identifies a file in Flywheel to de-identify.
2. User runs deid-inplace (this utility gear) at the project, subject, or session
    level and provides the following:
    * Files:
        * A de-identification profile specifying how to
          de-identify/anonymize each file type
        * An optional CSV that contains a column that maps to a
          Flywheel session or subject metadata field and columns that
          specify values with which to replace DICOM header tags
        * The desired input file to de-identify
    * Configuration options:
        * delete-original: True/False
3. The gear de-identifies the file.
4. The gear deletes or overwrites the original file, depending on the delete-original config option.
### Environment
This gear uses `poetry` as its virtual environment and dependency manager. You can
interact with the gear using the following steps:
1. [Install poetry](https://python-poetry.org/docs/#installation)
2. Install dependencies (from within gear directory): `poetry install`
3. Enter virtual environment: `poetry shell`