fw-gear-deid-inplace

* Name: fw-gear-deid-inplace
* Version: 1.1.1
* Home page: https://gitlab.com/flywheel-io/scientific-solutions/gears/deid-inplace
* Upload time: 2025-01-06 21:59:11
* Author: Flywheel (support@flywheel.io)
* Requires Python: <4.0,>=3.12
* License: MIT
* Keywords: flywheel, gears

# Anonymized/De-identified In Place

## Overview

### Summary

Profile-based anonymization of a file in Flywheel.
Files are anonymized according to a de-id YAML profile, and the result either
replaces the source file or creates a new version of it.

Currently supported files are:

* DICOM
* JPG
* PNG
* TIFF
* XML
* JSON
* Text files defining key/value pairs (e.g. MHD)
* CSV
* TSV

Currently supported field transformations are:

* ``remove``: Removes the field from the metadata.
* ``replace-with``: Replaces the contents of the field with the value provided.
* ``increment-date``: Offsets the date by the profile's ``date-increment`` number of days.
* ``increment-datetime``: Offsets the datetime by the profile's ``date-increment`` number of days.
* ``hash``: Replaces the contents of the field with a one-way cryptographic hash.
* ``hashuid``: Replaces a UID field with a hashed version of that field.
* ``jitter``: Shifts the value by a random number.
* ``encrypt`` (non-DICOM): Encrypts the field in place with AES-EAX encryption.
* ``encrypt`` (DICOM): Removes the field from the DICOM and stores the original value
  in EncryptedAttributesSequence with CMS encryption.
* ``decrypt`` (non-DICOM): Decrypts the field in place with AES-EAX decryption.
* ``decrypt`` (DICOM): Replaces the contents of the field with the value stored in
  EncryptedAttributesSequence with CMS decryption.
* ``regex-sub``: Replaces the contents of the field with a value built from other fields
  and/or groups extracted from the field value.
* ``keep``: Leaves the field unchanged.
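
To make the date and hash transformations more concrete, here is a minimal Python sketch of what an offset-by-days and a one-way hash do conceptually. It is not the migration toolkit's implementation: the helper names `increment_date` and `one_way_hash`, the `YYYYMMDD` date format, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timedelta


def increment_date(value: str, days: int, fmt: str = "%Y%m%d") -> str:
    """Shift a date string by `days` (hypothetical helper; YYYYMMDD format assumed)."""
    shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


def one_way_hash(value: str) -> str:
    """Return a hex digest of `value` (SHA-256 is an assumption, not the toolkit's documented choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


print(increment_date("20200117", -17))  # 20191231
print(len(one_way_hash("ACC12345")))    # 64
```

The same offset applied to every date of a subject preserves the intervals between events, which is why a single `date-increment` value is typically shared across fields.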

Additionally, for DICOM, pixel data masking is supported based on pre-defined
pixel coordinates ([doc](https://flywheel-io.gitlab.io/public/migration-toolkit/pages/pixels.html)).

The YAML profile extends the
[flywheel-migration-toolkit](https://gitlab.com/flywheel-io/public/migration-toolkit)
de-id profile to Flywheel metadata containers. Documentation on how to write the YAML
configuration for the different supported file types can be found in the
flywheel-migration-toolkit [documentation](https://flywheel-io.gitlab.io/public/migration-toolkit/).

_NOTE:_ Metadata extraction must be rerun on the output file, as
the gear itself does not propagate/modify DICOM metadata.

### License

MIT

### Classification

utility

*Gear Level:*

* [x] Project
* [x] Subject
* [x] Session
* [x] Acquisition
* [ ] Analysis

----

[[_TOC_]]

----

### Inputs

* _deid-profile_
  * __Name__: deid-profile
  * __Type__: file
  * __Optional__: false
  * __Description__: A Flywheel de-identification profile specifying the
      de-identification actions to perform

* _subject-csv_
  * __Name__: subject-csv
  * __Type__: file
  * __Optional__: true
  * __Description__: A CSV file that contains mapping values to apply for subjects
      during de-identification.
  
* _input-file_
  * __Name__: input-file
  * __Type__: file
  * __Optional__: false
  * __Description__: An input file to be de-identified

#### deid_profile (required)

This is a YAML file that describes the protocol for de-identifying
the input-file. It supports the same functionality as Flywheel
CLI de-identification. A simple example deid_profile.yaml looks like this:

``` yaml
# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17

  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true

  # filenames block to manipulate output filename based on input filename
  filenames:
      # input regular expression that matches the source filename
    - input-regex: '.*'
      # formatter of the output filename
      output: '{SOPInstanceUID}.dcm'

  fields:
    # Remove a DICOM field value (e.g. remove "StationName")
    - name: StationName
      remove: true

    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true

    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true

    # One-Way hash a dicom field to a unique string
    - name: AccessionNumber
      hash: true

    # One-Way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true

# Zip profile to handle e.g. a .dcm.zip archive. All member files will be
# de-identified according to that same profile.
zip:
  fields:
  - name: comment
    replace-with: FLYWHEEL
  filenames:
  - input-regex: (?P<used>.*).dicom.zip$
    output: '{used}.dcm.zip'
  hash-subdirectories: true
  validate-zip-members: true

# The flywheel configuration to handle flywheel metadata de-id.
flywheel:
  # subject container
  subject:
    # If set to true, export all source container metadata to destination container.
    all: true

  # session container
  session:
    # If set to false, only export to destination container the metadata defined
    # in the fields key.
    all: false
    date-increment: -17
    fields:
      - name: operator
        replace-with: REDACTED
      - name: info.sessiondate
        increment-date: true
      - name: tags
        replace-with: 
          - deid-exported

  acquisition:
    all: true

  file:
    all: true
    # If set to true, export the file info header to the destination container.
    # If set to false or missing, the file info header will be removed from the 
    # destination container.
    include-info-header: true
```
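
The `filenames` blocks in the profile above pair an `input-regex` with an `output` template. As a rough illustration of how such a pair could behave, the Python sketch below matches a filename against the regex and fills the template from the named groups. The function `rename` is hypothetical, and combining `re.match` with `str.format` is an assumption about the mechanism, not the toolkit's actual code. For DICOM the template can also reference header fields such as `SOPInstanceUID`, which this sketch passes in as extra values.

```python
import re


def rename(filename, input_regex, output, extra=None):
    """Return the output filename, or None when the regex does not match."""
    match = re.match(input_regex, filename)
    if match is None:
        return None
    # Named groups from the regex plus any extra values (e.g. header fields)
    values = {**(extra or {}), **match.groupdict()}
    return output.format(**values)


print(rename("scan01.dicom.zip", r"(?P<used>.*).dicom.zip$", "{used}.dcm.zip"))
# scan01.dcm.zip
print(rename("image.dcm", r".*", "{SOPInstanceUID}.dcm", extra={"SOPInstanceUID": "1.2.3"}))
# 1.2.3.dcm
```

Note that the unescaped dots in the example regex match any character; escaping them (`\.`) would make the pattern stricter.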

#### subject-csv (optional)

The subject_csv facilitates subject-specific configuration of
de-identification profiles. This is a CSV file that contains the column
`subject.label` with unique values corresponding to the `subject.label`
values in the project to be exported. If a subject in the project to be
exported is not listed in `subject.label` in the provided subject_csv,
this subject will not be exported.
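
The constraints on the `subject.label` column can be checked before running the gear. The following Python sketch shows one way to do that with the standard library; `check_subject_csv` is a hypothetical helper and not part of the gear.

```python
import csv
import io


def check_subject_csv(text: str) -> list[str]:
    """Return the subject labels, raising ValueError if the column is missing or has duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    if "subject.label" not in (reader.fieldnames or []):
        raise ValueError("missing required column 'subject.label'")
    labels = [row["subject.label"] for row in reader]
    duplicates = sorted({label for label in labels if labels.count(label) > 1})
    if duplicates:
        raise ValueError(f"duplicate subject.label values: {duplicates}")
    return labels


example = "subject.label,DATE_INCREMENT\n001,-15\n002,-20\n"
print(check_subject_csv(example))  # ['001', '002']
```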

##### Subject-level customization with subject-csv and deid-profile

Requirements:

* To update subject fields, each field must be represented both in the
  subject_csv as a column header and in the deid_profile as a Jinja variable
  (i.e., `"{{ var_name }}"`).
* If a field is represented in both the deid_profile and the
  subject_csv, the value in the deid_profile will be replaced with the
  value listed in the corresponding column of the subject_csv for each
  subject that has a label listed in the `subject.label` column.
* Fields represented in the deid_profile but not in the subject_csv will
  be the same for all subjects.

Let's walk through an example pairing of subject_csv and deid_profile
to illustrate.

The following table represents the subject_csv (`../tests/data/example-csv-mapping.csv`):

| subject.label | DATE_INCREMENT | SUBJECT_ID  | PATIENT_BD_BOOL |
|---------------|----------------|-------------|-----------------|
| 001           | -15            | Patient_IDA | false           |
| 002           | -20            | Patient_IDB | true            |
| 003           | -30            | Patient_IDC | true            |

The deid_profile:

``` yaml
dicom:
  # date-increment can be any integer value since dicom.date-increment is defined in
  # example-csv-mapping.csv
  date-increment: "{{ DATE_INCREMENT }}"
  # since example-csv-mapping.csv doesn't define dicom.remove-private-tags,
  # all subjects will have private tags removed
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      # remove can be any boolean since dicom.fields.PatientBirthDate.remove is defined
      # in example-csv-mapping.csv
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      # replace-with can be any string value since dicom.fields.PatientID.replace-with
      # is defined in example-csv-mapping.csv
      replace-with: "{{ SUBJECT_ID }}"
```

The resulting profile for subject 003 given the above would be:

``` yaml
dicom:
  # taken from the DATE_INCREMENT column for subject 003
  date-increment: -30
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: true
    - name: PatientID
      replace-with: Patient_IDC
```
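
The substitution shown above can be approximated in a few lines of Python. This sketch assumes the profile has already been loaded into a dict and that a value is replaced only when the whole string is a single `{{ NAME }}` placeholder. The helper names and the type coercion (`"true"`/`"false"` to booleans, integer strings to `int`) are assumptions chosen so that the output matches the example; they are not a description of the gear's internals, which the text above describes in terms of Jinja variables.

```python
import re

PLACEHOLDER = re.compile(r"^\{\{\s*(\w+)\s*\}\}$")


def coerce(text):
    """Turn CSV text into bool/int where it looks like one (assumed behaviour)."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    try:
        return int(text)
    except ValueError:
        return text


def render(node, row):
    """Recursively replace whole-string placeholders with values from a CSV row."""
    if isinstance(node, dict):
        return {key: render(value, row) for key, value in node.items()}
    if isinstance(node, list):
        return [render(item, row) for item in node]
    if isinstance(node, str):
        match = PLACEHOLDER.match(node)
        if match and match.group(1) in row:
            return coerce(row[match.group(1)])
    return node


profile = {
    "dicom": {
        "date-increment": "{{ DATE_INCREMENT }}",
        "remove-private-tags": True,
        "fields": [
            {"name": "PatientBirthDate", "remove": "{{ PATIENT_BD_BOOL }}"},
            {"name": "PatientID", "replace-with": "{{ SUBJECT_ID }}"},
        ],
    }
}
row_003 = {"subject.label": "003", "DATE_INCREMENT": "-30",
           "SUBJECT_ID": "Patient_IDC", "PATIENT_BD_BOOL": "true"}

print(render(profile, row_003))
```

Values without a placeholder, such as `remove-private-tags`, pass through unchanged and are therefore identical for every subject.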

### Config

* _debug_
  * __Name__: debug
  * __Type__: boolean
  * __Default__: false
  * __Description__: If true, the gear will print debug information to the log.

* _tag_
  * __Name__: tag
  * __Type__: string
  * __Default__: "deid-inplace"
  * __Description__: The tag prefix to append to the file after the gear runs.
    The tag will be `<prefix>-PASS` or `<prefix>-FAIL`, depending on the gear run status.

* _delete-original_
  * __Name__: delete-original
  * __Type__: boolean
  * __Default__: true
  * __Description__: If True, the original file is deleted and replaced with the
    de-identified file, rendering the original file unrecoverable. If False, the
    de-identified file overwrites the original, resulting in a file version
    increment that can be reversed.
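
As a small illustration of the `tag` option, the resulting file tag can be thought of as the configured prefix joined with the run status. The function name `result_tag` below is hypothetical; the sketch only mirrors the naming rule described above.

```python
def result_tag(prefix: str = "deid-inplace", success: bool = True) -> str:
    """Build the tag name from the configured prefix and the gear run status."""
    return f"{prefix}-{'PASS' if success else 'FAIL'}"


print(result_tag())                    # deid-inplace-PASS
print(result_tag("phi-check", False))  # phi-check-FAIL
```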


## Usage

1. The user uploads or identifies a file in Flywheel to de-identify.
2. The user runs deid-inplace (this utility gear) at the project, subject, or session
   level and provides the following:
    * Files:
        * A de-identification profile specifying how to
          de-identify/anonymize each file type
        * An optional CSV that contains a column that maps to a
          Flywheel session or subject metadata field and columns that
          specify values with which to replace DICOM header tags
        * The desired input file to de-identify
    * Configuration options:
        * delete-original: True/False
3. The gear de-identifies the file.
4. The gear either deletes and replaces the original file or overwrites it as a new
   version, depending on the `delete-original` config option.


### Environment

This gear uses `poetry` as a virtual environment and dependency manager. You can
interact with the gear using the following steps:

1. [Install poetry](https://python-poetry.org/docs/#installation)
2. Install dependencies (from within gear directory): `poetry install`
3. Enter virtual environment: `poetry shell`
            
