fw-gear-qc-reporter


- Name: fw-gear-qc-reporter
- Version: 0.3.3
- Home page: https://gitlab.com/flywheel-io/scientific-solutions/gears/metadata-error-reporter
- Summary: Gear to report on metadata qc results
- Upload time: 2024-12-05 15:43:09
- Author: Flywheel
- Requires Python: <4.0,>=3.11
- License: MIT
- Keywords: flywheel, gears, errors, qc, report, reporter

# metadata-error-reporter (Metadata Error Reporter)

## Overview

### Summary

Gear that creates a report on QC values from metadata at various levels of the
hierarchy.

### Cite

Developed by Flywheel.

### License 

[MIT](./LICENSE)

### Classification

Analysis gear.

*Gear Level:*

- [x] Project
- [x] Subject
- [x] Session
- [ ] Acquisition
- [ ] Analysis

----

[[_TOC_]]

----

### Inputs

- *metadata-rules*
  - __Type__: *YAML file*
  - __Optional__: *True*
  - __Description__: *A YAML file describing how to handle non-default QC values.*

### Config

- *debug*
  - __Type__: *boolean*
  - __Description__: *Log debug messages*
  - __Default__: *False*

- *report-on-success*
  - __Type__: *boolean*
  - __Description__: *If true, report on QC results that passed as well. Otherwise, report on only failed QC results.*
  - __Default__: *False*
  
- *output-format*
  - __Type__: *string* (Allowed values: {`csv`, `json`})
  - __Description__: *Format of output report*
  - __Default__: *csv*
  
- *intermediate-containers*
  - __Type__: *boolean*
  - __Description__: *If true, report on files attached to all containers under
  the run level. Otherwise, report only on files attached to acquisitions.*
  - __Default__: *False*

- *skip-analyses*
  - __Type__: *boolean*
  - __Description__: *If true, skip analysis QC results. Otherwise, include analyses.*
  - __Default__: *True*

### Outputs

#### Files

- *report*
  - __Name__: *{proj|sub|ses}-{id}-report.{csv|json}*
  - __Type__: *CSV or JSON report*
  - __Description__: *Consolidated report of configured QC results in hierarchy below run container*

#### Metadata

N/A

### Pre-requisites


#### Prerequisite Gear Runs

Run every gear whose QC results you want in the report before running `metadata-error-reporter`.

#### Prerequisite Files

N/A

#### Prerequisite Metadata

N/A

## Usage

### Description

`metadata-error-reporter` relies heavily on Dataviews in Flywheel. The gear does two things:

1. Submits Dataviews that correspond to the config options and waits for them to complete.
   * By default, the gear submits a dataview reporting on the `qc` namespace of all _acquisitions_ under the run level.
   * With `intermediate-containers == True`, the gear also submits dataviews for each intermediate level. For example, if the run level is project, the gear submits dataviews at the subject and session levels as well as at the acquisition level.
   * With `skip-analyses == False`, the gear also submits dataviews for analyses attached to each container level it reports on. For example, when running at the subject level with `intermediate-containers == True`, the gear submits 4 dataviews: session level, session-analysis level, acquisition level, and acquisition-analysis level.
2. Post-processes the dataviews, applying the default operations and any custom operations to report on QC results (see [Metadata Rules](#metadata-rules)).
   * By default, the gear reports on every key in the `qc` namespace. Within each key, it reports on every QC result (assumed to be a key) except `job_info`. Within each QC result, it reports the `state` and `data` keys (see the [Output](#output) section).
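
The level selection described in step 1 can be sketched as follows. This is only an illustration of the rules stated above, not the gear's actual code: the function name `dataview_levels`, the `HIERARCHY` constant, and the `-analysis` suffix are hypothetical, and the sketch assumes that only levels strictly below the run level are queried.

```python
# Container hierarchy, ordered from top to bottom.
HIERARCHY = ["project", "subject", "session", "acquisition"]


def dataview_levels(run_level, intermediate_containers=False, skip_analyses=True):
    """Return the levels a dataview would be submitted for (illustrative only)."""
    below = HIERARCHY[HIERARCHY.index(run_level) + 1:]
    # Default: acquisitions only. Otherwise: every level below the run level.
    levels = below if intermediate_containers else ["acquisition"]
    result = []
    for level in levels:
        result.append(level)
        if not skip_analyses:
            # One extra dataview for analyses attached to this level.
            result.append(f"{level}-analysis")
    return result


print(dataview_levels("subject", intermediate_containers=True, skip_analyses=False))
# ['session', 'session-analysis', 'acquisition', 'acquisition-analysis']
```

The printed list matches the 4 dataviews named in the subject-level example above.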

For example, with the following qc namespace structure:

```python
# Contents of file.info.qc
{
  "gear-name1": {
    "job_info": {},
    "qc_result1": {
      "state": "FAIL",
      "data": "invalid value"
    }
  },
  "gear-name2": {
    "job_info": {},
    "qc_result1": {
      "state": "FAIL",
      "data": "invalid value"
    }
  }
}
```

The gear would create 2 CSV lines for this particular file.
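
A minimal sketch of that default flattening, assuming the structure shown above. The helper `flatten_qc` and the constant `EXCLUDED` are hypothetical names; the row keys are taken from the [Output](#output) section, and the excluded names are the defaults listed under Global options.

```python
# Default exclusions named in this README; the constant name is hypothetical.
EXCLUDED = {"job_info", "gear_info"}

qc = {
    "gear-name1": {
        "job_info": {},
        "qc_result1": {"state": "FAIL", "data": "invalid value"},
    },
    "gear-name2": {
        "job_info": {},
        "qc_result1": {"state": "FAIL", "data": "invalid value"},
    },
}


def flatten_qc(qc_namespace):
    """Turn a file's qc namespace into one row per QC result."""
    rows = []
    for gear_name, results in qc_namespace.items():
        for qc_name, result in results.items():
            if qc_name in EXCLUDED:
                continue  # skip bookkeeping entries such as job_info
            rows.append({
                "qc-namespace": gear_name,
                "qc": qc_name,
                "state": result.get("state"),
                "data": result.get("data"),
            })
    return rows


rows = flatten_qc(qc)
print(len(rows))  # 2
```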

## Metadata Rules

The `metadata-rules` input is an _optional_ YAML file that provides additional configuration, either globally or for individual qc-results.

### Global options

* `top_level_namespace`: Name of the top-level key to report on, for QC results
stored under a key other than `qc`.
* `excluded_qc_results`: List of qc-result names to exclude, in addition to the
default list (`job_info` and `gear_info`).
* `excluded_qc_results_override`: List of qc-result names to exclude, replacing
the default list.
* `fail_names`: List of string values that are interpreted as a failed qc-result (case-insensitive). Defaults to `["fail", "failure", "failed"]`.
* `state_name`: Name of the key in each qc-result that provides the state (pass vs. fail) information. Defaults to `state`.
* `true_means_fail`: For boolean-valued qc-results, defines the mapping between the boolean and pass/fail. If `True`, a `True` value is treated as a failure; if `False`, a `True` value is treated as a success.
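
The difference between the two exclusion options can be sketched like this. The function `effective_exclusions` is a hypothetical helper, and the precedence when both options are present (the override wins) is an assumption, not documented behavior.

```python
# Default exclusion list as stated above.
DEFAULT_EXCLUDED = ["job_info", "gear_info"]


def effective_exclusions(rules):
    """Compute the set of qc-result names to skip from a parsed rules mapping."""
    override = rules.get("excluded_qc_results_override")
    if override is not None:
        # Assumption: an override replaces the defaults entirely.
        return set(override)
    # Otherwise the listed names are added to the defaults.
    return set(DEFAULT_EXCLUDED) | set(rules.get("excluded_qc_results", []))


print(sorted(effective_exclusions({"excluded_qc_results": ["filename"]})))
# ['filename', 'gear_info', 'job_info']
print(sorted(effective_exclusions({"excluded_qc_results_override": ["filename"]})))
# ['filename']
```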

### Field options

Use the `fields` key to define how to treat specific qc-results. The `fields` key should contain a mapping under it, where each key is a specific qc-result.  Each key should follow the format `<gear_name>.<qc_name>`. For example, to configure options for the qc-result `slice_consistency` generated by `dicom-qc`, you would use the key `dicom-qc.slice_consistency`.

Under each key (qc-result) in the `fields` section, the global `state_name`, `true_means_fail`, and `fail_names` options can be overridden. Additionally, the handling of supporting data within the qc-result can be configured with the `data` key.

Under the `data` key, each entry should be a key-value mapping where the key is the name of the data field you want extracted and the value is either `unfold` or `default`:

* **unfold**: Unfold lists or dictionaries. For a list value, create a row for
each element in the list, with the item value represented as a string in the
description column. For a dictionary value, create a row for each item in the
dictionary, with the item key in the key column and the item value in the
value column.
* **default**: Represent everything as a JSON object in the description column.
For a list value, create one row whose description is a JSON representation of
the whole list, i.e. `[<item1>, <item2>]`. For a dictionary value, create one
row whose description is a JSON representation of the whole dictionary as a
list, i.e.
`[{"<key1>": "<value1>"}, {"<key2>": "<value2>"}]`

**NOTE**: The `data` key under a field definition does _not_ support nested fields at this time.
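
The two modes can be illustrated with a short sketch. `expand_data` is a hypothetical helper, and the sketch assumes that the "description" column mentioned above corresponds to the `data` column of the output. For simplicity, the default mode here serializes the value as-is with `json.dumps`, so a dictionary is not converted to the list-of-pairs form shown above.

```python
import json


def expand_data(value, mode="default"):
    """Return one row fragment per output row for a supporting-data value."""
    if mode == "unfold" and isinstance(value, list):
        # One row per list element, stringified.
        return [{"data": str(item)} for item in value]
    if mode == "unfold" and isinstance(value, dict):
        # One row per dictionary item, split into key and value columns.
        return [{"key": key, "value": val} for key, val in value.items()]
    # Default mode (or a scalar): a single row holding a JSON representation.
    return [{"data": json.dumps(value)}]


print(expand_data(["a", "b"], mode="unfold"))  # [{'data': 'a'}, {'data': 'b'}]
print(expand_data(["a", "b"]))                 # [{'data': '["a", "b"]'}]
```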

### Examples

#### Boolean valued qc-results

The qc-reporter gear is meant to report on qc-results created by the `GearToolkitContext`'s [`add_qc_result` method](https://flywheel-io.gitlab.io/public/gear-toolkit/flywheel_gear_toolkit/context/#adding-a-qc-result-or-gear-info).

If you have a qc-result that was produced a different way (and therefore looks different), you will need to add a field within the `metadata-rules` input file to define how to process that result.

For example, if you have a gear called `boolean-reporter` that produces a single qc-result called `value`, such as this:

```json
{
  ...
  "qc": {
    "boolean-reporter": {
      "job_info": {...},
      "value": {
        "result": true
      }
    }
  }
}
```

You could write a `metadata-rules` file like this:

```yaml
---
fields:
  boolean-reporter.value:
    state_name: "result"
    true_means_fail: True
```

This would tell the qc-reporter gear to look at the key `result` within the qc-result to determine state, and that a `True` value should be interpreted as a failure.
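
How such a rule could be applied is sketched below. `is_failure` is a hypothetical helper that combines `state_name`, `fail_names`, and `true_means_fail` as described under Global options; the default of `False` for `true_means_fail` is an assumption.

```python
DEFAULT_FAIL_NAMES = ("fail", "failure", "failed")


def is_failure(qc_result, state_name="state", fail_names=DEFAULT_FAIL_NAMES,
               true_means_fail=False):
    """Decide whether a single qc-result counts as failed."""
    state = qc_result.get(state_name)
    if isinstance(state, bool):
        # Boolean states are mapped through true_means_fail.
        return state if true_means_fail else not state
    # String states are compared case-insensitively against fail_names.
    return str(state).lower() in {name.lower() for name in fail_names}


# The boolean-reporter example with the rule shown above.
print(is_failure({"result": True}, state_name="result", true_means_fail=True))  # True
# A standard qc-result with the default options.
print(is_failure({"state": "FAIL", "data": "invalid value"}))  # True
```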

#### Expanding supporting data

`dicom-qc` reports on jsonschema validation, which can produce nested data such as this:

```json
{
  ...
  "qc": {
    "dicom-qc": {
      "job_info": {...},
      "jsonschema-validation": {
        "data": [{
          "error_context": "",
          "error_message": "'dicom' is a required property",
          "error_type": "required",
          "error_value": ["dicom", "dicom_array"],
          "item": "file.info.header"
        },
        {
          "error_context": "",
          "error_message": "'dicom_array' is a required property",
          "error_type": "required",
          "error_value": ["dicom", "dicom_array"],
          "item": "file.info.header"
        }],
        "state": "FAIL"
      }
    }
  }
}
```

By default, running qc-reporter on this produces a single row for the failed QC value. Because `data` is a list of length 2, you can get 2 rows instead (one for each failure) with a `metadata-rules` file like this:

```yaml
---
fields:
  dicom-qc.jsonschema-validation:
    data:
      data: "unfold"
```

This will produce two rows for the one failed QC result with _all_ the supporting data as the `data` field in the output CSV.

## Output

If JSON output is selected, the output looks like the example below. If CSV is selected, the content is the same, with each object in the list written as one row of the CSV.

```json
{
  [
    # Schema
    {
      # Machine readable quick access
      "file_id": <file id | None>,
      "version": <file version | None>,
      # Human readable quick access
      "filename": <filename>,
      "subject.label": <label>,
      "session.label": <label | None>,
      "acquisition.label": <label | None>,
      "analysis.label": <label | None>,
      # for easier nav, non-existent for subject
      "session-url": <session-url>,
      # QC result
      "state": <pass | fail>,
      "qc-namespace": <top level key under file.info.qc>,
      "qc": <key of the qc result>,
      "data": <supporting-data>,
      "key": <only used for "unfold" operation in custom optional input>,
      "value": <only used for "unfold" operation in custom optional input>,
    },
    ## Examples
    # For specifically dicom-qc "jsonschema" and dicom-fixer "fixed" (both list types), unfold that list
    {
      …
      "qc-namespace": "dicom-qc",
      "qc": "jsonschema",
      "state": PASS | FAIL,
      "data": <error_message[0]>
    },
    …
    {
      …
      "qc-namespace": "dicom-qc",
      "qc": "jsonschema",
      "state": PASS | FAIL,
      "data": <error_message[n]>
    },
    {
      …
      "qc-namespace": "dicom-fixer",
      "qc": "fixed",
      "state": PASS | FAIL,
      "data": <fix[1]>
    },
    {
      …
      "qc-namespace": "dicom-fixer",
      "qc": "fixed",
      "state": PASS | FAIL,
      "data": <fix[n]>
    },
  ]
}
```
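
To make the relationship between the two formats concrete, here is a hedged sketch of writing such row objects to CSV with the standard library. The column names come from the schema above; their order, the helper `rows_to_csv`, and the sample values (`scan.dcm`, `sub-01`) are assumptions for illustration only.

```python
import csv
import io

# Column names taken from the schema above; the order is an assumption.
COLUMNS = [
    "file_id", "version", "filename", "subject.label", "session.label",
    "acquisition.label", "analysis.label", "session-url", "state",
    "qc-namespace", "qc", "data", "key", "value",
]


def rows_to_csv(rows):
    """Serialize report rows to CSV text; missing columns are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example = [{
    "filename": "scan.dcm",      # hypothetical file
    "subject.label": "sub-01",   # hypothetical subject
    "state": "fail",
    "qc-namespace": "dicom-qc",
    "qc": "jsonschema",
    "data": "'dicom' is a required property",
}]
print(rows_to_csv(example))
```

Each dictionary becomes one CSV row, and columns that do not apply to a row (for example `key` and `value` outside an unfold operation) stay empty.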

### Workflow

A general workflow:

1. Upload data to project with gear rules enabled
2. Gear rules run
3. Run any custom QC gears across project
4. Run metadata-error-reporter on project or subsection of project
5. Use output report to correct QC errors.

### Logging

An overview/orientation of the logging and how to interpret it.

## FAQ

[FAQ.md](FAQ.md)

## Contributing

For more information about how to get started contributing to this gear,
check out [CONTRIBUTING.md](CONTRIBUTING.md).
<!-- markdownlint-disable-file -->

            

Raw data

{
    "_id": null,
    "home_page": "https://gitlab.com/flywheel-io/scientific-solutions/gears/metadata-error-reporter",
    "name": "fw-gear-qc-reporter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.11",
    "maintainer_email": null,
    "keywords": "Flywheel, Gears, Errors, QC, Report, Reporter",
    "author": "Flywheel",
    "author_email": "support@flywheel.io",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Gear to report on metadata qc results",
    "version": "0.3.3",
    "project_urls": {
        "Homepage": "https://gitlab.com/flywheel-io/scientific-solutions/gears/metadata-error-reporter",
        "Repository": "https://gitlab.com/flywheel-io/scientific-solutions/gears/metadata-error-reporter"
    },
    "split_keywords": [
        "flywheel",
        " gears",
        " errors",
        " qc",
        " report",
        " reporter"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "24a36f932c68fdef50048afb6da366ed6d28b44c9541f099a530b891cda580eb",
                "md5": "22c2eaf9c9bcfafc0fe57bf5564a52c3",
                "sha256": "bbc767e7348002a7e4e72712fc0f263752eb7a92b7d1b578d193367f2365972d"
            },
            "downloads": -1,
            "filename": "fw_gear_qc_reporter-0.3.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "22c2eaf9c9bcfafc0fe57bf5564a52c3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.11",
            "size": 13499,
            "upload_time": "2024-12-05T15:43:09",
            "upload_time_iso_8601": "2024-12-05T15:43:09.560544Z",
            "url": "https://files.pythonhosted.org/packages/24/a3/6f932c68fdef50048afb6da366ed6d28b44c9541f099a530b891cda580eb/fw_gear_qc_reporter-0.3.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-05 15:43:09",
    "github": false,
    "gitlab": true,
    "bitbucket": false,
    "codeberg": false,
    "gitlab_user": "flywheel-io",
    "gitlab_project": "scientific-solutions",
    "lcname": "fw-gear-qc-reporter"
}
        