fw-gear-file-classifier


* Name: fw-gear-file-classifier
* Version: 0.6.5
* Home page: https://gitlab.com/flywheel-io/flywheel-apps/file-classifier.git
* Summary: Generic File Classifier
* Upload time: 2024-03-25 22:39:09
* Author: Flywheel (support@flywheel.io)
* Requires Python: `<4.0,>=3.10`
* License: MIT
* Keywords: flywheel gears classification

<!-- markdownlint-disable code-block-style -->
# File Classifier

[[_TOC_]]

## Overview

The file classifier gear provides a gear interface to the
[fw-classification](https://gitlab.com/flywheel-io/scientific-solutions/lib/fw-classification)
toolkit; it is essentially a thin wrapper around fw-classification.

For documentation on classification in general, please consult the
[fw-classification documentation](https://flywheel-io.gitlab.io/scientific-solutions/lib/fw-classification/).

The file classifier gear uses the file type of the provided input to determine
the proper
[adapter](https://flywheel-io.gitlab.io/scientific-solutions/lib/fw-classification/fw-classification/adapters/)
to use.
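
Conceptually, this dispatch is a lookup from file type to adapter. The sketch below is illustrative only: `DicomAdapter`, `NiftiAdapter`, `ADAPTERS`, and `select_adapter` are hypothetical names, not fw-classification's API, and each adapter simply records where its metadata would come from.

```python
class DicomAdapter:
    """Hypothetical adapter: metadata comes from file.info.header.dicom."""

    source = "file.info.header.dicom"


class NiftiAdapter:
    """Hypothetical adapter: metadata comes from a JSON sidecar."""

    source = "JSON sidecar in the same container"


# Hypothetical mapping from Flywheel file type to adapter class.
ADAPTERS = {
    "dicom": DicomAdapter,
    "nifti": NiftiAdapter,
}


def select_adapter(file_type: str) -> type:
    """Return the adapter class for file_type, or raise for unsupported types."""
    try:
        return ADAPTERS[file_type]
    except KeyError:
        raise ValueError(f"Unsupported file type: {file_type!r}") from None
```

Raising on an unknown type, rather than guessing, keeps an unsupported input from being silently left unclassified.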

## Supported file types

Currently the gear supports classification of the following file types:

* `dicom`: via the `file.info.header.dicom` namespace which is populated using
the
[file-metadata-importer](https://gitlab.com/flywheel-io/flywheel-apps/file-metadata-importer)
gear.
* `nifti`: via a `json` sidecar which is found in the same container as the
input.
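
One plausible way to pair a NIfTI with its sidecar is by base name (for example `T1w.nii.gz` and `T1w.json`). That naming rule is an assumption made for this sketch, and `find_sidecar` is a hypothetical helper; the gear's actual matching logic may differ.

```python
from typing import Iterable, Optional

# Recognized NIfTI extensions (compressed and uncompressed).
NIFTI_SUFFIXES = (".nii.gz", ".nii")


def find_sidecar(nifti_name: str, container_files: Iterable[str]) -> Optional[str]:
    """Return the JSON file sharing nifti_name's base name, if it exists.

    Assumes sidecars are named <base>.json, which is a hypothetical convention.
    """
    base = nifti_name
    for suffix in NIFTI_SUFFIXES:
        if nifti_name.endswith(suffix):
            base = nifti_name[: -len(suffix)]
            break
    candidate = f"{base}.json"
    return candidate if candidate in set(container_files) else None
```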

## Usage

### Prerequisites

#### Metadata

In general, since fw-classification acts on input metadata, the input file needs to have
its metadata populated before running file-classifier. The metadata can live in a few
places depending on how the file will be classified. Most commonly, it lives in
`file.info.header.<file-type>`, which is populated by `file-metadata-importer`. However,
the metadata can also be in a separate file, such as the `sidecar.json` for NIfTIs, or in
the hierarchy, such as the acquisition label, the file name, or custom information on any parent
container.
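
Profile keys such as `file.info.header.dicom.ProtocolName` are dotted paths into this metadata. Before running the gear, you can check that the prerequisite metadata is present by walking that path. `get_dotted` and `has_header` are hypothetical helpers, and the nested dictionary stands in for the metadata you would retrieve for a file.

```python
from typing import Any, Mapping


def get_dotted(data: Mapping[str, Any], key: str, default: Any = None) -> Any:
    """Follow a dotted key such as 'file.info.header.dicom' through nested dicts."""
    current: Any = data
    for part in key.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current


def has_header(metadata: Mapping[str, Any], file_type: str) -> bool:
    """True if file.info.header.<file_type> exists and is non-empty."""
    return bool(get_dotted(metadata, f"file.info.header.{file_type}"))


metadata = {
    "file": {
        "type": "dicom",
        "info": {"header": {"dicom": {"ProtocolName": "T1_MPRAGE"}}},
    }
}
print(has_header(metadata, "dicom"))  # True
print(get_dotted(metadata, "file.info.header.dicom.ProtocolName"))  # T1_MPRAGE
```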

#### Profile

file-classifier ships with [default
profiles](https://gitlab.com/flywheel-io/scientific-solutions/lib/fw-classification-profiles/-/tree/main/profiles),
but the gear also accepts an input profile. If you have custom needs beyond what the
default profile covers, you will need to override the default profiles. See [Custom
Classifications](#custom-classifications).

### Inputs

* __file-input__: The file to classify.
* __profile__: Optional profile to use for classification. If provided, it overrides
the default classification profile. See the
[classification-toolkit
docs](https://flywheel-io.gitlab.io/scientific-solutions/lib/fw-classification/fw-classification/profile/)
for how to create a profile.
* __classifications__: An optional list of context classifications set at the
project level; see
[Setting custom classifications](#custom-classifications). These
classifications are added as the final block of the profile being used to
classify, so they get the highest priority.

### Configuration

* __debug__ (boolean, default False): Include debug statements in output.
* __tag__ (str, default 'file-classifier'): String to tag the file after
classification. Useful for gear-rule pipelines triggered by tags.

### Which profile will be used?

The gear selects the profile in the following order of priority:

1. Profile passed in via the optional _input_ `profile`
2. Default profile `main.yml` described in the
[classification-profiles](https://gitlab.com/flywheel-io/scientific-solutions/lib/fw-classification-profiles)
repo.

The gear prints the profile it is using at the start of the run.

!!! note
    _After_ the profile has been determined, context classifications will be
    added as a block to that profile, i.e. context-classifications _always_ have the
    highest priority.
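
The selection order and the note above can be summarized in a short sketch. Profiles are modeled here as lists of block dictionaries, and `choose_profile`, `add_context_block`, and the `context_classifications` block name are hypothetical; they illustrate the ordering, not the gear's code.

```python
from typing import Any, Optional

DEFAULT_PROFILE = "main.yml"


def choose_profile(input_profile: Optional[str]) -> str:
    """An input profile wins; otherwise fall back to the default main.yml."""
    return input_profile or DEFAULT_PROFILE


def add_context_block(
    blocks: list[dict[str, Any]], context_rules: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Append context classification rules as the final (highest-priority) block."""
    if not context_rules:
        return list(blocks)
    return [*blocks, {"name": "context_classifications", "rules": context_rules}]
```

Returning a new list instead of mutating `blocks` keeps the resolved profile unchanged if the context rules are later discarded.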

## Custom Classifications

The default profile is often not specific enough for a given project. If you need to add
custom classifications, there are two main ways to pass them in:

1. Create a profile and attach it to your project.
2. Add custom classifications to the project custom information.

### Create a profile

This option is better suited if you will reuse the same custom classifications across
multiple projects.

__WARNING__:

> Creating a profile and passing it in as input will _completely_ bypass the pre-defined
> classifications. If you want to keep them, you will need to either copy them into your
> profile or include them as a git profile via `includes`, as in the example below.

For example, to add a custom classification of `Deleted` when `ProtocolName` has been
replaced with `Deleted`:

```yaml
---
name: Custom classifier
includes:
  # Include default MR
  - https://gitlab.com/flywheel-io/scientific-solutions/lib/fw-classification-profiles$profiles/MR.yaml

profile:
  - name: set_custom_deleted
    description: |
      Set custom deleted classification if ProtocolName was deleted
    rules:
      - match_type: 'all'
        match:
          - key: file.type
            is: dicom
          - key: file.info.header.dicom.ProtocolName
            is: 'Deleted'
        action:
          - key: file.classification.Custom
            add: 'Deleted'
```
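
To make the rule semantics concrete, the sketch below evaluates a single rule against a file's metadata. It is a deliberately simplified stand-in, not fw-classification's implementation: it supports only the `is` matcher, the `add` action, and a `match_type` of `all` or `any`. Treating a missing `match_type` as `any` is an assumption based on the job log shown later in this README.

```python
from typing import Any


def get_dotted(data: dict[str, Any], key: str) -> Any:
    """Follow a dotted key through nested dicts, returning None if absent."""
    current: Any = data
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def apply_rule(meta: dict[str, Any], rule: dict[str, Any]) -> bool:
    """Evaluate one rule; on a match, run its 'add' actions and return True."""
    results = [get_dotted(meta, m["key"]) == m["is"] for m in rule["match"]]
    # Assumption: a rule without match_type combines its conditions with "any".
    combine = all if rule.get("match_type", "any") == "all" else any
    if not combine(results):
        return False
    for action in rule["action"]:
        *parents, leaf = action["key"].split(".")
        target = meta
        for part in parents:
            target = target.setdefault(part, {})
        values = target.setdefault(leaf, [])
        if action["add"] not in values:  # classification values are lists
            values.append(action["add"])
    return True


rule = {
    "match_type": "all",
    "match": [
        {"key": "file.type", "is": "dicom"},
        {"key": "file.info.header.dicom.ProtocolName", "is": "Deleted"},
    ],
    "action": [{"key": "file.classification.Custom", "add": "Deleted"}],
}
meta = {"file": {"type": "dicom", "info": {"header": {"dicom": {"ProtocolName": "Deleted"}}}}}
print(apply_rule(meta, rule), meta["file"]["classification"])  # True {'Custom': ['Deleted']}
```

Switching `match_type` to `any` makes the same rule fire for every DICOM file, because `file.type is dicom` alone satisfies it; this is why `all` matters here.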

### Add custom classifications to project information

Custom classifications can be added to the project information, either via the SDK or the
UI. Each entry follows the structure of a rule (`match_type`, `match`, `action`) inside a
`fw-classification` profile
[block](https://flywheel-io.gitlab.io/scientific-solutions/lib/fw-classification/fw-classification/profile/#block).

!!! note

    Project information classifications are added as a block to the profile _after_
    the profile has been determined, so context classifications always have the
    highest priority.

For example, to add the same `ProtocolName` rule via the SDK:

```python
import flywheel

fw = flywheel.Client()  # with no arguments, uses the credentials from `fw login`
proj = fw.get_project('<proj_id>')  # replace with your project ID, or use fw.lookup()
existing_info = proj.info or {}
# Initialize context classifications if they don't exist
existing_info.setdefault('classifications', [])
existing_info['classifications'].append(
    {
        # Require every condition to match; without this the rule may be
        # parsed with "Any" matching and fire on every DICOM file.
        'match_type': 'all',
        'match': [
            {
                'key': 'file.type',
                'is': 'dicom',
            },
            {
                'key': 'file.info.header.dicom.ProtocolName',
                'is': 'Deleted',
            },
        ],
        'action': [
            {'key': 'file.classification.Custom', 'add': 'Deleted'},
        ],
    }
)
proj.replace_info(existing_info)
```
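
Running the snippet above more than once appends the same rule again each time. If you want each rule stored at most once, a small guard helps; `append_rule_once` is a hypothetical helper that operates on the plain `info` dictionary and returns an updated copy to pass to `replace_info`.

```python
import copy
from typing import Any


def append_rule_once(info: dict[str, Any], rule: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of info with rule added to 'classifications' unless already present."""
    updated = copy.deepcopy(info)
    rules = updated.setdefault("classifications", [])
    if rule not in rules:  # dict equality compares keys and values recursively
        rules.append(rule)
    return updated
```

With the SDK objects from the previous example, this would be called as `proj.replace_info(append_rule_once(proj.info or {}, rule))`, where `rule` is the dictionary being appended.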

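The gear then reports the custom classifications it found in the job log. In the excerpt
below, the rule was parsed as `Match if Any are True`; check this line in your own logs to
confirm that your `match_type` was applied as intended: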

```text
...
[552ms   INFO     ]  Log level is INFO
[552ms   INFO     ]  Using default profile 'main.yml'
[1152ms   INFO     ]  Looking for custom classifications in project Q1_Q2_2022
[1152ms   INFO     ]  Found custom classification in project context, parsed as:

If all of () executed, then execute the first match of the following:

        -------------------- Rule 0 --------------------
        Match if Any are True:
                - file.type is dicom
                - file.info.header.dicom.ProtocolName is deleted

        Do the following:
                - add Deleted to file.classification.Custom

[1152ms   INFO     ]  Starting classification.
[1380ms   INFO     ]  Running at acquisition level
...
```

You can also add these values via the UI:

![Custom Classifications](./docs/images/custom-classifications-ui.png)

## Contributing

For more information about how to get started contributing to this gear,
check out [CONTRIBUTING.md](CONTRIBUTING.md).

            
