dtlpymetrics


Name: dtlpymetrics
Version: 1.0.106
Summary: Scoring and metrics app
Author: Dataloop Team
Upload time: 2024-05-01 08:35:30
            # Scoring and metrics app


## Description

This app improves the efficiency of data annotation by improving annotation quality while reducing the time required to
produce annotations.

The components of this app are:

1. Functions to calculate scores for quality tasks and model predictions.
2. Custom nodes that can be added to pipelines to calculate scores when a task's quality items are completed.

Also check [this notebook](docs/metrics.ipynb) for more information.

## Quality Flows
To understand more about each of these tasks, refer to the main Dataloop documentation linked:

1. [Qualification tasks](https://dataloop.ai/docs/qualification-honeypot)
2. [Honeypot tasks](https://dataloop.ai/docs/qualification-honeypot#honeypot)
3. [Consensus tasks](https://dataloop.ai/docs/consensus)


In general, an annotator will receive an assignment to complete their annotation task. For a given item in a consensus task, 
each assignment will be cross-compared with every other assignment. In the case of qualification and honeypot tasks, 
each item will only have one assignment associated with it. 
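
As a rough illustration of this pairing logic, the sketch below (plain Python, with hypothetical assignment names) enumerates every pair of assignments for one consensus item, whereas a qualification or honeypot item would compare its single assignment against the reference:

```python
from itertools import combinations

# Hypothetical assignment names for one item in a consensus task with 3 assignees.
assignments = ["assignment_a", "assignment_b", "assignment_c"]

# Consensus: every assignment is cross-compared with every other assignment.
consensus_pairs = list(combinations(assignments, 2))
print(consensus_pairs)
# [('assignment_a', 'assignment_b'), ('assignment_a', 'assignment_c'),
#  ('assignment_b', 'assignment_c')]

# Qualification / honeypot: the single assignment is compared against the reference.
reference = "ground_truth"
honeypot_pairs = [(assignments[0], reference)]
print(honeypot_pairs)  # [('assignment_a', 'ground_truth')]
```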

### What's Supported?

Supported file types:
- image
- video

Scoring is currently supported for quality tasks with the following annotation types (geometry score method in parentheses, where applicable; a minimal sketch of these geometry measures follows the list):
- classification
- bounding box (IOU)
- polygon (IOU)
- segmentation (IOU)
- point (distance)
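
For reference, here is a minimal sketch of the two geometry measures named above: IoU for axis-aligned bounding boxes and a distance-based score for points. The `(left, top, right, bottom)` box format and the linear distance-to-score conversion are illustrative assumptions, not necessarily the app's exact implementation.

```python
import math

def bbox_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (left, top, right, bottom)."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def point_score(pt_a, pt_b, threshold=10.0):
    """Distance-based score: 1.0 at zero distance, 0.0 at or beyond `threshold` pixels.
    The linear falloff and the threshold value are assumptions for illustration."""
    dist = math.dist(pt_a, pt_b)
    return max(0.0, 1.0 - dist / threshold)

print(bbox_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
print(point_score((0, 0), (3, 4)))               # 0.5
```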


## Score Types

During scoring, the following scores will be created for each annotation:

- `raw_annotation_scores` - for each annotation comparison, the `geometry`, `label` and `attribute` matching scores
- `annotation_overall` - the mean of each annotation’s raw scores
- `user_confusion_score` - the mean of an assignee’s annotation overall scores, relative to the reference or another assignee
- `item_confusion_score` - the count of label pairs between the assignee’s labels and the reference’s labels
- `item_overall_score` - the mean of *all* annotation overall scores associated with an item

**1) Raw annotation scores:** 

There are three types of scores for annotations: `annotation_iou`, `annotation_label` and `annotation_attribute`.  
The user can choose which of these scores to include; by default, all three are included and each has a default value of 1 (which can be modified).

**2) Annotation overall**

For `annotation_overall` score we calculate the mean value for all raw annotation scores per annotation. 
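
As a minimal sketch (score names and values here are illustrative), the overall score for one annotation comparison is simply the mean of its raw scores:

```python
# Raw scores for one annotation comparison (illustrative values).
raw_annotation_scores = {
    "annotation_iou": 0.85,       # geometry match
    "annotation_label": 1.0,      # label match
    "annotation_attribute": 0.5,  # attribute match
}

# Annotation overall = mean of the raw scores for that annotation.
annotation_overall = sum(raw_annotation_scores.values()) / len(raw_annotation_scores)
print(round(annotation_overall, 3))  # 0.783
```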

**3) User confusion score**

The `user_confusion` score represents the mean annotation score a given assignee achieves when their annotations are compared against another set of annotations (either the reference or another assignee). 

**4) Label confusion score**

The `label_confusion` score represents the count of annotations with a given label by an assignee, relative to each label class in the other set of annotations (either the reference or another assignee).

**5) Item overall score**

The `item_overall` score is the mean of the overall annotation scores across all annotations associated with an item.
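
A minimal sketch, following the descriptions above, of how per-annotation overall scores roll up into a per-assignee mean (user confusion) and a per-item mean (item overall); the data structure is an illustrative assumption:

```python
from statistics import mean

# Illustrative annotation overall scores for one item, keyed by assignee.
annotation_overall_scores = [
    {"assignee": "annotator_1", "score": 0.90},
    {"assignee": "annotator_1", "score": 0.70},
    {"assignee": "annotator_2", "score": 0.60},
]

# User confusion: mean annotation overall score per assignee,
# relative to the set of annotations they were compared against.
assignees = sorted({s["assignee"] for s in annotation_overall_scores})
user_confusion = {
    a: mean(s["score"] for s in annotation_overall_scores if s["assignee"] == a)
    for a in assignees
}
print({a: round(v, 3) for a, v in user_confusion.items()})
# {'annotator_1': 0.8, 'annotator_2': 0.6}

# Item overall: mean of all annotation overall scores on the item.
item_overall = mean(s["score"] for s in annotation_overall_scores)
print(round(item_overall, 3))  # 0.733
```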

Any calculated and uploaded scores will replace any previous scores for all items of a given task.

_Note about videos_: Video scores differ slightly from image scores. Video scores are calculated frame by frame, and each annotation's score is the average of these frame scores across all frames relevant to that annotation. Confusion scores are not calculated due to the multi-frame nature of videos. Item overall scores remain an average over all annotations of the video item.
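
A minimal sketch of the frame-by-frame behaviour described above (frame indices and scores are illustrative): the annotation's score is the mean of its per-frame scores, and the item overall score is then averaged over annotations as usual.

```python
from statistics import mean

# Illustrative per-frame geometry scores for one video annotation (frame index -> score).
frame_scores = {0: 0.9, 1: 0.8, 2: 0.7}

# The annotation's score is averaged across all frames where it appears.
annotation_score = mean(frame_scores.values())
print(round(annotation_score, 3))  # 0.8
```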

## Confusion Example

There are generally two kinds of scores: regular scores, and “confusion” scores. 

Regular scores show the level of agreement or overlap between two sets of annotations. They use the IDs of the entities being compared for the `entityId` and `relative` fields. This applies to comparisons of annotations or items. `value` will typically be a number between 0 and 1. 


There are two types of confusion scores: item label confusion, and user confusion. **Item label confusion** shows the number of instances in which an assignee’s label corresponds with the ground truth labels. 

_Ground truth annotations_:

![Cat v dog](assets/cat_dog_annotations_1.png)

`item = dl.items.get(item_id='64c0fc0730b03f27ca3a58db')`

_Assignee annotations_:

![Cat v dog](assets/cat_dog_annotations_2.png)

`item = dl.items.get(item_id='64c0f2e1ec9103d52eaedbe2')`


In this example item, the ground truth has 3 annotations for each of the cat and dog classes. The assignee, however, labels 1 as cat and 5 as dog. This results in the following item label confusion scores:

```json
[
    {
        "type": "label_confusion",
        "value": 1,
        "entityId": "cat",
        "context": {
            "relative": "cat",
            "taskId": "<TASK_ID>",
            "itemId": "<ITEM_ID>",
            "datasetId": "<DATASET_ID>"
        }
    },
    {
        "type": "label_confusion",
        "value": 3,
        "entityId": "dog",
        "context": {
            "relative": "dog",
            "taskId": "<TASK_ID>",
            "itemId": "<ITEM_ID>",
            "datasetId": "<DATASET_ID>"
        }
    },
    {
        "type": "label_confusion",
        "value": 2,
        "entityId": "dog",
        "context": {
            "relative": "cat",
            "taskId": "<TASK_ID>",
            "itemId": "<ITEM_ID>",
            "datasetId": "<DATASET_ID>"
        }
    }
]
```
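
A minimal sketch of how those counts arise, assuming each assignee annotation has already been matched to a ground-truth annotation (the matching itself comes from the geometry comparison and is not shown), and assuming `entityId` carries the assignee's label while `relative` carries the ground-truth label, as the counts in the example imply:

```python
from collections import Counter

# (ground-truth label, assignee label) for each matched annotation pair,
# reproducing the example above: 1 cat kept as cat, 2 cats labelled dog,
# and 3 dogs labelled dog.
matched_pairs = [
    ("cat", "cat"),
    ("cat", "dog"),
    ("cat", "dog"),
    ("dog", "dog"),
    ("dog", "dog"),
    ("dog", "dog"),
]

# Label confusion: count of each (assignee label, ground-truth label) pair.
confusion = Counter((assignee, reference) for reference, assignee in matched_pairs)
for (entity_id, relative), value in confusion.items():
    print({"type": "label_confusion", "value": value,
           "entityId": entity_id, "relative": relative})
# {'type': 'label_confusion', 'value': 1, 'entityId': 'cat', 'relative': 'cat'}
# {'type': 'label_confusion', 'value': 2, 'entityId': 'dog', 'relative': 'cat'}
# {'type': 'label_confusion', 'value': 3, 'entityId': 'dog', 'relative': 'dog'}
```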

## Python installation

```shell
pip install dtlpymetrics
```
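
After installation you can confirm the package imports and check the installed version using the standard library (no assumptions about the package's own API):

```python
from importlib.metadata import version

import dtlpymetrics  # noqa: F401  # verifies the package imports cleanly

print(version("dtlpymetrics"))  # e.g. 1.0.106
```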

## Functions

See [this page](docs/dtlpymetrics_fxns.md) for details on additional functions.

## Contributions, Bugs and Issues - How to Contribute

We welcome anyone to help us improve this app.  
[Here](CONTRIBUTING.md) are detailed instructions for opening a bug report or requesting a feature.



            
