datumaro


- Name: datumaro
- Version: 1.6.0
- Home page: https://github.com/openvinotoolkit/datumaro
- Summary: Dataset Management Framework (Datumaro)
- Upload time: 2024-04-12 07:27:50
- Maintainer: None
- Docs URL: None
- Author: Intel
- Requires Python: >=3.9
- License: None
- Requirements: no requirements were recorded
- Travis-CI: none
- Coveralls test coverage: none

# Dataset Management Framework (Datumaro)

[![Build status](https://github.com/openvinotoolkit/datumaro/actions/workflows/health_check.yml/badge.svg)](https://github.com/openvinotoolkit/datumaro/actions/workflows/health_check.yml)
[![codecov](https://codecov.io/gh/openvinotoolkit/datumaro/branch/develop/graph/badge.svg?token=FG25VU096Q)](https://codecov.io/gh/openvinotoolkit/datumaro)

A framework and CLI tool to build, transform, and analyze datasets.

<!--lint disable fenced-code-flag-->
```
VOC dataset                                  ---> Annotation tool
     +                                     /
COCO dataset -----> Datumaro ---> dataset ------> Model training
     +                                     \
CVAT annotations                             ---> Publication, statistics etc.
```
<!--lint enable fenced-code-flag-->
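
To show what the conversion step in the diagram involves at the annotation level, the sketch below converts one bounding box from the COCO pixel layout `[x_min, y_min, width, height]` to a YOLO (Darknet) annotation line `class x_center y_center width height`, where the coordinates are normalized by the image size. This is a stdlib-only illustration. The function `coco_bbox_to_yolo` is hypothetical and is not part of Datumaro's API. Datumaro performs such conversions itself when it exports a dataset, and its implementation may differ.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h, class_id):
    """Convert a COCO-style [x_min, y_min, width, height] box (in pixels)
    into a YOLO-style line with center coordinates normalized to [0, 1].

    Hypothetical helper for illustration; not a Datumaro function.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image size must be positive")
    x_min, y_min, w, h = bbox
    # YOLO stores the box center, not the top-left corner.
    x_c = (x_min + w / 2) / img_w
    y_c = (y_min + h / 2) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}"


if __name__ == "__main__":
    # A 200x100 box at (100, 50) in a 400x200 image is centered and covers half of each side.
    print(coco_bbox_to_yolo([100, 50, 200, 100], img_w=400, img_h=200, class_id=3))
    # → 3 0.500000 0.500000 0.500000 0.500000
```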

- [Getting started](https://openvinotoolkit.github.io/datumaro/latest/docs/get-started/quick-start-guide)
- [Level Up](https://openvinotoolkit.github.io/datumaro/latest/docs/level-up/basic_skills)
- [Features](#features)
- [User manual](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_use_datumaro)
- [Developer manual](https://openvinotoolkit.github.io/datumaro/latest/docs/reference/datumaro_module)
- [Contributing](#contributing)

## Features

[(Back to top)](#dataset-management-framework-datumaro)

- Dataset reading, writing, and conversion in any direction between the supported formats:
  - [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) (`classification`)
  - [Cityscapes](https://www.cityscapes-dataset.com/)
  - [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`,
    `captions`, `labels`, `panoptic`, `stuff`)
  - [CVAT](https://opencv.github.io/cvat/docs/manual/advanced/xml_format/)
  - [ImageNet](http://image-net.org/)
  - [KITTI](http://www.cvlibs.net/datasets/kitti/index.php) (`segmentation`, `detection`,
    `3D raw` / `velodyne points`)
  - [LabelMe](http://labelme.csail.mit.edu/Release3.0)
  - [LFW](http://vis-www.cs.umass.edu/lfw/) (`classification`, `person re-identification`,
    `landmarks`)
  - [MNIST](http://yann.lecun.com/exdb/mnist/) (`classification`)
  - [Open Images](https://storage.googleapis.com/openimages/web/download.html)
  - [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html)
    (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
  - [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md)
    (`bboxes`, `masks`)
  - [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)

  Other supported formats and their documentation are listed in [the formats reference](https://openvinotoolkit.github.io/datumaro/latest/docs/data-formats/formats).
- Dataset building
  - Merging multiple datasets into one
  - Dataset filtering by custom criteria, for example:
    - remove polygons of a certain class
    - remove images without annotations of a specific class
    - remove `occluded` annotations from images
    - keep only vertically oriented images
    - remove small-area bounding boxes from annotations
  - Annotation conversions, for instance:
    - polygons to instance masks and vice versa
    - apply a custom colormap for mask annotations
    - rename or remove dataset labels
  - Splitting a dataset into multiple subsets like `train`, `val`, and `test`:
    - random split
    - task-specific splits based on annotations,
      which preserve the initial label and attribute distributions:
      - for classification tasks, based on labels
      - for detection tasks, based on bboxes
      - for re-identification tasks, based on labels,
        so that no ID appears in both the training and test splits
  - Sampling a dataset:
    - analyzes inference results on the given dataset
      and selects the most useful samples for annotation while keeping their number small
    - selects the samples that best suit model training
      - sampling with an entropy-based algorithm
- Dataset quality checking
  - Simple checking for errors
  - Comparison with model inference
  - Merging and comparison of multiple datasets
  - Annotation validation based on the task type (classification, etc.)
- Dataset comparison
- Dataset statistics (image mean and std, annotation statistics)
- Model integration
  - Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MXNet, etc.)
  - Explainable AI ([RISE algorithm](https://arxiv.org/abs/1806.07421))
    - RISE for classification
    - RISE for object detection
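
The task-specific split for classification is described above as one that keeps the initial label distribution in every subset. Stratified splitting is one common way to do that, and the stdlib sketch below illustrates it. It assumes that each item carries exactly one label and ignores attributes. The name `stratified_split` is hypothetical, and Datumaro's own splitter, which also balances attributes, may work differently.

```python
import random
from collections import defaultdict


def stratified_split(items, ratios, seed=0):
    """Split (item_id, label) pairs into named subsets so that every label
    is divided according to `ratios`, e.g. {"train": 0.8, "test": 0.2}.

    Hypothetical helper; assumes exactly one label per item.
    """
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Group item ids by label so each label can be split independently.
    by_label = defaultdict(list)
    for item_id, label in items:
        by_label[label].append(item_id)

    rng = random.Random(seed)  # fixed seed -> reproducible splits
    names = list(ratios)
    subsets = {name: [] for name in names}
    for ids in by_label.values():
        rng.shuffle(ids)
        start = 0
        for i, name in enumerate(names):
            if i == len(names) - 1:
                # The last subset takes the remainder, so no item is lost to rounding.
                end = len(ids)
            else:
                end = min(len(ids), start + round(len(ids) * ratios[name]))
            subsets[name].extend(ids[start:end])
            start = end
    return subsets


if __name__ == "__main__":
    data = [(f"img{i}", "cat" if i % 2 else "dog") for i in range(20)]
    parts = stratified_split(data, {"train": 0.8, "test": 0.2})
    print({name: len(ids) for name, ids in parts.items()})  # → {'train': 16, 'test': 4}
```

Because each label is split separately, the 10 `cat` and 10 `dog` items each contribute 8 items to `train` and 2 to `test`, and the 50/50 label balance holds in both subsets.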

> Check
> [the design document](https://openvinotoolkit.github.io/datumaro/latest/docs/explanation/architecture)
> for a full list of features.
>
> Check
> [the user manual](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_use_datumaro)
> for usage instructions.
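
The entropy-based sampling listed under Features can be illustrated as uncertainty sampling. Each item gets a score equal to the Shannon entropy of the model's predicted class probabilities, and the highest-scoring, most uncertain items are proposed for annotation. The sketch below assumes that inference results are already available as per-item probability lists. The names `entropy` and `select_by_entropy` are hypothetical, and Datumaro's own sampler may differ in how it scores and selects items.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_by_entropy(predictions, k):
    """Return the ids of the k items whose predictions are most uncertain.

    `predictions` maps item id -> list of class probabilities (hypothetical format).
    """
    ranked = sorted(predictions, key=lambda item_id: entropy(predictions[item_id]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    preds = {
        "a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
        "b": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
        "c": [0.60, 0.30, 0.10],
    }
    print(select_by_entropy(preds, k=2))  # → ['b', 'c']
```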

## Contributing

[(Back to top)](#dataset-management-framework-datumaro)

Feel free to
[open an issue](https://github.com/openvinotoolkit/datumaro/issues/new) if you
think something needs to be changed. You are welcome to participate in
development; instructions are available in our
[contribution guide](https://github.com/openvinotoolkit/datumaro/blob/develop/contributing.md).

## Telemetry data collection note

The [OpenVINO™ telemetry library](https://github.com/openvinotoolkit/telemetry/)
is used to collect basic information about Datumaro usage.

To enable or disable telemetry data collection, see the
[guide](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_control_tm_data_collection).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/openvinotoolkit/datumaro",
    "name": "datumaro",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "Intel",
    "author_email": "emily.chun@intel.com",
    "download_url": "https://files.pythonhosted.org/packages/24/91/b14fc92d19b536230cc8a98ef8caa4d8a41dcf37afd4aff0386dd86d76e1/datumaro-1.6.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Dataset Management Framework (Datumaro)",
    "version": "1.6.0",
    "project_urls": {
        "Homepage": "https://github.com/openvinotoolkit/datumaro"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7e5df51696d1854b219cc8ef571d9ce14d5b3122726dee66d3536147cb4b7afd",
                "md5": "289887e8ed1759ee80c6e26208708fa8",
                "sha256": "3e14a8e5bea114ad06161aa8152f300931ccadbb3b8fca10e3fbe3c108b5e525"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "289887e8ed1759ee80c6e26208708fa8",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.9",
            "size": 1107477,
            "upload_time": "2024-04-12T07:27:33",
            "upload_time_iso_8601": "2024-04-12T07:27:33.002225Z",
            "url": "https://files.pythonhosted.org/packages/7e/5d/f51696d1854b219cc8ef571d9ce14d5b3122726dee66d3536147cb4b7afd/datumaro-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "96ea37b380bcf40f8b5cb638e99a370506f882723b84daa54926d971abfe6663",
                "md5": "dfc244594614dcfb7e2250e012b0c053",
                "sha256": "bb110edddbdc76f47daa0056a8208893ab5e1dcef2f46f032a3a1a94d8f3ed8c"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp310-cp310-musllinux_1_1_x86_64.whl",
            "has_sig": false,
            "md5_digest": "dfc244594614dcfb7e2250e012b0c053",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.9",
            "size": 1623310,
            "upload_time": "2024-04-12T07:27:35",
            "upload_time_iso_8601": "2024-04-12T07:27:35.506678Z",
            "url": "https://files.pythonhosted.org/packages/96/ea/37b380bcf40f8b5cb638e99a370506f882723b84daa54926d971abfe6663/datumaro-1.6.0-cp310-cp310-musllinux_1_1_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1bf2067c81075bdb159cb30f28f226a578b24f8b6cb496dc468b7f864fb45ba0",
                "md5": "609b27d65e0420f5a6b0ea90081619b5",
                "sha256": "0a024cd07719d98c4f0ae7100a7c92e03f2d75bcbe2125395b11c666732412cd"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp310-cp310-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "609b27d65e0420f5a6b0ea90081619b5",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.9",
            "size": 920338,
            "upload_time": "2024-04-12T07:27:37",
            "upload_time_iso_8601": "2024-04-12T07:27:37.640170Z",
            "url": "https://files.pythonhosted.org/packages/1b/f2/067c81075bdb159cb30f28f226a578b24f8b6cb496dc468b7f864fb45ba0/datumaro-1.6.0-cp310-cp310-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c84bfcce34171b80ff9cd7076cccf8ee55819f3cb09805cc98596412f667ea0c",
                "md5": "4312d4d3e330e546b6e82bece0c9b983",
                "sha256": "b45dd6c63ee9fd6371681d9f590952b3ed43229714faf11bbf7b915d57a35d27"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "4312d4d3e330e546b6e82bece0c9b983",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.9",
            "size": 1109093,
            "upload_time": "2024-04-12T07:27:39",
            "upload_time_iso_8601": "2024-04-12T07:27:39.531954Z",
            "url": "https://files.pythonhosted.org/packages/c8/4b/fcce34171b80ff9cd7076cccf8ee55819f3cb09805cc98596412f667ea0c/datumaro-1.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9a10993f30c7039963b9559e61c89293799aa5fb77b5296e3dd05a9010f71a91",
                "md5": "6615feee4052ebbdbac54c7e5e9ead7a",
                "sha256": "0423c63f56976065ea3e3f1f24eda36fee145ff4d9cb5e96c54ab46ef9782373"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp311-cp311-musllinux_1_1_x86_64.whl",
            "has_sig": false,
            "md5_digest": "6615feee4052ebbdbac54c7e5e9ead7a",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.9",
            "size": 1624411,
            "upload_time": "2024-04-12T07:27:41",
            "upload_time_iso_8601": "2024-04-12T07:27:41.685683Z",
            "url": "https://files.pythonhosted.org/packages/9a/10/993f30c7039963b9559e61c89293799aa5fb77b5296e3dd05a9010f71a91/datumaro-1.6.0-cp311-cp311-musllinux_1_1_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3d82310ed2ba1a66735f5ef486e27a92c72bb31dcae6087fe9b3ab492d87f32b",
                "md5": "247b1ffbd0b873a87d9b81f4d5190e99",
                "sha256": "42e96bc945b93e48687998d39958a85729484ea2a4c39394fd764d22d685c835"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp311-cp311-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "247b1ffbd0b873a87d9b81f4d5190e99",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.9",
            "size": 921457,
            "upload_time": "2024-04-12T07:27:43",
            "upload_time_iso_8601": "2024-04-12T07:27:43.356997Z",
            "url": "https://files.pythonhosted.org/packages/3d/82/310ed2ba1a66735f5ef486e27a92c72bb31dcae6087fe9b3ab492d87f32b/datumaro-1.6.0-cp311-cp311-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a37f5788358101809b12d4c04f950f8b53948a66ff2a98d787f5bae104aa51d8",
                "md5": "7266d991a36be92899462c9a2f68026e",
                "sha256": "621f74b89ebece7d826b141fb6149adf21be814094bdaad7bf3605d0d5015561"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7266d991a36be92899462c9a2f68026e",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.9",
            "size": 1107522,
            "upload_time": "2024-04-12T07:27:45",
            "upload_time_iso_8601": "2024-04-12T07:27:45.440491Z",
            "url": "https://files.pythonhosted.org/packages/a3/7f/5788358101809b12d4c04f950f8b53948a66ff2a98d787f5bae104aa51d8/datumaro-1.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9a1621a335fdd6015f933e58f2f8e8786fcb3bbb6fe8ac7a4f14fe24d730e439",
                "md5": "7cdcfaa34d0c29013facde399396b98b",
                "sha256": "d35191e33a5e9ae2170478287c9ed4f8da8b7fda711807423bd5530b610af652"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp39-cp39-musllinux_1_1_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7cdcfaa34d0c29013facde399396b98b",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.9",
            "size": 1623492,
            "upload_time": "2024-04-12T07:27:46",
            "upload_time_iso_8601": "2024-04-12T07:27:46.914798Z",
            "url": "https://files.pythonhosted.org/packages/9a/16/21a335fdd6015f933e58f2f8e8786fcb3bbb6fe8ac7a4f14fe24d730e439/datumaro-1.6.0-cp39-cp39-musllinux_1_1_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "11a701d610337baafdbc5d36fce7e308cc44ad9f0a0987019e80d1c1f9720f1d",
                "md5": "d44c0aafb02a2346be92146ee72a98e7",
                "sha256": "c4005561c8a4ccf6189b8b03810c0c4266dd433a0fff25088412a2fea8e15852"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0-cp39-cp39-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "d44c0aafb02a2346be92146ee72a98e7",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.9",
            "size": 920667,
            "upload_time": "2024-04-12T07:27:49",
            "upload_time_iso_8601": "2024-04-12T07:27:49.021361Z",
            "url": "https://files.pythonhosted.org/packages/11/a7/01d610337baafdbc5d36fce7e308cc44ad9f0a0987019e80d1c1f9720f1d/datumaro-1.6.0-cp39-cp39-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2491b14fc92d19b536230cc8a98ef8caa4d8a41dcf37afd4aff0386dd86d76e1",
                "md5": "3a7a6eb7fdf2005c5faebc7f688c3b46",
                "sha256": "6a6c3142e01e59c4be3bc3f8a6826995c9e610283e9a13bb2b404ffb7d6da3ed"
            },
            "downloads": -1,
            "filename": "datumaro-1.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "3a7a6eb7fdf2005c5faebc7f688c3b46",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 535505,
            "upload_time": "2024-04-12T07:27:50",
            "upload_time_iso_8601": "2024-04-12T07:27:50.461979Z",
            "url": "https://files.pythonhosted.org/packages/24/91/b14fc92d19b536230cc8a98ef8caa4d8a41dcf37afd4aff0386dd86d76e1/datumaro-1.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-12 07:27:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "openvinotoolkit",
    "github_project": "datumaro",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "datumaro"
}
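
You can use the `sha256` digests listed in the raw metadata above to check a downloaded distribution file before installing it. The sketch below uses only Python's standard `hashlib` module. The helper names `sha256_of_file` and `verify_sha256` are hypothetical. The file name and digest in the usage comment are copied from the `datumaro-1.6.0.tar.gz` entry above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA-256 digest matches `expected_hex` (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()


# Usage, after downloading the sdist listed above:
# verify_sha256("datumaro-1.6.0.tar.gz",
#               "6a6c3142e01e59c4be3bc3f8a6826995c9e610283e9a13bb2b404ffb7d6da3ed")
```

Reading the file in chunks keeps memory use constant, so the same helper works for the wheels of roughly 1 MB and for larger files.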