# Dataset Management Framework (Datumaro)
A framework and CLI tool to build, transform, and analyze datasets.
<!--lint disable fenced-code-flag-->
```
VOC dataset                               ---> Annotation tool
     +                                     /
COCO dataset -----> Datumaro ---> dataset ------> Model training
     +                                     \
CVAT annotations                             ---> Publication, statistics etc.
```
<!--lint enable fenced-code-flag-->
- [Getting started](https://openvinotoolkit.github.io/datumaro/latest/docs/get-started/quick-start-guide)
- [Level Up](https://openvinotoolkit.github.io/datumaro/latest/docs/level-up/basic_skills)
- [Features](#features)
- [User manual](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_use_datumaro)
- [Developer manual](https://openvinotoolkit.github.io/datumaro/latest/docs/reference/datumaro_module)
- [Contributing](#contributing)
## Features
[(Back to top)](#dataset-management-framework-datumaro)
- Dataset reading, writing, and conversion in any direction.
  - [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) (`classification`)
  - [Cityscapes](https://www.cityscapes-dataset.com/)
  - [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`,
    `captions`, `labels`, `panoptic`, `stuff`)
  - [CVAT](https://opencv.github.io/cvat/docs/manual/advanced/xml_format/)
  - [ImageNet](http://image-net.org/)
  - [KITTI](http://www.cvlibs.net/datasets/kitti/index.php) (`segmentation`, `detection`,
    `3D raw` / `velodyne points`)
  - [LabelMe](http://labelme.csail.mit.edu/Release3.0)
  - [LFW](http://vis-www.cs.umass.edu/lfw/) (`classification`, `person re-identification`,
    `landmarks`)
  - [MNIST](http://yann.lecun.com/exdb/mnist/) (`classification`)
  - [Open Images](https://storage.googleapis.com/openimages/web/download.html)
  - [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html)
    (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
  - [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md)
    (`bboxes`, `masks`)
  - [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)

  Other formats and their documentation are listed in the
  [supported formats guide](https://openvinotoolkit.github.io/datumaro/latest/docs/data-formats/formats).
- Dataset building
  - Merging multiple datasets into one
  - Dataset filtering by custom criteria:
    - remove polygons of a certain class
    - remove images without annotations of a specific class
    - remove `occluded` annotations from images
    - keep only vertically oriented images
    - remove small-area bounding boxes from annotations
  - Annotation conversions, for instance:
    - polygons to instance masks and vice versa
    - apply a custom colormap for mask annotations
    - rename or remove dataset labels
  - Splitting a dataset into multiple subsets like `train`, `val`, and `test`:
    - random split
    - task-specific splits based on annotations,
      which keep initial label and attribute distributions
      - for classification tasks, based on labels
      - for detection tasks, based on bboxes
      - for re-identification tasks, based on labels,
        avoiding the same IDs in training and test splits
  - Sampling a dataset
    - analyze inference results on the given dataset
      and select a small set of the most useful samples for annotation
    - select the samples that best suit model training
      - sampling with an entropy-based algorithm
- Dataset quality checking
  - Simple checking for errors
  - Comparison with model inference
  - Merging and comparison of multiple datasets
  - Annotation validation based on the task type (classification, etc.)
- Dataset comparison
- Dataset statistics (image mean and std, annotation statistics)
- Model integration
  - Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MXNet, etc.)
  - Explainable AI ([RISE algorithm](https://arxiv.org/abs/1806.07421))
    - RISE for classification
    - RISE for object detection
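
The task-specific splits listed above are meant to keep label distributions intact across subsets. A minimal sketch of that idea for the classification case, using only the Python standard library, might look like the following. It is an illustration of stratified splitting in general, not Datumaro's implementation: the function `stratified_split`, its parameters, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios, seed=0):
    """Split item indices into subsets so each label keeps roughly the same share.

    labels: list with one class label per item (hypothetical input format).
    ratios: dict mapping subset name -> fraction; fractions must sum to 1.
    """
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    rng = random.Random(seed)

    # Group item indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    names = list(ratios)
    subsets = {name: [] for name in names}
    for indices in by_label.values():
        rng.shuffle(indices)
        start = 0
        cumulative = 0.0
        for position, name in enumerate(names):
            cumulative += ratios[name]
            # The last subset takes the remainder so that no item is dropped.
            if position == len(names) - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            subsets[name].extend(indices[start:end])
            start = end
    return subsets


# Toy data: 10 "cat" items followed by 20 "dog" items.
labels = ["cat"] * 10 + ["dog"] * 20
split = stratified_split(labels, {"train": 0.5, "val": 0.2, "test": 0.3})
for name, indices in split.items():
    cats = sum(1 for i in indices if labels[i] == "cat")
    print(name, len(indices), "items,", cats, "cats")
```

Splitting each label group separately is what preserves the distribution: a plain random split over all items would only match the original proportions on average.
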
> See [the design document](https://openvinotoolkit.github.io/datumaro/latest/docs/explanation/architecture)
> for a full list of features and
> [the user manual](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_use_datumaro)
> for usage instructions.
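
The feature list mentions sampling with an entropy-based algorithm. As a rough illustration of the general idea (ranking items by the uncertainty of a model's predictions and picking the most uncertain ones for annotation), here is a small standard-library sketch. It is not Datumaro's sampler: the names `entropy` and `select_most_uncertain`, the input format, and the example probabilities are assumptions made for this example.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_most_uncertain(predictions, k):
    """Return the k item ids whose predicted distributions have the highest entropy.

    predictions: dict mapping item id -> list of class probabilities
    (a hypothetical format chosen for this sketch).
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up inference results for four images and three classes.
predictions = {
    "img_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_b": [0.40, 0.35, 0.25],
    "img_c": [0.70, 0.20, 0.10],
    "img_d": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
}
chosen = select_most_uncertain(predictions, k=2)
print(chosen)
```

High-entropy items are the ones the model is least sure about, so labeling them first tends to be more informative than labeling items the model already predicts confidently.
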
## Contributing
[(Back to top)](#dataset-management-framework-datumaro)
Feel free to
[open an issue](https://github.com/openvinotoolkit/datumaro/issues/new) if you
think something needs to be changed. You are welcome to participate in
development; instructions are available in our
[contribution guide](https://github.com/openvinotoolkit/datumaro/blob/develop/contributing.md).
## Telemetry data collection note
The [OpenVINO™ telemetry library](https://github.com/openvinotoolkit/telemetry/)
is used to collect basic information about Datumaro usage.
To enable or disable telemetry data collection, see the
[guide](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_control_tm_data_collection).
Raw data
{
"_id": null,
"home_page": "https://github.com/openvinotoolkit/datumaro",
"name": "datumaro",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Intel",
"author_email": "emily.chun@intel.com",
"download_url": "https://files.pythonhosted.org/packages/68/67/345c3aa37bb827ea228a6c2236c2e640e7d2fed40deb5bad34f45ac3bb6a/datumaro-1.9.1.tar.gz",
"platform": null,
"description": "# Dataset Management Framework (Datumaro)\n\n[](https://github.com/openvinotoolkit/datumaro/actions/workflows/health_check.yml)\n[](https://codecov.io/gh/openvinotoolkit/datumaro)\n\nA framework and CLI tool to build, transform, and analyze datasets.\n\n<!--lint disable fenced-code-flag-->\n```\nVOC dataset ---> Annotation tool\n + /\nCOCO dataset -----> Datumaro ---> dataset ------> Model training\n + \\\nCVAT annotations ---> Publication, statistics etc.\n```\n<!--lint enable fenced-code-flag-->\n\n- [Getting started](https://openvinotoolkit.github.io/datumaro/latest/docs/get-started/quick-start-guide)\n- [Level Up](https://openvinotoolkit.github.io/datumaro/latest/docs/level-up/basic_skills)\n- [Features](#features)\n- [User manual](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_use_datumaro)\n- [Developer manual](https://openvinotoolkit.github.io/datumaro/latest/docs/reference/datumaro_module)\n- [Contributing](#contributing)\n\n## Features\n\n[(Back to top)](#dataset-management-framework-datumaro)\n\n- Dataset reading, writing, conversion in any direction.\n - [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) (`classification`)\n - [Cityscapes](https://www.cityscapes-dataset.com/)\n - [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`,\n `captions`, `labels`, `panoptic`, `stuff`)\n - [CVAT](https://opencv.github.io/cvat/docs/manual/advanced/xml_format/)\n - [ImageNet](http://image-net.org/)\n - [Kitti](http://www.cvlibs.net/datasets/kitti/index.php) (`segmentation`, `detection`,\n `3D raw` / `velodyne points`)\n - [LabelMe](http://labelme.csail.mit.edu/Release3.0)\n - [LFW](http://vis-www.cs.umass.edu/lfw/) (`classification`, `person re-identification`,\n `landmarks`)\n - [MNIST](http://yann.lecun.com/exdb/mnist/) (`classification`)\n - [Open Images](https://storage.googleapis.com/openimages/web/download.html)\n - [PASCAL 
VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html)\n (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)\n - [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md)\n (`bboxes`, `masks`)\n - [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)\n\n Other formats and documentation for them can be found [here](https://openvinotoolkit.github.io/datumaro/latest/docs/data-formats/formats).\n- Dataset building\n - Merging multiple datasets into one\n - Dataset filtering by a custom criteria:\n - remove polygons of a certain class\n - remove images without annotations of a specific class\n - remove `occluded` annotations from images\n - keep only vertically-oriented images\n - remove small area bounding boxes from annotations\n - Annotation conversions, for instance:\n - polygons to instance masks and vice-versa\n - apply a custom colormap for mask annotations\n - rename or remove dataset labels\n - Splitting a dataset into multiple subsets like `train`, `val`, and `test`:\n - random split\n - task-specific splits based on annotations,\n which keep initial label and attribute distributions\n - for classification task, based on labels\n - for detection task, based on bboxes\n - for re-identification task, based on labels,\n avoiding having same IDs in training and test splits\n - Sampling a dataset\n - analyzes inference result from the given dataset\n and selects the \u2018best\u2019 and the \u2018least amount of\u2019 samples for annotation.\n - Select the sample that best suits model training.\n - sampling with Entropy based algorithm\n- Dataset quality checking\n - Simple checking for errors\n - Comparison with model inference\n - Merging and comparison of multiple datasets\n - Annotation validation based on the task type(classification, etc)\n- Dataset comparison\n- Dataset statistics (image mean and std, 
annotation statistics)\n- Model integration\n - Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MxNet, etc.)\n - Explainable AI ([RISE algorithm](https://arxiv.org/abs/1806.07421))\n - RISE for classification\n - RISE for object detection\n\n> Check\n [the design document](https://openvinotoolkit.github.io/datumaro/latest/docs/explanation/architecture)\n for a full list of features.\n> Check\n [the user manual](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_use_datumaro)\n for usage instructions.\n\n## Contributing\n\n[(Back to top)](#dataset-management-framework-datumaro)\n\nFeel free to\n[open an Issue](https://github.com/openvinotoolkit/datumaro/issues/new), if you\nthink something needs to be changed. You are welcome to participate in\ndevelopment, instructions are available in our\n[contribution guide](https://github.com/openvinotoolkit/datumaro/blob/develop/contributing.md).\n\n## Telemetry data collection note\n\nThe [OpenVINO\u2122 telemetry library](https://github.com/openvinotoolkit/telemetry/)\nis used to collect basic information about Datumaro usage.\n\nTo enable/disable telemetry data collection please see the\n[guide](https://openvinotoolkit.github.io/datumaro/latest/docs/user-manual/how_to_control_tm_data_collection).\n",
"bugtrack_url": null,
"license": null,
"summary": "Dataset Management Framework (Datumaro)",
"version": "1.9.1",
"project_urls": {
"Homepage": "https://github.com/openvinotoolkit/datumaro"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "27fee9b2093a1c2c3c43cb5f7a1d72e4aa771c533a1cfd6e162ee6b238ab7456",
"md5": "8591fb872ac3715746b4da1838af4525",
"sha256": "baf277697ff0de0bb44929b91e423aa2a6e08c00e494982674e8c99a6419bb8a"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "8591fb872ac3715746b4da1838af4525",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1144732,
"upload_time": "2024-09-30T00:03:46",
"upload_time_iso_8601": "2024-09-30T00:03:46.161642Z",
"url": "https://files.pythonhosted.org/packages/27/fe/e9b2093a1c2c3c43cb5f7a1d72e4aa771c533a1cfd6e162ee6b238ab7456/datumaro-1.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3c25ffa9947837abc7d490d581fc1aad4ce3fab792244b31ab56f7557ab0bcd2",
"md5": "4982c5134f8061ed66903f8a13790480",
"sha256": "d4cf82fc495c377e39042cd82f983a206ab49b63baa5645b1e5ec90886536a9b"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp310-cp310-musllinux_1_1_x86_64.whl",
"has_sig": false,
"md5_digest": "4982c5134f8061ed66903f8a13790480",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1661216,
"upload_time": "2024-09-30T00:03:48",
"upload_time_iso_8601": "2024-09-30T00:03:48.120722Z",
"url": "https://files.pythonhosted.org/packages/3c/25/ffa9947837abc7d490d581fc1aad4ce3fab792244b31ab56f7557ab0bcd2/datumaro-1.9.1-cp310-cp310-musllinux_1_1_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5e035a4be7c081f5d7c0b7da03484d6d165a34c3d9d802179a8f617d3b372ca4",
"md5": "8f8339613b8fbc998f349be7f2056bf0",
"sha256": "8f0646009055193e24939d8dc9e41a0ac02d34734f3d08eacfd57e0d6516edb5"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp310-cp310-win_amd64.whl",
"has_sig": false,
"md5_digest": "8f8339613b8fbc998f349be7f2056bf0",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 961298,
"upload_time": "2024-09-30T00:03:50",
"upload_time_iso_8601": "2024-09-30T00:03:50.145805Z",
"url": "https://files.pythonhosted.org/packages/5e/03/5a4be7c081f5d7c0b7da03484d6d165a34c3d9d802179a8f617d3b372ca4/datumaro-1.9.1-cp310-cp310-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "880844aecfa14d0c7a9dc40c54ac7c75b1b9e396aff76455cabaeb67c73beeac",
"md5": "1513fe9b9e797d73132f1396bf9b5968",
"sha256": "bd7c19fee15908769b179c7eb8c3ce09dc00882b7c2f7ba8fb542a1ce8b2ded8"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "1513fe9b9e797d73132f1396bf9b5968",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1146681,
"upload_time": "2024-09-30T00:03:51",
"upload_time_iso_8601": "2024-09-30T00:03:51.815026Z",
"url": "https://files.pythonhosted.org/packages/88/08/44aecfa14d0c7a9dc40c54ac7c75b1b9e396aff76455cabaeb67c73beeac/datumaro-1.9.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a711aa9dea18c498606bcdd1e846c0e152da5f1793d4ef52b84542a3a7869a84",
"md5": "52bf4a45a2b8d747b1ae8de203f45a2f",
"sha256": "5da9d23d9229f37ae11f3bf0ee039b7936eb53742b763772e1675308744dc691"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp311-cp311-musllinux_1_1_x86_64.whl",
"has_sig": false,
"md5_digest": "52bf4a45a2b8d747b1ae8de203f45a2f",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1662182,
"upload_time": "2024-09-30T00:03:54",
"upload_time_iso_8601": "2024-09-30T00:03:54.088380Z",
"url": "https://files.pythonhosted.org/packages/a7/11/aa9dea18c498606bcdd1e846c0e152da5f1793d4ef52b84542a3a7869a84/datumaro-1.9.1-cp311-cp311-musllinux_1_1_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "89f5f33098388e8353b046aa87e945e30b68f69ffa944eb9045637742c4dce8c",
"md5": "b38b282f5a90de888c830889d17f19a6",
"sha256": "205b8d83b51486ecf72e3cc8ee0200139f4e8bd9a9cceb861fd067c0d5c3751f"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp311-cp311-win_amd64.whl",
"has_sig": false,
"md5_digest": "b38b282f5a90de888c830889d17f19a6",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 962428,
"upload_time": "2024-09-30T00:03:55",
"upload_time_iso_8601": "2024-09-30T00:03:55.696510Z",
"url": "https://files.pythonhosted.org/packages/89/f5/f33098388e8353b046aa87e945e30b68f69ffa944eb9045637742c4dce8c/datumaro-1.9.1-cp311-cp311-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ee2a79b07e1d53537b9fb20bc61070a863e9507ef6e985d86265107c498a71b3",
"md5": "a31a6ddd5f8807523323c01934acc632",
"sha256": "493bf9995675c4657f40cab6665b389676dc80088be7b888a0e02cf9eae65538"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "a31a6ddd5f8807523323c01934acc632",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1145243,
"upload_time": "2024-09-30T00:03:56",
"upload_time_iso_8601": "2024-09-30T00:03:56.931864Z",
"url": "https://files.pythonhosted.org/packages/ee/2a/79b07e1d53537b9fb20bc61070a863e9507ef6e985d86265107c498a71b3/datumaro-1.9.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d47f54eebdd2ec9f225a1a41472606cd394617774bcfc9eb2e929049e4058eb0",
"md5": "5305a388cf65cca1650313e4766c329d",
"sha256": "3e116750a0f3cc78fa9b5305e70b201818765c07fc97758d136746fe737bb693"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp39-cp39-musllinux_1_1_x86_64.whl",
"has_sig": false,
"md5_digest": "5305a388cf65cca1650313e4766c329d",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1661357,
"upload_time": "2024-09-30T00:03:58",
"upload_time_iso_8601": "2024-09-30T00:03:58.644069Z",
"url": "https://files.pythonhosted.org/packages/d4/7f/54eebdd2ec9f225a1a41472606cd394617774bcfc9eb2e929049e4058eb0/datumaro-1.9.1-cp39-cp39-musllinux_1_1_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "967e5d9b9634579b7199e596c72d3c486e00e10e822b375714c929eb9d92c966",
"md5": "2629079a73c3547000da8e1d797f363a",
"sha256": "15fe86e521f72916f5aa5abb05fab41600acd48e77e07e46d0313ca221bf22b6"
},
"downloads": -1,
"filename": "datumaro-1.9.1-cp39-cp39-win_amd64.whl",
"has_sig": false,
"md5_digest": "2629079a73c3547000da8e1d797f363a",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 961470,
"upload_time": "2024-09-30T00:04:00",
"upload_time_iso_8601": "2024-09-30T00:04:00.474374Z",
"url": "https://files.pythonhosted.org/packages/96/7e/5d9b9634579b7199e596c72d3c486e00e10e822b375714c929eb9d92c966/datumaro-1.9.1-cp39-cp39-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6867345c3aa37bb827ea228a6c2236c2e640e7d2fed40deb5bad34f45ac3bb6a",
"md5": "5fe9df5c91d1459aa1c55d5372d96cc3",
"sha256": "a2b3dbd54ced6b2da6c882a38f21f29de3219e14dd362c4b63902a85278f21f5"
},
"downloads": -1,
"filename": "datumaro-1.9.1.tar.gz",
"has_sig": false,
"md5_digest": "5fe9df5c91d1459aa1c55d5372d96cc3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 569309,
"upload_time": "2024-09-30T00:04:01",
"upload_time_iso_8601": "2024-09-30T00:04:01.677738Z",
"url": "https://files.pythonhosted.org/packages/68/67/345c3aa37bb827ea228a6c2236c2e640e7d2fed40deb5bad34f45ac3bb6a/datumaro-1.9.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-30 00:04:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "openvinotoolkit",
"github_project": "datumaro",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "datumaro"
}