The **image-dataset-converter** library converts between
various image annotation dataset formats.
Filters can also be applied, e.g., for cleaning up the data.
Dataset formats:
- depth data: CSV (r/w), grayscale (r/w), numpy (r/w), PFM (r/w)
- image classification: ADAMS (r/w), sub-dir (r/w)
- image segmentation: blue-channel (r/w), grayscale (r/w), indexed PNG (r/w), layer segments (r/w)
- object detection: ADAMS (r/w), COCO (r/w), OPEX (r/w), ROI (r/w), VOC (r/w), YOLO (r/w)
Examples can be found here:
https://github.com/waikato-datamining/image-dataset-converter-examples
Changelog
=========
0.0.13 (2025-07-15)
-------------------
- requiring seppl>=0.2.20 now for improved help requests in `idc-convert` tool
0.0.12 (2025-07-11)
-------------------
- dropped numpy<2.0.0 restriction
- added `grayscale-to-binary` filter
- fix: `sort-pixels`, `rgb-to-grayscale` filters
- added `ensure_grayscale` and `grayscale_required_info` convenience methods (package: `idc.api`)
- added `ensure_binary` and `binary_required_info` convenience methods (package: `idc.api`)
- added `--dump_pipeline` option to `idc-convert` for saving the pipeline command
- the `rename` filter now supports lower/upper case placeholders of name and extension as well
- requiring seppl>=0.2.17 now for skippable plugin support and avoiding deprecated use of pkg_resources
- added `any-to-rgb` filter for turning binary/grayscale images back into RGB ones
- using `wai_common` instead of `wai.common` now
- requiring `fast_opex>=0.0.4` now
- added `label-to-metadata` filter for transferring labels into meta-data
- added `metadata-to-placeholder` filter for transferring meta-data into placeholders
- added basic support for images with associated depth information: `DepthData`, `DepthInformation`
- added `depth-to-grayscale` filter for converting depth information to grayscale image
- prefixed image segmentation methods like `from_bluechannel` and `to_bluechannel` with `imgseg_`
- added depth information readers `from-grayscale-dp`, `from-numpy-dp`, `from-csv-dp` and `from-pfm-dp`
- added depth information writers `to-grayscale-dp`, `to-numpy-dp`, `to-csv-dp` and `to-pfm-dp`
- added `apply-ext-mask` filter for applying external PNG masks to image containers (image and/or annotations)
- added `apply-label-mask` filter for applying image segmentation label masks to their base images
- added `label-present-ic` and `label-present-is` that ensure that certain label(s) are present or otherwise discard the image
- filter `label-present` was renamed to `label-present-od` but keeping `label-present` as alias for the time being
- fix: `imgseg_to_bluechannel`, `imgseg_to_indexedpng` and `imgseg_to_grayscale` now handle overlapping pixels correctly,
no longer adding them up and introducing additional labels
- `discard-by-name` filter can use names of files in specified paths now as well
- fixed the construction of the error messages in the pyfunc reader/filter/writer classes
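
The overlap fix listed above for the `imgseg_to_*` methods can be illustrated with a minimal sketch. The helper name `layers_to_indexed`, the nested-list masks and the "later label wins" rule are assumptions made for illustration; this is not the library's implementation.

```python
from typing import Dict, List

Mask = List[List[int]]  # 2D binary mask, 1 = pixel belongs to the label


def layers_to_indexed(layers: Dict[str, Mask], labels: List[str]) -> Mask:
    """Combine per-label binary masks into one index mask (hypothetical helper)."""
    first = next(iter(layers.values()))
    height, width = len(first), len(first[0])
    result = [[0] * width for _ in range(height)]  # 0 = background
    for index, label in enumerate(labels, start=1):
        mask = layers.get(label)
        if mask is None:
            continue
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # assign instead of add, so an overlapping pixel
                    # always ends up with an index of a known label
                    result[y][x] = index
    return result


layers = {
    "car": [[1, 1, 0]],
    "person": [[0, 1, 1]],
}
indexed = layers_to_indexed(layers, ["car", "person"])
print(indexed)  # [[1, 2, 2]]
```

Adding the label indices instead would turn the middle pixel into 3, an index that belongs to no label.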
0.0.11 (2025-04-03)
-------------------
- fix: `idc-registry --list writers` now returns writer plugins instead of reader ones
0.0.10 (2025-04-03)
-------------------
- added `set-placeholder` filter for dynamically setting (temporary) placeholders at runtime
- added `--resume_from` option to relevant readers that allows resuming the data processing
from the first file that matches this glob expression (e.g., `*/012345.png`)
- requiring seppl>=0.2.14 now for resume support
- using underscores now instead of dashes in dependencies (`setup.py`)
- the `array_to_image` method no longer performs unnecessary conversions of Image objects
- the `dirs` generator can limit directories now to ones that have files matching a specific
regexp (`--file_regexp`), to avoid the `Failed to locate any files using: ...` error message
when a reader doesn't find any matching files
- requiring seppl>=0.2.15 now for new `--split_group` support
- added the `from-multi` meta-reader that combines multiple base readers and returns their output
- added the `to-multi` meta-writer that forwards the data to multiple base writers
- added the `use-mask` filter for using the image segmentation annotations (= mask) as the new base image
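
The behaviour described for `--resume_from` (skip files until the first one matching a glob) can be sketched as follows. The function `resume_from` and the file names are hypothetical; the actual resume logic is provided by seppl and may differ.

```python
from fnmatch import fnmatch
from typing import Iterable, Iterator


def resume_from(files: Iterable[str], pattern: str) -> Iterator[str]:
    """Skip files until the first glob match, then yield everything (hypothetical helper)."""
    resumed = False
    for path in files:
        if not resumed and fnmatch(path, pattern):
            resumed = True
        if resumed:
            yield path


files = [
    "data/012343.png",
    "data/012344.png",
    "data/012345.png",
    "data/012346.png",
]
remaining = list(resume_from(files, "*/012345.png"))
print(remaining)  # ['data/012345.png', 'data/012346.png']
```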
0.0.9 (2025-03-14)
------------------
- using `wai_logging` instead of `wai.logging` as dependency now
0.0.8 (2025-03-14)
------------------
- requiring seppl>=0.2.13 now for placeholder support
- added placeholder support to tools: `idc-convert`, `idc-exec`
- added placeholder support to readers: `from-adams-ic`, `from-subdir-ic`, `from-blue-channel-is`, `from-grayscale-is`,
`from-indexed-png-is`, `from-layer-segments-is`, `from-adams-od`, `from-coco-od`, `from-opex-od`, `from-roicsv-od`,
`from-voc-od`, `from-yolo-od`, `from-data`, `from-pyfunc`, `poll-dir`
- added placeholder support to filters: `write-labels`
- added placeholder support to writers: `to-adams-ic`, `to-subdir-ic`, `to-blue-channel-is`, `to-grayscale-is`,
`to-indexed-png-is`, `to-layer-segments-is`, `to-adams-od`, `to-coco-od`, `to-opex-od`, `to-roicsv-od`,
`to-voc-od`, `to-yolo-od`, `to-data`
0.0.7 (2025-03-12)
------------------
- added `safe_deepcopy` method to idc.api._utils which creates a deep copy of an object if not None
- added `rgb-to-grayscale` filter to convert color images into grayscale ones
- added `sort-pixels` filter for grayscale images
- the following filters can operate on lists of records now as well: `inspect`, `metadata`, `metadata-from-name`
- added `metadata-od` filter for filtering object-detection annotations based on their meta-data
(e.g., scores from model predictions)
- the filters `discard-negatives` and `discard-invalid-images` now output how many were discarded/kept
when processing finishes
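
The idea behind the `metadata-od` filter (keeping only annotations whose meta-data satisfies a condition, such as a minimum prediction score) can be sketched like this. The dictionary layout, the `score` key and the helper `keep_by_score` are assumptions for illustration and do not reflect the library's data structures.

```python
from typing import Any, Dict, List

Annotation = Dict[str, Any]


def keep_by_score(annotations: List[Annotation], key: str = "score",
                  minimum: float = 0.5) -> List[Annotation]:
    """Keep annotations whose meta-data value is at least `minimum` (hypothetical helper)."""
    kept = []
    for ann in annotations:
        value = ann.get("metadata", {}).get(key)
        # annotations without the key are dropped in this sketch
        if value is not None and float(value) >= minimum:
            kept.append(ann)
    return kept


predictions = [
    {"label": "car", "metadata": {"score": 0.91}},
    {"label": "car", "metadata": {"score": 0.32}},
    {"label": "person", "metadata": {}},
]
kept = keep_by_score(predictions, minimum=0.5)
print(len(kept))  # 1
```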
0.0.6 (2025-02-26)
------------------
- `LayerSegmentsImageSegmentationReader` now suggests using the `--lenient` flag in its exception if the image is not binary
- added the `discard-by-name` filter that allows the user to discard images based on name, either exact match or regexp
(matching sense can be inverted)
- requiring seppl>=0.2.10 now
- added support for aliases
- added `to_bluechannel`, `to_grayscale` and `to_indexedpng` image segmentation methods to `idc.api`
- added the `generate_palette_list` method to `idc.api` which turns a predefined palette name or comma-separated
list of RGB values into a flat list of int values, e.g., used for indexed PNG files
- exposed method `save_image` through `idc.api`
- `filter-labels` now handles not specifying any labels and only regexp
- `write-labels` filter now allows specification of custom separator
- `write-labels`: fixed retrieval of image-segmentation labels
- using `simple_palette_utils` dependency now
- `idc-convert` tool now flags aliases on the help screen with `*`
- the `from-voc-od` reader now has the `-r/--image_rel_path` option which gets injected before the `folder` property
from the XML file
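
What `generate_palette_list` is described as doing (palette name or comma-separated RGB values in, flat list of ints out) can be sketched as below. The function `palette_list`, the `PALETTES` table and its single `grayscale` entry are hypothetical stand-ins; the real method and its palette names live in `idc.api` and `simple_palette_utils`.

```python
from typing import Dict, List

# hypothetical named palettes, only for this sketch
PALETTES: Dict[str, List[int]] = {
    "grayscale": [v for i in range(256) for v in (i, i, i)],
}


def palette_list(spec: str) -> List[int]:
    """Turn a palette name or 'R,G,B,R,G,B,...' into a flat list of ints."""
    if spec in PALETTES:
        return list(PALETTES[spec])
    values = [int(v) for v in spec.split(",")]
    if len(values) % 3 != 0:
        raise ValueError("Number of values must be a multiple of 3 (R,G,B)")
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("RGB values must be within 0..255")
    return values


custom = palette_list("255,0,0,0,255,0")
print(custom)  # [255, 0, 0, 0, 255, 0]
```

A flat list of this shape is what Pillow's `Image.putpalette` accepts for palette-mode images.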
0.0.5 (2025-01-13)
------------------
- added `setuptools` as dependency
- switched to underscores in project name
- using 90% as default quality for JPEG images now, can be overridden with environment variable `IDC_JPEG_QUALITY`
- added methods to idc.api module: `jpeg_quality()`, `array_to_image(...)`, `empty_image(...)`
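
A default that can be overridden through `IDC_JPEG_QUALITY` might be resolved along these lines. This is a sketch of the described behaviour, not the source of `idc.api.jpeg_quality()`; falling back to the default on an unparsable value is an assumption.

```python
import os

DEFAULT_JPEG_QUALITY = 90


def jpeg_quality() -> int:
    """Return the JPEG quality, honouring the IDC_JPEG_QUALITY environment variable."""
    value = os.getenv("IDC_JPEG_QUALITY")
    if value is None:
        return DEFAULT_JPEG_QUALITY
    try:
        return int(value)
    except ValueError:
        # assumption: ignore invalid values rather than fail
        return DEFAULT_JPEG_QUALITY
```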
0.0.4 (2024-07-16)
------------------
- limiting numpy to <2.0.0 due to problems with imgaug library
0.0.3 (2024-07-02)
------------------
- switched to the `fast-opex` library
- helper method `from_indexedpng` was using incorrect label index (off by 1)
- `Data.save_image` method now ensures that source/target files exist before calling `os.path.samefile`
- requiring seppl>=0.2.6 now
- readers now support default globs, allowing the user to just specify directories as input
(and the default glob gets appended)
- the `to-yolo-od` writer now has an option for predefined labels (for enforcing label order)
- the `to-yolo-od` writer now stores the labels/labels_csv files in the respective output folders
rather than using an absolute file name
- the bluechannel/grayscale/indexed-png image segmentation readers/writers can use a value other
than 0 now for the background
- `split` filter has been renamed to `split-records`
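
The default-glob behaviour mentioned above (a directory as input gets the reader's default glob appended) can be sketched as follows. The helper `with_default_glob` and the glob patterns are assumptions for illustration.

```python
import os
import tempfile


def with_default_glob(path: str, default_glob: str) -> str:
    """Append the default glob if the input is a directory (hypothetical helper)."""
    if os.path.isdir(path):
        return os.path.join(path, default_glob)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # a plain directory gets the default glob appended
    expanded_dir = with_default_glob(tmp, "*.xml")
    # an explicit glob is left untouched
    expanded_glob = with_default_glob(os.path.join(tmp, "*.json"), "*.xml")
```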
0.0.2 (2024-06-13)
------------------
- added generic plugins that take user Python functions: `from-pyfunc`, `pyfunc-filter`, `to-pyfunc`
- added `idc-exec` tool that uses a generator to produce variable/value pairs that are used to expand
the provided pipeline template which then gets executed
- added `polygon-simplifier` filter for reducing number of points in polygons
- moved several geometry/image related functions from imgaug library into core library to avoid duplication
- added python-image-complete as dependency
- the `ImageData` class now uses the python-image-complete library to determine the file format rather than
loading the image into memory in order to determine that
- the `convert-image-format` filter now correctly creates a new container with the converted image data
- the `to-coco-od` writer only allows sorting of categories when using predefined categories now
- the `from-opex-od` reader now handles absent meta-data correctly
- added the `AnnotationsOnlyWriter` mixin for writers that can skip the base image and just output the annotations
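
The `idc-exec` mechanism described above (a generator yields variable/value pairs that expand a pipeline template) can be sketched as below. The `{NAME}` placeholder syntax, the helper names and the template string are assumptions; the template is a stand-in and not a real `idc-convert` pipeline.

```python
from typing import Dict, Iterable, Iterator


def dir_generator(dirs: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one set of variable/value pairs per directory (hypothetical generator)."""
    for d in dirs:
        yield {"INPUT": d, "OUTPUT": d + "-out"}


def expand(template: str, variables: Dict[str, str]) -> str:
    """Replace each {NAME} in the template with its value."""
    result = template
    for name, value in variables.items():
        result = result.replace("{" + name + "}", value)
    return result


template = "convert {INPUT} {OUTPUT}"  # stand-in for a pipeline template
commands = [expand(template, v) for v in dir_generator(["train", "val"])]
print(commands)  # ['convert train train-out', 'convert val val-out']
```

In the real tool each expanded pipeline is executed rather than collected.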
0.0.1 (2024-05-06)
------------------
- initial release
Raw data
{
"_id": null,
"home_page": "https://github.com/waikato-datamining/image-dataset-converter",
"name": "image-dataset-converter",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Peter Reutemann",
"author_email": "fracpete@waikato.ac.nz",
"download_url": "https://files.pythonhosted.org/packages/d5/87/f64e8979bb51424c27551f2319f657a9da6ce4f35b36ad89bb9a9737a9eb/image_dataset_converter-0.0.13.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "Python3 library for converting between various image annotation dataset formats.",
"version": "0.0.13",
"project_urls": {
"Homepage": "https://github.com/waikato-datamining/image-dataset-converter"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d587f64e8979bb51424c27551f2319f657a9da6ce4f35b36ad89bb9a9737a9eb",
"md5": "e0234ae109f55397252536c672ab5335",
"sha256": "521b6ec605afa933b2bb85cfca6685fd86b7c9d23305a610ef7e30acdb62c348"
},
"downloads": -1,
"filename": "image_dataset_converter-0.0.13.tar.gz",
"has_sig": false,
"md5_digest": "e0234ae109f55397252536c672ab5335",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 103090,
"upload_time": "2025-07-15T03:25:57",
"upload_time_iso_8601": "2025-07-15T03:25:57.043351Z",
"url": "https://files.pythonhosted.org/packages/d5/87/f64e8979bb51424c27551f2319f657a9da6ce4f35b36ad89bb9a9737a9eb/image_dataset_converter-0.0.13.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-15 03:25:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "waikato-datamining",
"github_project": "image-dataset-converter",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "image-dataset-converter"
}