The **audio-dataset-converter** library converts audio datasets between
various dataset formats.
Filters can be applied during the conversion as well, e.g., for cleaning up the data.
Dataset formats:
- classification: ADAMS (r/w), sub-dir (r/w), TXT (r/w)
- speech: ADAMS (r/w), CommonVoice (r/w), Festvox (r/w), Huggingface Audiofolder (r/w), TXT (r/w)
Examples can be found here:
https://github.com/waikato-llm/audio-dataset-converter-examples
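
As a rough illustration of the reader, filter and writer idea described above, the following sketch chains three plain Python functions. It does not use the library's API: `Clip`, `read_clips`, `drop_empty` and `write_tsv` are hypothetical names made up for this example, and the tab-separated output is an assumed format, not one of the formats listed above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Clip:
    """Hypothetical record: an audio file name plus its transcript."""
    name: str
    transcript: str


def read_clips(rows: Iterable[Tuple[str, str]]) -> Iterator[Clip]:
    """Stand-in for a reader: turns (name, transcript) pairs into records."""
    for name, transcript in rows:
        yield Clip(name, transcript)


def drop_empty(clips: Iterable[Clip]) -> Iterator[Clip]:
    """Stand-in for a clean-up filter: discards clips with a blank transcript."""
    for clip in clips:
        if clip.transcript.strip():
            yield clip


def write_tsv(clips: Iterable[Clip]) -> List[str]:
    """Stand-in for a writer: renders one tab-separated line per clip."""
    return [f"{clip.name}\t{clip.transcript}" for clip in clips]


# Made-up sample data; the second clip has a blank transcript and gets filtered out.
rows = [("a.wav", "hello world"), ("b.wav", "   "), ("c.wav", "goodbye")]
lines = write_tsv(drop_empty(read_clips(rows)))
print(lines)
```

Because each stage consumes and yields records lazily, a filter can be added or removed without touching the reader or the writer.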
Changelog
=========
0.0.4 (2025-07-15)
------------------
- requiring seppl>=0.2.20 now for improved help requests in `adc-convert` tool
0.0.3 (2025-07-10)
------------------
- added `set-placeholder` filter for dynamically setting (temporary) placeholders at runtime
- added `--resume_from` option to relevant readers, which resumes the data processing
  from the first file that matches the supplied glob expression (e.g., `*/012345.wav`)
- requiring seppl>=0.2.17 now for resume, split group, skippable plugin support and avoiding deprecated use of pkg_resources
- `to-adams-sp` writer now uses `-t` short flag for the transcript like the `from-adams-sp` reader
- added the `from-multi` meta-reader that combines multiple base readers and returns their output
- added the `to-multi` meta-writer that forwards the data to multiple base writers
- using `wai_common` instead of `wai.common` now
- added `split_group` parameter to splittable writers (stream/batch)
- fixed the construction of the error messages in the pyfunc reader/filter/writer classes
- added `metadata-to-placeholder` filter to transfer meta-data files into placeholders
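
The `--resume_from` behaviour listed above (skip files until the first one that matches a glob expression) can be sketched with Python's standard `fnmatch` module. This is only a minimal illustration under stated assumptions, not the library's implementation: the function name `resume_from` and the file names in the sample list are hypothetical.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def resume_from(paths: Iterable[str], pattern: str) -> List[str]:
    """Skip paths until the first one matching the glob, then keep the rest."""
    result = []
    resumed = False
    for path in paths:
        # Once the first match is found, every following path is kept.
        if not resumed and fnmatch(path, pattern):
            resumed = True
        if resumed:
            result.append(path)
    return result


# Made-up file list; processing resumes at the file matching the glob.
files = [
    "data/012343.wav",
    "data/012344.wav",
    "data/012345.wav",
    "data/012346.wav",
]
remaining = resume_from(files, "*/012345.wav")
print(remaining)
```

In this sketch, a glob that matches no file yields an empty list; whether the actual readers behave the same way is not stated in the changelog.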
0.0.2 (2025-03-14)
------------------
- added `setuptools` as dependency
- switched to underscores in project name
- added `discard-by-name` filter
- requiring seppl>=0.2.13 now
- added support for aliases
- added placeholder support to tools: `adc-convert`, `adc-exec`
- added placeholder support to readers: `from-adams-ac`, `from-subdir-ac`, `from-txt-ac`, `from-adams-sp`,
`from-commonvoice-sp`, `from-festvox-sp`, `from-hf-audiofolder-sp`, `from-txt-sp`, `from-data`, `poll-dir`,
`from-pyfunc`
- added placeholder support to writers: `to-adams-ac`, `to-subdir-ac`, `to-txt-ac`, `to-adams-sp`, `to-commonvoice-sp`,
`to-festvox-sp`, `to-hf-audiofolder-sp`, `to-txt-sp`, `to-audioinfo`, `to-data`
0.0.1 (2024-07-05)
------------------
- initial release
Raw data
{
"_id": null,
"home_page": "https://github.com/waikato-llm/audio-dataset-converter",
"name": "audio-dataset-converter",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Peter Reutemann",
"author_email": "fracpete@waikato.ac.nz",
"download_url": "https://files.pythonhosted.org/packages/78/95/dd9ed58f8dc4fd4f10d93caeb2f6c76a03202c502bc48179d9e8d00bcbb3/audio_dataset_converter-0.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "Python3 library for converting between various audio dataset formats.",
"version": "0.0.4",
"project_urls": {
"Homepage": "https://github.com/waikato-llm/audio-dataset-converter"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7895dd9ed58f8dc4fd4f10d93caeb2f6c76a03202c502bc48179d9e8d00bcbb3",
"md5": "4aa1d08a4254828485b41e070b9e7327",
"sha256": "283fcce3251a90b3d8ad3c805a457cf05a4e7f2a90e7ae1425058c355436d314"
},
"downloads": -1,
"filename": "audio_dataset_converter-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "4aa1d08a4254828485b41e070b9e7327",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 58186,
"upload_time": "2025-07-15T03:30:09",
"upload_time_iso_8601": "2025-07-15T03:30:09.784767Z",
"url": "https://files.pythonhosted.org/packages/78/95/dd9ed58f8dc4fd4f10d93caeb2f6c76a03202c502bc48179d9e8d00bcbb3/audio_dataset_converter-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-15 03:30:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "waikato-llm",
"github_project": "audio-dataset-converter",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "audio-dataset-converter"
}