audio-dataset-converter


- Name: audio-dataset-converter
- Version: 0.0.4
- Home page: https://github.com/waikato-llm/audio-dataset-converter
- Summary: Python3 library for converting between various audio dataset formats.
- Upload time: 2025-07-15 03:30:09
- Author: Peter Reutemann
- License: MIT License
- Requirements: none recorded

The **audio-dataset-converter** library converts audio datasets between
various dataset formats.
Filters can also be supplied, e.g., for cleaning up the data.

Dataset formats:

- classification: ADAMS (r/w), sub-dir (r/w), TXT (r/w)
- speech: ADAMS (r/w), CommonVoice (r/w), Festvox (r/w), Huggingface Audiofolder (r/w), TXT (r/w)
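
The reader, filter, and writer roles described above can be pictured with a small, self-contained Python sketch. It is only a conceptual illustration and does not use the library's API: `SpeechRecord`, `read_records`, `discard_by_name`, `write_tsv`, and `convert` are hypothetical names, and the tab-separated output is an arbitrary stand-in for a real dataset format.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class SpeechRecord:
    """Hypothetical in-memory record: an audio file name plus its transcript."""
    audio_name: str
    transcript: str


def read_records(rows: Iterable[Tuple[str, str]]) -> Iterator[SpeechRecord]:
    """Stand-in for a reader: turns (name, transcript) pairs into records."""
    for name, transcript in rows:
        yield SpeechRecord(name, transcript)


def discard_by_name(pattern: str) -> Callable[[Iterable[SpeechRecord]], Iterator[SpeechRecord]]:
    """Stand-in for a clean-up filter: drops records whose name matches the regex."""
    regex = re.compile(pattern)

    def apply(records: Iterable[SpeechRecord]) -> Iterator[SpeechRecord]:
        for record in records:
            if not regex.search(record.audio_name):
                yield record

    return apply


def write_tsv(records: Iterable[SpeechRecord]) -> List[str]:
    """Stand-in for a writer: renders each record as a tab-separated line."""
    return [f"{r.audio_name}\t{r.transcript}" for r in records]


def convert(rows, filters) -> List[str]:
    """Chains the reader, the filters (in order), and the writer."""
    records = read_records(rows)
    for apply_filter in filters:
        records = apply_filter(records)
    return write_tsv(records)


rows = [("a.wav", "hello"), ("noise_01.wav", ""), ("b.wav", "world")]
lines = convert(rows, [discard_by_name(r"^noise_")])
print(lines)
```

Because each stage only consumes and yields records, filters can be added or removed without touching the reader or writer, which is the general idea behind supplying filters between two formats.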

Examples can be found here:

https://github.com/waikato-llm/audio-dataset-converter-examples


Changelog
=========

0.0.4 (2025-07-15)
------------------

- requiring seppl>=0.2.20 now for improved help requests in `adc-convert` tool


0.0.3 (2025-07-10)
------------------

- added `set-placeholder` filter for dynamically setting (temporary) placeholders at runtime
- added `--resume_from` option to relevant readers that allows resuming the data processing
  from the first file that matches this glob expression (e.g., `*/012345.wav`)
- requiring seppl>=0.2.17 now for resume, split group, skippable plugin support and avoiding deprecated use of pkg_resources
- `to-adams-sp` writer now uses `-t` short flag for the transcript like the `from-adams-sp` reader
- added the `from-multi` meta-reader that combines multiple base readers and returns their output
- added the `to-multi` meta-writer that forwards the data to multiple base writers
- using `wai_common` instead of `wai.common` now
- added `split_group` parameter to splittable writers (stream/batch)
- fixed the construction of the error messages in the pyfunc reader/filter/writer classes
- added `metadata-to-placeholder` filter to transfer meta-data files into placeholders
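
The `--resume_from` entry above describes resuming from the first file that matches a glob expression. A minimal sketch of that idea, using only the Python standard library, might look as follows. It is not the library's implementation: the helper name `resume_from` and the file list are hypothetical, and how the real readers order files or treat a pattern that never matches is not stated in the changelog.

```python
from fnmatch import fnmatchcase
from itertools import dropwhile
from typing import Iterable, List


def resume_from(paths: Iterable[str], pattern: str) -> List[str]:
    """Skips paths until the first one matching the glob, then keeps the rest."""
    return list(dropwhile(lambda p: not fnmatchcase(p, pattern), paths))


# Hypothetical file list, already in processing order.
files = [
    "clips/012343.wav",
    "clips/012344.wav",
    "clips/012345.wav",
    "clips/012346.wav",
]
remaining = resume_from(files, "*/012345.wav")
print(remaining)
```

In this sketch the matching file itself is processed again, and a pattern that matches nothing yields an empty list; both are properties of the sketch rather than documented behaviour of the tool.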


0.0.2 (2025-03-14)
------------------

- added `setuptools` as dependency
- switched to underscores in project name
- added `discard-by-name` filter
- requiring seppl>=0.2.13 now
- added support for aliases
- added placeholder support to tools: `adc-convert`, `adc-exec`
- added placeholder support to readers: `from-adams-ac`, `from-subdir-ac`, `from-txt-ac`, `from-adams-sp`,
  `from-commonvoice-sp`, `from-festvox-sp`, `from-hf-audiofolder-sp`, `from-txt-sp`, `from-data`, `poll-dir`,
  `from-pyfunc`
- added placeholder support to writers: `to-adams-ac`, `to-subdir-ac`, `to-txt-ac`, `to-adams-sp`, `to-commonvoice-sp`,
  `to-festvox-sp`, `to-hf-audiofolder-sp`, `to-txt-sp`, `to-audioinfo`, `to-data`
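
Placeholder support generally means that a token in a path or option is replaced by a value at runtime. The sketch below illustrates that mechanism under stated assumptions: the `{NAME}` brace syntax, the placeholder names `OUT` and `RUN`, and the helper `expand_placeholders` are all hypothetical and are not taken from the library, whose actual syntax should be checked in its documentation.

```python
import re
from typing import Dict

# Assumed token shape: an identifier wrapped in curly braces, e.g. {OUT}.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replaces each known {NAME} token; unknown tokens are left untouched."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


values = {"OUT": "/data/converted"}
# Setting a value later mimics defining a placeholder while the pipeline runs.
values["RUN"] = "batch1"
expanded = expand_placeholders("{OUT}/{RUN}/{UNKNOWN}/clip.wav", values)
print(expanded)
```

Leaving unknown tokens untouched is a design choice of this sketch; a real tool might instead raise an error so that typos in placeholder names are caught early.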


0.0.1 (2024-07-05)
------------------

- initial release


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/waikato-llm/audio-dataset-converter",
    "name": "audio-dataset-converter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Peter Reutemann",
    "author_email": "fracpete@waikato.ac.nz",
    "download_url": "https://files.pythonhosted.org/packages/78/95/dd9ed58f8dc4fd4f10d93caeb2f6c76a03202c502bc48179d9e8d00bcbb3/audio_dataset_converter-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Python3 library for converting between various audio dataset formats.",
    "version": "0.0.4",
    "project_urls": {
        "Homepage": "https://github.com/waikato-llm/audio-dataset-converter"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7895dd9ed58f8dc4fd4f10d93caeb2f6c76a03202c502bc48179d9e8d00bcbb3",
                "md5": "4aa1d08a4254828485b41e070b9e7327",
                "sha256": "283fcce3251a90b3d8ad3c805a457cf05a4e7f2a90e7ae1425058c355436d314"
            },
            "downloads": -1,
            "filename": "audio_dataset_converter-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "4aa1d08a4254828485b41e070b9e7327",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 58186,
            "upload_time": "2025-07-15T03:30:09",
            "upload_time_iso_8601": "2025-07-15T03:30:09.784767Z",
            "url": "https://files.pythonhosted.org/packages/78/95/dd9ed58f8dc4fd4f10d93caeb2f6c76a03202c502bc48179d9e8d00bcbb3/audio_dataset_converter-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-15 03:30:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "waikato-llm",
    "github_project": "audio-dataset-converter",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "audio-dataset-converter"
}
        