Name | datasets-toolbox |
Version | 0.1.0 |
home_page | None |
Summary | A toolbox for audio dataset processing and augmentation. |
upload_time | 2024-11-02 08:10:56 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | MIT |
keywords | datasets, cli, audio |
requirements | No requirements were recorded. |
# Datasets Toolbox
A toolbox for creating, processing and inspecting audio/image datasets through a simple command-line interface.
## Installation
```sh
pip install datasets-toolbox
```
## Usage
The goal of datasets-toolbox is to build audio/image datasets from the command line.
All commands support the `--config [config-name]` and `--split [split-name]` options to specify the target, where `config-name` is the configuration name (e.g. a language) and `split-name` is a split such as `train`, `validation`, or `test`.
### Add More Data
`datasets import --config [data] --split [train] <sources>`
Imports data into the dataset structure.
If no configuration or split is given, the command defaults to the `default` configuration and the `train` split.
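
As a rough illustration of the defaults described above, the following Python sketch assembles a `datasets import` command line. The helper `build_import_command` is hypothetical and not part of datasets-toolbox, and the source path `clips/` and configuration name `en` are made up. It only uses the `--config` and `--split` options shown in this README and leaves them out when no value is given, so that the tool's own defaults apply.

```python
import shlex


def build_import_command(sources, config=None, split=None):
    """Return the argument list for `datasets import` (hypothetical helper).

    Options are placed before the sources, as in the usage line above.
    An option is omitted when its value is None, so the tool falls back
    to the `default` configuration and the `train` split.
    """
    argv = ["datasets", "import"]
    if config is not None:
        argv += ["--config", config]
    if split is not None:
        argv += ["--split", split]
    return argv + list(sources)


if __name__ == "__main__":
    # Relies on the defaults: `default` configuration, `train` split.
    print(shlex.join(build_import_command(["clips/"])))
    # Targets an explicit configuration and split (example names only).
    print(shlex.join(build_import_command(["clips/"], config="en", split="validation")))
```

The resulting list can be passed to `subprocess.run` once the package is installed.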
### Modify Dataset
`datasets modify <action> --config [data] --split [train] --other-params`
If no configuration or split is given, the command runs recursively on all configurations and all splits.
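
To make the "all configurations and all splits" default concrete, here is a minimal Python sketch of how such a target selection could be expanded. It is not the tool's implementation: the function `expand_targets` and the `layout` mapping are hypothetical, and the behaviour when only one of the two options is given (filtering on that option and iterating over the other) is an assumption, since the README does not describe it.

```python
def expand_targets(layout, config=None, split=None):
    """List the (config, split) pairs a command would visit.

    `layout` maps each configuration name to its split names. It is a
    hypothetical stand-in for however the tool stores datasets on disk.
    """
    configs = [config] if config is not None else sorted(layout)
    targets = []
    for name in configs:
        for split_name in sorted(layout.get(name, [])):
            # Assumption: an explicit split filters every configuration.
            if split is None or split_name == split:
                targets.append((name, split_name))
    return targets


if __name__ == "__main__":
    example = {"en": ["train", "test"], "zh": ["train"]}  # made-up layout
    print(expand_targets(example))                 # every configuration and split
    print(expand_targets(example, config="en"))    # every split of one configuration
```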
#### Audio Slicer
`datasets modify slice --config [data] --split [train] --min-length [ms] --hop-size [n]`
#### Audio Resample
`datasets modify resample --config [data] --split [train] --sr [16000] --mono`
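
For readers unfamiliar with what `--sr` and `--mono` imply, the sketch below shows the two underlying operations in plain Python: downmixing to one channel and changing the sample rate. It is only an illustration of the idea, not how datasets-toolbox does it. The function names are hypothetical, and linear interpolation is used for brevity, whereas production resamplers normally apply band-limited filtering to avoid aliasing.

```python
def to_mono(frames):
    """Average the channels of each frame, e.g. [(left, right), ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation (no anti-alias filter)."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        left = min(int(pos), last)
        right = min(left + 1, last)        # clamp at the final sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    stereo = [(0.0, 0.0), (0.5, 1.5), (2.0, 2.0), (3.0, 3.0)]  # made-up frames
    mono = to_mono(stereo)
    print(resample_linear(mono, src_rate=32000, dst_rate=16000))
```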
#### Audio Transcription
`datasets modify transcribe --model [openai/whisper-large-v3-turbo]`
### Inspect Dataset
`datasets inspect <action> --config [data] --split [train] --other-params`
If no configuration or split is given, the command runs recursively on all configurations and all splits.
#### Audio Hours
`datasets inspect hours --config [data] --split [train]`
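
The quantity reported here is presumably the total audio duration. As a sketch of the arithmetic involved, the following Python snippet sums the durations of WAV files using only the standard library. It assumes uncompressed WAV input and hypothetical function names; the formats and method that datasets-toolbox actually uses are not documented in this README.

```python
import wave


def wav_seconds(path):
    """Duration of one WAV file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_hours(paths):
    """Sum the durations of the given WAV files, in hours."""
    return sum(wav_seconds(p) for p in paths) / 3600
```

For example, `total_hours(pathlib.Path("clips").glob("*.wav"))` would report the hours of audio in a hypothetical `clips` directory.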
Raw data
{
"_id": null,
"home_page": null,
"name": "datasets-toolbox",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "datasets, CLI, audio",
"author": null,
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/97/05/a302628710e8e302f89230aec50975ec6276facee1ba205af2b3e8c5f833/datasets_toolbox-0.1.0.tar.gz",
"platform": null,
"description": "# Datasets Toolbox\n\nA toolbox for creating, processing and inspecting audio/image datasets through a simple CLI interface.\n\n## Installation\n\n```sh\npip install datasets-toolbox\n```\n\n## Usage\n\nThe goal of datasets-toolbox is to build audio/image datasets with CLI.\n\nAll the commands support `--config [config-name]` and `--split [split-name]` options to specified the target. Where `config-name` is the configuration name (e.g. language) and `split-name` is something like `train`, `validation`, `test`.\n\n### Add More Data\n\n`datasets import --config [data] --split [train] <sources>`\n\nImport data into datasets structure.\n\nIf the configuration/split is not configured, will defaults to `default` configuration and `train` split.\n\n### Modify Dataset\n\n`datasets modify <action> --config [data] --split [train] --other-params`\n\nIf the configuration/split is not configured, will defaults to recursively run on all configurations and all splits.\n\n#### Audio Slicer\n\n`datasets modify slice --config [data] --split [train] --min-length [ms] --hop-size [n]`\n\n#### Audio Resample\n\n`datasets modify resample --config [data] --split [train] --sr [16000] --mono`\n\n#### Audio Transcription\n\n`datasets modify transcribe --model [openai/whisper-large-v3-turbo]'`\n\n### Inspect Dataset\n\n`datasets inspect --config [data] --split [train] --other-params`\n\nIf the configuration/split is not configured, will defaults to recursively run on all configurations and all splits.\n\n#### Audio Hours\n\n`datasets inspect hours --config [data] --split [train]`\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A toolbox for audio dataset processing and augmentation.",
"version": "0.1.0",
"project_urls": {
"homepage": "https://github.com/JacobLinCool/datasets-toolbox",
"repository": "https://github.com/JacobLinCool/datasets-toolbox"
},
"split_keywords": [
"datasets",
" cli",
" audio"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "779561288c2bda302d603260e410704042518437acf0a2dc5f0206b4ae0cace3",
"md5": "2810ddfca41ab63aadb45e0e4913c617",
"sha256": "0f29df8f68962204096038a0183d3bb1005869ca0db0d5ecc7091be445531bff"
},
"downloads": -1,
"filename": "datasets_toolbox-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2810ddfca41ab63aadb45e0e4913c617",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 13255,
"upload_time": "2024-11-02T08:10:55",
"upload_time_iso_8601": "2024-11-02T08:10:55.066429Z",
"url": "https://files.pythonhosted.org/packages/77/95/61288c2bda302d603260e410704042518437acf0a2dc5f0206b4ae0cace3/datasets_toolbox-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9705a302628710e8e302f89230aec50975ec6276facee1ba205af2b3e8c5f833",
"md5": "12fc22f999f334cabfe7d4096357e912",
"sha256": "c059d470f2472631d329658b50d01012290e5ba9f98e6af0100f38c9b0593ab2"
},
"downloads": -1,
"filename": "datasets_toolbox-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "12fc22f999f334cabfe7d4096357e912",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 9864,
"upload_time": "2024-11-02T08:10:56",
"upload_time_iso_8601": "2024-11-02T08:10:56.834332Z",
"url": "https://files.pythonhosted.org/packages/97/05/a302628710e8e302f89230aec50975ec6276facee1ba205af2b3e8c5f833/datasets_toolbox-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-02 08:10:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "JacobLinCool",
"github_project": "datasets-toolbox",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "datasets-toolbox"
}
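
The `digests` entries in the raw data above can be used to check a downloaded file before installing it. A minimal sketch, assuming the archive has already been saved locally (the helper names are hypothetical):

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True when the file's SHA-256 equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution listed above, the expected value is the `sha256` field of the `datasets_toolbox-0.1.0.tar.gz` entry, `c059d470f2472631d329658b50d01012290e5ba9f98e6af0100f38c9b0593ab2`.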