| Field | Value |
|---|---|
| Name | rVADfast |
| Version | 0.0.2 |
| Summary | rVADfast - a fast and robust unsupervised VAD |
| Upload time | 2024-01-23 09:41:59 |
| Requires Python | >=3.8 |
| License | MIT License |
| Keywords | audio, tools, vad, speech, speech processing |
| Requirements | No requirements were recorded. |
# rVADfast
A Python library implementing the unsupervised, fast method for robust voice activity detection (rVAD) presented in [rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method, Computer Speech & Language, 2020](https://www.sciencedirect.com/science/article/pii/S0885230819300920) and its [arXiv version](https://arxiv.org/abs/1906.03588).
More information is available on [the rVAD GitHub page](https://github.com/zhenghuatan/rVAD).
***The rVAD paper published in Computer Speech & Language won the International Speech Communication Association (ISCA) 2022 Best Research Paper Award.***
The rVAD method consists of two passes of denoising followed by a VAD stage. It has been applied as a preprocessor in
a wide range of applications, such as speech recognition, speaker identification, language identification, age and
gender identification, self-supervised learning, human-robot interaction, and audio archive segmentation,
among others, as listed on [Google Scholar](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=fugL2E8AAAAJ&citation_for_view=fugL2E8AAAAJ:-mN3Mh-tlDkC).
The method is unsupervised, which makes it applicable to a broad range of acoustic environments,
and it is optimized for both noisy and clean conditions.
Used out of the box, rVAD ranked 4th (out of 27 supervised and unsupervised systems)
in the Fearless Steps Speech Activity Detection Challenge.
As of 2023, the rVAD paper ranked 6th among [the most cited articles from Computer Speech and Language published since 2018](https://www.journals.elsevier.com/computer-speech-and-language/most-cited-articles).
## Usage
The [rVADfast](https://pypi.org/project/rVADfast/) library is available as a Python package and can be installed via:
```bash
pip install rVADfast
```
After installation, import the `rVADfast` class, instantiate a VAD object from it,
and call that object on a waveform and its sampling rate to generate VAD labels and timestamps:
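```python
import audiofile  # separate package for reading audio files (pip install audiofile)
from rVADfast import rVADfast

# Create a VAD instance with default settings
vad = rVADfast()
path_to_audiofile = "some_audio_file.wav"

# Read the audio samples and their sampling rate
waveform, sampling_rate = audiofile.read(path_to_audiofile)
# Compute VAD labels and the corresponding timestamps
vad_labels, vad_timestamps = vad(waveform, sampling_rate)
```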
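The README does not specify the format of `vad_labels`. Assuming it holds one binary label per frame (1 = speech) with a fixed frame shift, a hypothetical helper such as `labels_to_segments` below could merge consecutive speech frames into `(start, end)` segments in seconds. The 10 ms default frame shift is an assumption; check it, and the meaning of `vad_timestamps`, against the package before relying on this sketch.

```python
from typing import List, Optional, Sequence, Tuple


def labels_to_segments(
    labels: Sequence[int], frame_shift_s: float = 0.01
) -> List[Tuple[float, float]]:
    """Merge consecutive speech frames (label 1) into (start, end) times in seconds.

    Hypothetical helper, not part of rVADfast: assumes one binary label per
    frame and a fixed frame shift (10 ms by default is an assumption).
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # frame index where the current speech run began
    for i, label in enumerate(labels):
        if label and start is None:
            start = i  # a speech run begins at frame i
        elif not label and start is not None:
            # the speech run ended just before frame i
            segments.append((start * frame_shift_s, i * frame_shift_s))
            start = None
    if start is not None:
        # close a speech run that lasts until the final frame
        segments.append((start * frame_shift_s, len(labels) * frame_shift_s))
    return segments
```

With the variables from the usage example above, this would be called as `segments = labels_to_segments(vad_labels)`.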
The package also contains functionality to process folders of audio files, either to generate VAD labels
or to trim non-speech segments from the audio files.
This is done by importing the `rVADfast.process` module, which provides two functions for processing audio files,
namely `process.rVADfast_single_process` and `process.rVADfast_multi_process`,
the latter using multiple CPUs for processing.
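Trimming non-speech segments amounts to keeping only the samples that fall inside frames labelled as speech. The helper below is a hypothetical sketch of that idea, not the `rVADfast.process` implementation; it assumes a mono (1-D) waveform, one binary label per frame, and a fixed frame shift of 10 ms.

```python
from typing import Sequence

import numpy as np


def trim_non_speech(
    waveform: np.ndarray,
    labels: Sequence[int],
    sampling_rate: int,
    frame_shift_s: float = 0.01,
) -> np.ndarray:
    """Keep only the samples that fall inside frames labelled as speech.

    Hypothetical helper: the per-frame label format and the 10 ms frame
    shift are assumptions, not documented rVADfast behavior.
    """
    hop = int(round(frame_shift_s * sampling_rate))  # samples per frame shift
    n_samples = waveform.shape[0]
    # Expand each frame label to `hop` samples, truncated to the waveform length
    expanded = np.repeat(np.asarray(labels, dtype=bool), hop)[:n_samples]
    # Samples beyond the last labelled frame are treated as non-speech
    sample_mask = np.zeros(n_samples, dtype=bool)
    sample_mask[: expanded.size] = expanded
    return waveform[sample_mask]
```

Under these assumptions, `speech_only = trim_non_speech(waveform, vad_labels, sampling_rate)` would return the concatenated speech samples.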
Additionally, a processing script can be run from the command line:
```bash
rVADfast_process --root <audio_file_root> --save_folder <path_to_save_files> \
    --ext <audio_file_extension> --n_workers <number_of_multiprocessing_workers>
```
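The `--root`, `--ext` and `--n_workers` options suggest a "find matching files, then process each one, optionally in parallel" pattern. The standard-library sketch below illustrates that pattern only; `find_audio_files`, `process_folder` and the `process_file` callback are hypothetical names, not part of rVADfast, and the actual tool may behave differently.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Callable, List, TypeVar

T = TypeVar("T")


def find_audio_files(root: str, ext: str = ".wav") -> List[Path]:
    """Recursively collect files under `root` whose suffix matches `ext`."""
    suffix = ext if ext.startswith(".") else "." + ext  # accept "wav" or ".wav"
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and p.suffix == suffix)


def process_folder(
    root: str,
    process_file: Callable[[Path], T],
    ext: str = ".wav",
    n_workers: int = 1,
) -> List[T]:
    """Apply `process_file` to every matching file, optionally in parallel.

    With n_workers > 1 the files are distributed over worker processes,
    so `process_file` must be a picklable top-level function.
    """
    files = find_audio_files(root, ext)
    if n_workers <= 1:
        return [process_file(f) for f in files]  # single-process path
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(process_file, files))  # multi-process path
```

For example, `process_folder("audio_root", my_vad_function, ext="wav", n_workers=4)`, where `my_vad_function` is a hypothetical top-level function that runs VAD on one file. When `n_workers > 1`, call it under an `if __name__ == "__main__":` guard so worker processes can be started safely on platforms that spawn them.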
For an explanation of the additional arguments available for the command-line tool, run:
```bash
rVADfast_process --help
```
A concrete example of how to use the rVADfast package can be found in `/notebooks`.
*Note that the package is still in development.
Therefore, we welcome any feedback or suggestions for changes and/or additional features.*
## References
1) Z.-H. Tan, A. K. Sarkar and N. Dehak, "rVAD: An unsupervised segment-based robust voice activity detection method," Computer Speech and Language, vol. 59, pp. 1-21, 2020.
2) Z.-H. Tan and B. Lindberg, "Low-complexity variable frame rate analysis for speech recognition and voice activity detection," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 798-807, 2010.
Raw data
{
"_id": null,
"home_page": "",
"name": "rVADfast",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Holger Severin Bovbjerg <hsbo@es.aau.dk>, Zheng-Hua Tan <zt@es.aau.dk>",
"keywords": "Audio,Tools,VAD,Speech,Speech Processing",
"author": "",
"author_email": "Zheng-Hua Tan <zt@es.aau.dk>, Achintya Kumar Sarkar <sarkar.achintya@gmail.com>, Holger Severin Bovbjerg <hsbo@es.aau.dk>",
"download_url": "https://files.pythonhosted.org/packages/7c/0e/5fad0a2a3f72189d17ade9d5f27d927467178d0e4c089a5a669605bf6768/rVADfast-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "rVADfast - a fast and robust unsupervised VAD",
"version": "0.0.2",
"project_urls": {
"Homepage": "https://github.com/zhenghuatan/rVADfast/",
"Issues": "https://github.com/zhenghuatan/rVADfast/issues",
"Repository": "https://github.com/zhenghuatan/rVADfast.git",
"Source": "https://github.com/zhenghuatan/rVADfast/"
},
"split_keywords": [
"audio",
"tools",
"vad",
"speech",
"speech processing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "45217c6dcc24b3e53996f25e60b958489ff0812f755bbd5b2afe961391a315b6",
"md5": "a74c1d86377329433663df53e04bf77e",
"sha256": "9047ce426bf1995c533b5dc335ca5521b8813faaa3671e785f3ccd4b480111f0"
},
"downloads": -1,
"filename": "rVADfast-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a74c1d86377329433663df53e04bf77e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 18239,
"upload_time": "2024-01-23T09:41:57",
"upload_time_iso_8601": "2024-01-23T09:41:57.522180Z",
"url": "https://files.pythonhosted.org/packages/45/21/7c6dcc24b3e53996f25e60b958489ff0812f755bbd5b2afe961391a315b6/rVADfast-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7c0e5fad0a2a3f72189d17ade9d5f27d927467178d0e4c089a5a669605bf6768",
"md5": "8c9e5160890e3d8f834bc25e40015f57",
"sha256": "f4c1964aa3a00d8ca3a1dc293a9f9a4a98c050b45a9016a45f1b91f4b846eeac"
},
"downloads": -1,
"filename": "rVADfast-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "8c9e5160890e3d8f834bc25e40015f57",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 17250,
"upload_time": "2024-01-23T09:41:59",
"upload_time_iso_8601": "2024-01-23T09:41:59.368878Z",
"url": "https://files.pythonhosted.org/packages/7c/0e/5fad0a2a3f72189d17ade9d5f27d927467178d0e4c089a5a669605bf6768/rVADfast-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-23 09:41:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "zhenghuatan",
"github_project": "rVADfast",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "rvadfast"
}