rVADfast


Name: rVADfast
Version: 0.0.2
Summary: rVADfast - a fast and robust unsupervised VAD
Upload time: 2024-01-23 09:41:59
Requires Python: >=3.8
License: MIT License
Keywords: audio, tools, vad, speech, speech processing
Requirements: none recorded

# rVADfast
The Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as presented in [rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method, Computer Speech & Language, 2020](https://www.sciencedirect.com/science/article/pii/S0885230819300920) or its [arXiv version](https://arxiv.org/abs/1906.03588). 
More info on [the rVAD GitHub page](https://github.com/zhenghuatan/rVAD). 

***The rVAD paper published in Computer Speech & Language won the International Speech Communication Association (ISCA) 2022 Best Research Paper Award.***

The rVAD method consists of two passes of denoising followed by a VAD stage. It has been applied as a preprocessor for 
a wide range of applications, including speech recognition, speaker identification, language identification, age and 
gender identification, self-supervised learning, human-robot interaction, and audio archive segmentation; 
see the citing works listed on [Google Scholar](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=fugL2E8AAAAJ&citation_for_view=fugL2E8AAAAJ:-mN3Mh-tlDkC).

The method is unsupervised to make it applicable to a broad range of acoustic environments, 
and it is optimized considering both noisy and clean conditions. 

Used out of the box, rVAD ranked 4th (out of 27 supervised and unsupervised systems) 
in a Fearless Steps Speech Activity Detection Challenge. 

The rVAD paper is among [the most cited articles from Computer Speech and Language published since 2018](https://www.journals.elsevier.com/computer-speech-and-language/most-cited-articles) (6th place, in 2023).

## Usage
The [rVADfast](https://pypi.org/project/rVADfast/) library is available as a Python package, installable via: 
```bash
pip install rVADfast
```
After installation, import the rVADfast class, instantiate it, 
and call the resulting VAD instance on a waveform to generate VAD labels:
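```python
import audiofile  # third-party package, used here only to read the audio file
from rVADfast import rVADfast

# Instantiate the VAD
vad = rVADfast()

path_to_audiofile = "some_audio_file.wav"

# Read the waveform and its sampling rate, then run the VAD on it
waveform, sampling_rate = audiofile.read(path_to_audiofile)
vad_labels, vad_timestamps = vad(waveform, sampling_rate)
```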
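
As a follow-up to the snippet above, frame-level labels are often easier to work with once they are merged into speech segments. The sketch below is not part of rVADfast: `labels_to_segments` is a hypothetical helper, and it assumes (unconfirmed by this README) that the labels are one 0/1 decision per frame and that the timestamps give the start time of each frame. It runs on synthetic data, with times in milliseconds and an assumed 10 ms frame shift.

```python
from typing import List, Sequence, Tuple


def labels_to_segments(
    labels: Sequence[int],
    frame_times: Sequence[float],
    frame_shift: float,
) -> List[Tuple[float, float]]:
    """Merge consecutive speech frames into (start, end) segments.

    Assumption (not confirmed by the rVADfast docs): `labels` holds one 0/1
    decision per frame and `frame_times` holds the start time of each frame,
    in the same unit as `frame_shift`.
    """
    segments = []
    start = None
    for label, time in zip(labels, frame_times):
        if label and start is None:
            start = time                      # a speech run begins here
        elif not label and start is not None:
            segments.append((start, time))    # the run ended at this frame
            start = None
    if start is not None:                     # speech lasts until the final frame
        segments.append((start, frame_times[-1] + frame_shift))
    return segments


# Synthetic example: 8 frames, times in milliseconds, hypothetical 10 ms shift.
labels = [0, 1, 1, 0, 0, 1, 1, 1]
frame_times = [i * 10 for i in range(len(labels))]
segments = labels_to_segments(labels, frame_times, frame_shift=10)
print(segments)
```

If the actual output format of `vad(...)` differs from these assumptions, only the two input arguments need adapting; the merging logic stays the same.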

The package also contains functionality to process folders of audio files, either to generate VAD labels 
or to trim non-speech segments from the audio files.
This is done by importing the ```rVADfast.process``` module, which provides two functions for processing audio files, 
namely ```process.rVADfast_single_process``` and ```process.rVADfast_multi_process```, 
with the latter utilizing multiple CPUs for processing.
Additionally, a processing script can be called from the command line by executing: 
```bash
rVADfast_process --root <audio_file_root> --save_folder <path_to_save_files> \
    --ext <audio_file_extension> --n_workers <number_of_multiprocessing_workers>
```
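
If the tool needs to be driven from a Python script, the same command can be assembled as an argument list and handed to `subprocess`. This is a minimal sketch that uses only the four flags shown above; `build_command` is a hypothetical helper, and the paths, extension format, and worker count are placeholder values rather than defaults of the tool.

```python
import subprocess
from typing import List


def build_command(root: str, save_folder: str, ext: str, n_workers: int) -> List[str]:
    """Assemble the rVADfast_process call using only the flags shown above."""
    return [
        "rVADfast_process",
        "--root", root,
        "--save_folder", save_folder,
        "--ext", ext,
        "--n_workers", str(n_workers),
    ]


# Hypothetical paths and values, for illustration only.
command = build_command("data/audio", "data/vad", "wav", 4)
print(" ".join(command))

# Uncomment to actually run the tool (requires rVADfast to be installed):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.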
For an explanation of the additional arguments available for the command-line tool, run: 
```bash
rVADfast_process --help
```
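
To make the idea of trimming non-speech segments concrete, the sketch below drops every sample that falls in a frame labelled as non-speech. It is an illustration under stated assumptions, not the package's implementation: `trim_non_speech` is a hypothetical helper, frames are assumed to be non-overlapping blocks of `samples_per_frame` samples, and the data are synthetic.

```python
from typing import List, Sequence


def trim_non_speech(
    waveform: Sequence[float],
    labels: Sequence[int],
    samples_per_frame: int,
) -> List[float]:
    """Keep only the samples that fall inside frames labelled as speech.

    Assumption: frame i covers samples [i * samples_per_frame,
    (i + 1) * samples_per_frame), i.e. frames do not overlap.
    """
    kept: List[float] = []
    for frame_index, label in enumerate(labels):
        if label:
            start = frame_index * samples_per_frame
            kept.extend(waveform[start:start + samples_per_frame])
    return kept


# Synthetic example: 4 frames of 2 samples each; frames 1 and 2 are speech.
waveform = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
labels = [0, 1, 1, 0]
trimmed = trim_non_speech(waveform, labels, samples_per_frame=2)
print(trimmed)
```

For real recordings, `samples_per_frame` would follow from the sampling rate and the frame shift used by the VAD, which this README does not state.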

A concrete example of how to use the rVADfast package can be found in ```/notebooks```.

*Note that the package is still in development.
Therefore, we welcome any feedback or suggestions for changes and/or additional features.*

## References
1) Z.-H. Tan, A. K. Sarkar and N. Dehak, "rVAD: an unsupervised segment-based robust voice activity detection method," Computer Speech and Language, vol. 59, pp. 1-21, 2020. 
2) Z.-H. Tan and B. Lindberg, "Low-complexity variable frame rate analysis for speech recognition and voice activity detection," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 798-807, 2010.

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "rVADfast",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Holger Severin Bovbjerg <hsbo@es.aau.dk>, Zheng-Hua Tan <zt@es.aau.dk>",
    "keywords": "Audio,Tools,VAD,Speech,Speech Processing",
    "author": "",
    "author_email": "Zheng-Hua Tan <zt@es.aau.dk>, Achintya Kumar Sarkar <sarkar.achintya@gmail.com>, Holger Severin Bovbjerg <hsbo@es.aau.dk>",
    "download_url": "https://files.pythonhosted.org/packages/7c/0e/5fad0a2a3f72189d17ade9d5f27d927467178d0e4c089a5a669605bf6768/rVADfast-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "rVADfast - a fast and robust unsupervised VAD",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/zhenghuatan/rVADfast/",
        "Issues": "https://github.com/zhenghuatan/rVADfast/issues",
        "Repository": "https://github.com/zhenghuatan/rVADfast.git",
        "Source": "https://github.com/zhenghuatan/rVADfast/"
    },
    "split_keywords": [
        "audio",
        "tools",
        "vad",
        "speech",
        "speech processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "45217c6dcc24b3e53996f25e60b958489ff0812f755bbd5b2afe961391a315b6",
                "md5": "a74c1d86377329433663df53e04bf77e",
                "sha256": "9047ce426bf1995c533b5dc335ca5521b8813faaa3671e785f3ccd4b480111f0"
            },
            "downloads": -1,
            "filename": "rVADfast-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a74c1d86377329433663df53e04bf77e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 18239,
            "upload_time": "2024-01-23T09:41:57",
            "upload_time_iso_8601": "2024-01-23T09:41:57.522180Z",
            "url": "https://files.pythonhosted.org/packages/45/21/7c6dcc24b3e53996f25e60b958489ff0812f755bbd5b2afe961391a315b6/rVADfast-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7c0e5fad0a2a3f72189d17ade9d5f27d927467178d0e4c089a5a669605bf6768",
                "md5": "8c9e5160890e3d8f834bc25e40015f57",
                "sha256": "f4c1964aa3a00d8ca3a1dc293a9f9a4a98c050b45a9016a45f1b91f4b846eeac"
            },
            "downloads": -1,
            "filename": "rVADfast-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8c9e5160890e3d8f834bc25e40015f57",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 17250,
            "upload_time": "2024-01-23T09:41:59",
            "upload_time_iso_8601": "2024-01-23T09:41:59.368878Z",
            "url": "https://files.pythonhosted.org/packages/7c/0e/5fad0a2a3f72189d17ade9d5f27d927467178d0e4c089a5a669605bf6768/rVADfast-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-23 09:41:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "zhenghuatan",
    "github_project": "rVADfast",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "rvadfast"
}
        