## Speech Denoiser System (With NVIDIA CleanUNet)
Developed by <b> Mahfuzul Kabir</b>, \
Machine Learning Engineer, \
ACI Limited \
Website: <a href="https://mahfuzulkabir.com">MahfuzulKabir.com</a>
<img alt="An image of the gradio app" src="assets/image.png">
## Description:
This project is the backbone of a speech denoiser system built on NVIDIA's CleanUNet. The denoiser module is designed to be easy to use from any Python project, and the system also offers an API for use in development.
## Algorithmic difficulties
CleanUNet is a very GPU-hungry model. In my own testing, it can take up to 10GB of GPU memory for a 3-minute audio file (I know, crazy, and it shouldn't be like that at all). My guess is that the underlying reason is poor handling and optimization of data usage in CleanUNet.
To tackle this, I used batching extensively. After some initial EDA, I found that the system can handle audio of roughly up to 3 seconds without any problem (I checked with a 60s audio file and it still crashes CUDA). So each audio file is divided into chunks of 3 seconds and the chunks are stacked into batches, in the hope that batch inference would make the system faster and more efficient.
I wish that was the case. For longer audio, the number of chunks is huge, and yet again CleanUNet fails to handle the data properly: it demands 5GB of GPU memory while running inference on 3s audio chunks and crashes the system. So I batched the chunks as well. The underlying code uses batches of 20 audio chunks. The duration of each chunk and the number of chunks in each batch can be controlled with the <b>'chunk_length_s'</b> and <b>'max_batch_size'</b> arguments when initializing the module (<i>default values are defined, so if you don't want to experiment, no need to worry</i>).
This approach gives amazing results. A 50-minute audio recording from the World Economic Forum took only 4s to process. As for the performance of CleanUNet itself, give it a try and see for yourself (spoiler: it's amazinggg).
I plan to add more speech enhancement options to this module over time. You are welcome to contribute or to suggest ideas.
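The chunk-then-batch idea described above can be sketched roughly as follows. This is a minimal illustration using NumPy only, not the module's actual code: the helper names `chunk_and_batch` and `merge_batches` are hypothetical, zero-padding the last chunk is an assumption, and the 16 kHz sample rate is taken from the save example in the usage section.

```python
import numpy as np


def chunk_and_batch(audio, sample_rate=16000, chunk_length_s=3, max_batch_size=20):
    """Split a non-empty 1-D waveform into fixed-length chunks, then group them into batches.

    Hypothetical helper for illustration; the last chunk is zero-padded
    so that every chunk has the same length.
    """
    chunk_len = int(chunk_length_s * sample_rate)
    n_chunks = -(-len(audio) // chunk_len)  # ceiling division

    # Zero-pad so the waveform divides evenly into chunks.
    padded = np.zeros(n_chunks * chunk_len, dtype=audio.dtype)
    padded[: len(audio)] = audio
    chunks = padded.reshape(n_chunks, chunk_len)

    # Each batch holds at most max_batch_size chunks, which bounds
    # how much data is sent to the GPU in one forward pass.
    return [chunks[i : i + max_batch_size] for i in range(0, n_chunks, max_batch_size)]


def merge_batches(batches, original_length):
    """Flatten (processed) batches back into one waveform and drop the padding."""
    return np.concatenate([b.reshape(-1) for b in batches])[:original_length]


if __name__ == "__main__":
    audio = np.random.randn(16000 * 7).astype(np.float32)  # 7 seconds of noise
    batches = chunk_and_batch(audio)
    print(len(batches), batches[0].shape)
    restored = merge_batches(batches, len(audio))
    print(np.array_equal(restored, audio))
```

With these defaults, a 50-minute recording splits into 1000 chunks of 3 seconds, which makes 50 batches of 20 chunks. In a real pipeline each batch would be passed through the model before merging.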
Thank you for using this system. Give it a <b>STAR</b> if it helps you in any way.
## Usage
#### Use with user interface:
The latest addition to this project is a Gradio interface for non-technical users. Simply run the following commands in your terminal and a Gradio user interface will be launched.
```
# Clone the git repository
git clone https://github.com/Kabir5296/Speech-Denoiser-System.git
# Change directory
cd Speech-Denoiser-System
# Install necessary modules
pip install -r requirements.txt
# Launch gradio_app.py
python gradio_app.py
```
The Gradio app will be accessible at http://127.0.0.1:7862
#### Use with API:
To use the API, follow the commands below. The system was built on Python 3.9, so it's preferable to use the same version. Create a Python 3.9 environment if needed.
```
# Clone the git repository
git clone https://github.com/Kabir5296/Speech-Denoiser-System.git
# Change directory
cd Speech-Denoiser-System
# Install necessary modules
pip install -r requirements.txt
# Launch main.py
python main.py
```
The APIs can be accessed at http://127.0.0.1:8877 and the Swagger docs at http://127.0.0.1:8877/docs
#### Use with denoiser module:
For use in development, the denoiser module can be imported directly. To do so, simply initialize the DenoiserAudio class and call the resulting object on your audio file. The following code can be helpful.
```
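from denoiser import DenoiserAudio
# Initialize the denoiser once (default chunk and batch sizes are used)
denoise = DenoiserAudio()
# Call the initialized object, not the class, on the audio file
denoised_audio = denoise("--> path/to/your/audio/file <--")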
```
To save the denoised audio, use the following code.
```
import torch
import torchaudio
# Example output path; change it as needed
output_filename = "denoised_audio.wav"
torchaudio.save(output_filename,
torch.from_numpy(denoised_audio).unsqueeze(0),
sample_rate = 16000)
```
## Citation
Don't forget to cite NVIDIA's original paper on CleanUNet. Kudos to them (not for handling the data efficiently though :p )
```
@inproceedings{kong2022speech,
title={Speech Denoising in the Waveform Domain with Self-Attention},
author={Kong, Zhifeng and Ping, Wei and Dantrey, Ambrish and Catanzaro, Bryan},
booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={7867--7871},
year={2022},
organization={IEEE}
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/Kabir5296/Audio-Denoiser.git",
"name": "cleanunet-denoiser",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "audio denoise, denoise, cleanunet, speech denoise",
"author": "A F M Mahfuzul Kabir",
"author_email": "<afmmahfuzulkabir@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/03/be/a58899ea91c632819f5c0eda3d72148655dc0e04df32eebe67263bb2b30f/cleanunet_denoiser-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "CleanUNet based audio denoiser",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/Kabir5296/Audio-Denoiser.git"
},
"split_keywords": [
"audio denoise",
" denoise",
" cleanunet",
" speech denoise"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f19039568ceae0f1f548122b2969b1c6091fb5111bc199232d3a0d50b31ef69c",
"md5": "6d710aa208c4ba14964f49939f46b8c7",
"sha256": "3f0baa144b340cadc62e2d76e02230f0cd4b9118a8497965c1388d2d470dd789"
},
"downloads": -1,
"filename": "cleanunet_denoiser-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6d710aa208c4ba14964f49939f46b8c7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 4213,
"upload_time": "2024-09-25T07:49:59",
"upload_time_iso_8601": "2024-09-25T07:49:59.618855Z",
"url": "https://files.pythonhosted.org/packages/f1/90/39568ceae0f1f548122b2969b1c6091fb5111bc199232d3a0d50b31ef69c/cleanunet_denoiser-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "03bea58899ea91c632819f5c0eda3d72148655dc0e04df32eebe67263bb2b30f",
"md5": "0a34ea5595c7a482dc1056f780f647c3",
"sha256": "a8bb25c3c1be1900a8003497ed8647ad8be6affbbf219d2176f1b5215e6bc021"
},
"downloads": -1,
"filename": "cleanunet_denoiser-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "0a34ea5595c7a482dc1056f780f647c3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 48478,
"upload_time": "2024-09-25T07:50:01",
"upload_time_iso_8601": "2024-09-25T07:50:01.815920Z",
"url": "https://files.pythonhosted.org/packages/03/be/a58899ea91c632819f5c0eda3d72148655dc0e04df32eebe67263bb2b30f/cleanunet_denoiser-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-25 07:50:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Kabir5296",
"github_project": "Audio-Denoiser",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "torch",
"specs": [
[
"==",
"2.4.1"
]
]
},
{
"name": "torchaudio",
"specs": [
[
"==",
"2.4.1"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.0.2"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.3"
]
]
},
{
"name": "soundfile",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.13.1"
]
]
},
{
"name": "transformers",
"specs": [
[
"==",
"4.44.2"
]
]
},
{
"name": "cleanunet",
"specs": [
[
"==",
"0.0.3"
]
]
},
{
"name": "fastapi",
"specs": []
},
{
"name": "fastapi",
"specs": [
[
"==",
"0.115.0"
]
]
},
{
"name": "gradio",
"specs": [
[
"==",
"4.44.0"
]
]
}
],
"lcname": "cleanunet-denoiser"
}