# WeSpeaker
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python-Version](https://img.shields.io/badge/Python-3.8%7C3.9-brightgreen)](https://github.com/wenet-e2e/wespeaker)
[**Roadmap**](ROADMAP.md)
| [**Docs**](http://wenet.org.cn/wespeaker)
| [**Paper**](https://arxiv.org/abs/2210.17016)
| [**Runtime**](https://github.com/wenet-e2e/wespeaker/tree/master/runtime)
| [**Pretrained Models**](docs/pretrained.md)
| [**Huggingface Demo**](https://huggingface.co/spaces/wenet/wespeaker_demo)
| [**Modelscope Demo**](https://www.modelscope.cn/studios/wenet/Speaker_Verification_in_WeSpeaker/summary)
WeSpeaker mainly focuses on [**speaker embedding learning**](https://wsstriving.github.io/talk/ncmmsc_slides_shuai.pdf), with applications to the speaker verification task. It supports both
online feature extraction and loading pre-extracted features in Kaldi format.
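As an illustration of the online path, Kaldi-style log-mel filterbank features can be computed on the fly. A minimal sketch with `torchaudio` (the 80-dim fbank configuration here is a common choice used for illustration, not necessarily WeSpeaker's exact setting):

``` python
import torchaudio
import torchaudio.compliance.kaldi as kaldi

# Load a mono waveform; torchaudio returns (channels, samples) and the rate.
waveform, sample_rate = torchaudio.load('audio.wav')

# Kaldi-compatible log-mel filterbank features, shape (num_frames, num_mel_bins).
# 80 mel bins with 25 ms / 10 ms windows are typical, but assumed here.
feats = kaldi.fbank(
    waveform,
    num_mel_bins=80,
    frame_length=25.0,
    frame_shift=10.0,
    sample_frequency=sample_rate,
)
print(feats.shape)
```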
## Installation
### Install python package
``` sh
pip install git+https://github.com/wenet-e2e/wespeaker_nuaazs.git
```
**Command-line usage** (use `-h` for parameters):
``` sh
$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
$ wespeaker --task embedding_kaldi --wav_scp wav.scp --output_file /path/to/embedding
$ wespeaker --task similarity --audio_file audio.wav --audio_file2 audio2.wav
$ wespeaker --task diarization --audio_file audio.wav
```
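For the `embedding_kaldi` task, `wav.scp` follows the usual Kaldi convention: one utterance per line, an utterance ID followed by the audio path, e.g.:

```
utt1 /path/to/audio1.wav
utt2 /path/to/audio2.wav
```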
**Python programming usage**:
``` python
import wespeaker
model = wespeaker.load_model('chinese')
embedding = model.extract_embedding('audio.wav')
utt_names, embeddings = model.extract_embedding_list('wav.scp')
similarity = model.compute_similarity('audio1.wav', 'audio2.wav')
diar_result = model.diarize('audio.wav')
```
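For verification, the similarity score can then be thresholded to make an accept/reject decision. A minimal sketch (the 0.5 threshold is an illustrative placeholder; tune it on your own development trials):

``` python
import wespeaker

model = wespeaker.load_model('chinese')
score = model.compute_similarity('enroll.wav', 'test.wav')

# Accept the trial if the score exceeds a threshold; 0.5 is only a
# placeholder and should be calibrated on a development set.
decision = 'accept' if score > 0.5 else 'reject'
print(score, decision)
```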
Please refer to [python usage](docs/python_package.md) for more command-line and Python programming usage.
### Install for development & deployment
* Clone this repo
``` sh
git clone https://github.com/wenet-e2e/wespeaker_nuaazs.git
```
* Create a conda env (PyTorch >= 1.12.1 is recommended):
``` sh
conda create -n wespeaker python=3.9
conda activate wespeaker
conda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
pre-commit install # for clean and tidy code
```
## 🔥 News
* 2024.05.15: Add support for [score calibration](https://arxiv.org/pdf/2211.00815), see [#320](https://github.com/wenet-e2e/wespeaker/pull/320) (a calibration sketch follows this list)
* 2024.04.25: Add support for the Gemini DF-ResNet model, see [#291](https://github.com/wenet-e2e/wespeaker/pull/291)
* 2024.04.23: Support the MNN inference engine in the runtime, see [#310](https://github.com/wenet-e2e/wespeaker/pull/310)
* 2024.04.02: Release the [WeSpeaker documentation](http://wenet.org.cn/wespeaker) with detailed model-training tutorials, an introduction to the various runtime platforms, etc.
* 2024.03.04: Support the [eres2net-cn-common-200k](https://www.modelscope.cn/models/iic/speech_eres2net_sv_zh-cn_16k-common/summary) and [campplus-cn-common-200k](https://www.modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) models from DAMO Academy [#281](https://github.com/wenet-e2e/wespeaker/pull/281); check [python usage](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) for details.
* 2024.02.05: Support the ERes2Net [#272](https://github.com/wenet-e2e/wespeaker/pull/272) and Res2Net [#273](https://github.com/wenet-e2e/wespeaker/pull/273) models.
* 2023.11.13: Support CLI usage of WeSpeaker, check [python usage](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) for details.
* 2023.07.18: Support Kaldi-compatible PLDA and unsupervised adaptation, see [#186](https://github.com/wenet-e2e/wespeaker/pull/186).
* 2023.07.14: Support the [NIST SRE16 recipe](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016), see [#177](https://github.com/wenet-e2e/wespeaker/pull/177).
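As a companion to the score-calibration item above: the common approach maps raw verification scores through an affine transform fitted by logistic regression on a development set. A minimal sketch with `scikit-learn` on hypothetical data (not necessarily the exact recipe from [#320](https://github.com/wenet-e2e/wespeaker/pull/320)):

``` python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Raw trial scores and target (1) / non-target (0) labels from a dev set
# (hypothetical values for illustration).
scores = np.array([0.71, 0.12, 0.65, 0.08, 0.55, 0.20]).reshape(-1, 1)
labels = np.array([1, 0, 1, 0, 1, 0])

# Fit an affine map s' = a * s + b so the calibrated score behaves like a
# log-likelihood ratio; this is standard logistic-regression calibration.
clf = LogisticRegression()
clf.fit(scores, labels)
a, b = clf.coef_[0][0], clf.intercept_[0]

calibrated = a * scores.ravel() + b  # calibrated log-odds scores
print(calibrated)
```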
## Recipes
* [VoxCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxceleb): Speaker Verification recipe on the [VoxCeleb dataset](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/)
  * 🔥 UPDATE 2024.05.15: We support score calibration for VoxCeleb and achieve better performance!
  * 🔥 UPDATE 2023.07.10: We support a self-supervised learning recipe on VoxCeleb, achieving **2.627%** EER (ECAPA_TDNN_GLOB_c1024) on the vox1-O-clean test set without any labels.
  * 🔥 UPDATE 2022.10.31: We support deep r-vectors up to a 293-layer version, achieving **0.447%/0.043** EER/minDCF on the vox1-O-clean test set.
  * 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe and obtain SOTA performance among open-source systems:
    - EER/minDCF on the vox1-O-clean test set are **0.723%/0.069** (ResNet34) and **0.728%/0.099** (ECAPA_TDNN_GLOB_c1024) after LM fine-tuning and AS-Norm (see the AS-Norm sketch after this recipe list).
* [CNCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/cnceleb/v2): Speaker Verification recipe on the [CnCeleb dataset](http://cnceleb.org/)
  * 🔥 UPDATE 2024.05.16: We support score calibration for CNCeleb and achieve a better EER.
  * 🔥 UPDATE 2022.10.31: A 221-layer ResNet achieves **5.655%/0.330** EER/minDCF.
  * 🔥 UPDATE 2022.07.12: We migrate the winning system of CNSRC 2022 ([report](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082.pdf), [slides](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082-ZhengyangChen.pdf)):
    - EER/minDCF reduced from 8.426%/0.487 to **6.492%/0.354** after large-margin fine-tuning and AS-Norm.
* [NIST SRE16](https://github.com/wenet-e2e/wespeaker/tree/master/examples/sre/v2): Speaker Verification recipe for the [2016 NIST Speaker Recognition Evaluation Plan](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016). A similar recipe can be found in [Kaldi](https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16).
  * 🔥 UPDATE 2023.07.14: We support the NIST SRE16 recipe. After PLDA adaptation, we achieve **6.608%**, **10.01%**, and **2.974%** EER on the Pooled, Tagalog, and Cantonese trials, respectively.
* [VoxConverse](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse): Diarization recipe on the [VoxConverse dataset](https://www.robots.ox.ac.uk/~vgg/data/voxconverse/)
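Several results above are reported after AS-Norm (adaptive symmetric score normalization). A minimal numpy sketch of the idea, assuming precomputed cosine scores of the enrollment and test utterances against an impostor cohort (cohort construction and `top_k=300` are assumptions, not the recipes' exact settings):

``` python
import numpy as np

def as_norm(raw_score, enroll_cohort_scores, test_cohort_scores, top_k=300):
    """Adaptive symmetric score normalization (AS-Norm) of one trial score."""
    # Keep only the top-k highest cohort scores per side (the "adaptive"
    # part); top_k=300 is a common but arbitrary choice.
    e = np.sort(enroll_cohort_scores)[-top_k:]
    t = np.sort(test_cohort_scores)[-top_k:]
    return 0.5 * ((raw_score - e.mean()) / e.std()
                  + (raw_score - t.mean()) / t.std())

# Toy usage with random cohort scores (illustration only).
rng = np.random.default_rng(0)
print(as_norm(0.6, rng.normal(0.1, 0.05, 1000), rng.normal(0.1, 0.05, 1000)))
```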
## Discussion
For Chinese users, you can scan the QR code on the left to follow the official account of the `WeNet Community`.
We have also created a WeChat group for discussion and quicker responses; please scan the QR code on the right to join.
| <img src="https://github.com/wenet-e2e/wenet-contributors/blob/main/wenet_official.jpeg" width="250px"> | <img src="https://github.com/wenet-e2e/wenet-contributors/blob/main/wespeaker/wangshuai.jpg" width="250px"> |
| ---- | ---- |
## Citations
If you find WeSpeaker useful, please cite it as:
```bibtex
@inproceedings{wang2023wespeaker,
title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2023},
organization={IEEE}
}
```
## Looking for contributors
If you are interested in contributing, feel free to contact @wsstriving or @robin1001.