wespeaker-nuaazs

Name: wespeaker-nuaazs
Version: 0.0.5
Summary: Speaker Embedding
Author: Sheng Zhao
License: Apache Software License
Requires Python: >=3.8
Uploaded: 2024-10-14 12:58:11
Requirements: none recorded

# WeSpeaker

[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python-Version](https://img.shields.io/badge/Python-3.8%7C3.9-brightgreen)](https://github.com/wenet-e2e/wespeaker)

[**Roadmap**](ROADMAP.md)
| [**Docs**](http://wenet.org.cn/wespeaker)
| [**Paper**](https://arxiv.org/abs/2210.17016)
| [**Runtime**](https://github.com/wenet-e2e/wespeaker/tree/master/runtime)
| [**Pretrained Models**](docs/pretrained.md)
| [**Huggingface Demo**](https://huggingface.co/spaces/wenet/wespeaker_demo)
| [**Modelscope Demo**](https://www.modelscope.cn/studios/wenet/Speaker_Verification_in_WeSpeaker/summary)


WeSpeaker mainly focuses on [**speaker embedding learning**](https://wsstriving.github.io/talk/ncmmsc_slides_shuai.pdf), with application to the speaker verification task. It supports
online feature extraction as well as loading pre-extracted features in Kaldi format.

## Installation

### Install python package
``` sh
pip install git+https://github.com/wenet-e2e/wespeaker_nuaazs.git
```
**Command-line usage** (use `-h` for parameters):

``` sh
$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
$ wespeaker --task embedding_kaldi --wav_scp wav.scp --output_file /path/to/embedding
$ wespeaker --task similarity --audio_file audio.wav --audio_file2 audio2.wav
$ wespeaker --task diarization --audio_file audio.wav
```
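
Here `wav.scp` follows the Kaldi convention: each line maps an utterance ID to an audio path (the IDs and paths below are placeholders):

```
utt001 /path/to/utt001.wav
utt002 /path/to/utt002.wav
```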

**Python programming usage**:

``` python
import wespeaker

model = wespeaker.load_model('chinese')
embedding = model.extract_embedding('audio.wav')
utt_names, embeddings = model.extract_embedding_list('wav.scp')
similarity = model.compute_similarity('audio1.wav', 'audio2.wav')
diar_result = model.diarize('audio.wav')
```

Please refer to [python usage](docs/python_package.md) for more command-line and Python programming usage.
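
As a rough illustration of what a similarity score represents, the sketch below compares two utterances with plain cosine similarity over their embeddings. It is a generic sketch (assuming `extract_embedding` returns something numpy can convert to a 1-D vector), not necessarily the toolkit's exact scoring backend:

``` python
import numpy as np
import wespeaker

model = wespeaker.load_model('chinese')

# Assumption: the returned embeddings convert cleanly to 1-D numpy vectors.
e1 = np.asarray(model.extract_embedding('audio1.wav'), dtype=np.float32).ravel()
e2 = np.asarray(model.extract_embedding('audio2.wav'), dtype=np.float32).ravel()

# Cosine similarity: dot product of L2-normalized embeddings, in [-1, 1].
score = float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))
print(f"cosine similarity: {score:.4f}")
```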

### Install for development & deployment
* Clone this repo
``` sh
git clone https://github.com/wenet-e2e/wespeaker_nuaazs.git
```

* Create a conda env (PyTorch >= 1.12.1 is recommended):
``` sh
conda create -n wespeaker python=3.9
conda activate wespeaker
conda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
pre-commit install  # for clean and tidy code
```

## 🔥 News
* 2024.05.15: Add support for [score calibration](https://arxiv.org/pdf/2211.00815), see [#320](https://github.com/wenet-e2e/wespeaker/pull/320) (a generic calibration sketch follows this list)
* 2024.04.25: Add support for the gemini-dfresnet model, see [#291](https://github.com/wenet-e2e/wespeaker/pull/291)
* 2024.04.23: Support MNN inference engine in runtime, see [#310](https://github.com/wenet-e2e/wespeaker/pull/310)
* 2024.04.02: Release the [WeSpeaker documentation](http://wenet.org.cn/wespeaker) with detailed model-training tutorials, an introduction to the various runtime platforms, etc.
* 2024.03.04: Support the [eres2net-cn-common-200k](https://www.modelscope.cn/models/iic/speech_eres2net_sv_zh-cn_16k-common/summary) and [campplus-cn-common-200k](https://www.modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) models from Damo, see [#281](https://github.com/wenet-e2e/wespeaker/pull/281); check [python usage](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) for details.
* 2024.02.05: Support the ERes2Net [#272](https://github.com/wenet-e2e/wespeaker/pull/272) and Res2Net [#273](https://github.com/wenet-e2e/wespeaker/pull/273) models.
* 2023.11.13: Support CLI usage of wespeaker, check [python usage](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) for details.
* 2023.07.18: Support Kaldi-compatible PLDA and unsupervised adaptation, see [#186](https://github.com/wenet-e2e/wespeaker/pull/186).
* 2023.07.14: Support the [NIST SRE16 recipe](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016), see [#177](https://github.com/wenet-e2e/wespeaker/pull/177).
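
For context on the score-calibration item above: calibration typically fits an affine map `s' = a*s + b` on held-out labeled trial scores, for instance via logistic regression, so that calibrated scores behave like log-likelihood ratios. The sketch below is a generic scikit-learn illustration with toy data, not WeSpeaker's own implementation (see [#320](https://github.com/wenet-e2e/wespeaker/pull/320) for that):

``` python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy held-out trial scores (e.g. cosine similarities) and 0/1 target labels.
scores = np.array([0.71, 0.12, 0.55, -0.03, 0.64, 0.08]).reshape(-1, 1)
labels = np.array([1, 0, 1, 0, 1, 0])

# Fit s' = a*s + b so that sigmoid(s') approximates P(target | score).
calibrator = LogisticRegression()
calibrator.fit(scores, labels)
a, b = calibrator.coef_[0, 0], calibrator.intercept_[0]

# Calibrated scores are interpretable as log-odds (approximate LLRs).
new_scores = np.array([0.40, 0.90])
print(a * new_scores + b)
```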

## Recipes

* [VoxCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxceleb): Speaker Verification recipe on the [VoxCeleb dataset](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/)
    * 🔥 UPDATE 2024.05.15: We support score calibration for VoxCeleb and achieve better performance!
    * 🔥 UPDATE 2023.07.10: We support a self-supervised learning recipe on VoxCeleb, achieving **2.627%** EER (ECAPA_TDNN_GLOB_c1024) on the vox1-O-clean test set without any labels.
    * 🔥 UPDATE 2022.10.31: We support deep r-vector up to a 293-layer version, achieving **0.447%/0.043** EER/minDCF on the vox1-O-clean test set (a sketch of how EER is computed follows this list).
    * 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe and obtain SOTA performance among open-source systems.
      - EER/minDCF on the vox1-O-clean test set are **0.723%/0.069** (ResNet34) and **0.728%/0.099** (ECAPA_TDNN_GLOB_c1024) after LM fine-tuning and AS-Norm.
* [CNCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/cnceleb/v2): Speaker Verification recipe on the [CNCeleb dataset](http://cnceleb.org/)
    * 🔥 UPDATE 2024.05.16: We support score calibration for CNCeleb and achieve a better EER.
    * 🔥 UPDATE 2022.10.31: A 221-layer ResNet achieves **5.655%/0.330** EER/minDCF.
    * 🔥 UPDATE 2022.07.12: We migrated the winning system of CNSRC 2022 ([report](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082.pdf), [slides](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082-ZhengyangChen.pdf)).
      - EER/minDCF reduction from 8.426%/0.487 to **6.492%/0.354** after large margin fine-tuning and AS-Norm
* [NIST SRE16](https://github.com/wenet-e2e/wespeaker/tree/master/examples/sre/v2): Speaker Verification recipe for the [2016 NIST Speaker Recognition Evaluation Plan](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016). A similar recipe can be found in [Kaldi](https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16).
   * 🔥 UPDATE 2023.07.14: We support the NIST SRE16 recipe. After PLDA adaptation, we achieved EERs of 6.608%, 10.01%, and 2.974% on the Pooled, Tagalog, and Cantonese trials, respectively.
* [VoxConverse](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse): Diarization recipe on the [VoxConverse dataset](https://www.robots.ox.ac.uk/~vgg/data/voxconverse/)
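
The recipes above report EER (equal error rate: the operating point where the false-accept rate equals the false-reject rate) and minDCF. A minimal, generic sketch of computing EER from trial scores and labels is shown below; it is an illustration, not the toolkit's own scoring scripts:

``` python
import numpy as np

def compute_eer(scores, labels):
    """Return the equal error rate (EER) for a list of binary trials.

    scores: higher means "same speaker"; labels: 1 = target, 0 = non-target.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    eer, gap = 1.0, np.inf
    for thr in np.sort(np.unique(scores)):
        accept = scores >= thr
        far = np.mean(accept[labels == 0])   # false-accept rate (non-targets accepted)
        frr = np.mean(~accept[labels == 1])  # false-reject rate (targets rejected)
        if abs(far - frr) < gap:             # keep the threshold where FAR and FRR cross
            gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy trials: 1 = same speaker, 0 = different speakers.
print(f"EER: {compute_eer([0.9, 0.8, 0.7, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0, 0]):.3%}")
```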

## Discussion

For Chinese users: scan the QR code on the left to follow the official `WeNet Community` account.
We have also created a WeChat group for discussion and quicker responses; please scan the QR code on the right to join.
| <img src="https://github.com/wenet-e2e/wenet-contributors/blob/main/wenet_official.jpeg" width="250px"> | <img src="https://github.com/wenet-e2e/wenet-contributors/blob/main/wespeaker/wangshuai.jpg" width="250px"> |
| ---- | ---- |

## Citations
If you find WeSpeaker useful, please cite it as:
```bibtex
@inproceedings{wang2023wespeaker,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```
## Looking for contributors

If you are interested in contributing, feel free to contact @wsstriving or @robin1001.

            
