mvector

- Name: mvector
- Version: 1.1.1
- Home page: https://github.com/yeyupiaoling/VoiceprintRecognition_Pytorch
- Summary: Voice Print Recognition toolkit on Pytorch
- Upload time: 2024-10-13 12:33:29
- Author: yeyupiaoling
- License: Apache License 2.0
- Keywords: voice, pytorch
- Requirements: numpy, tqdm, visualdl, resampy, soundfile, soundcard, pyyaml, scikit-learn, pydub, torchinfo, loguru, yeaudio

Simplified Chinese | [English](./README_en.md)

# Voiceprint Recognition System Based on PyTorch

![python version](https://img.shields.io/badge/python-3.8+-orange.svg)
![GitHub forks](https://img.shields.io/github/forks/yeyupiaoling/VoiceprintRecognition-Pytorch)
![GitHub Repo stars](https://img.shields.io/github/stars/yeyupiaoling/VoiceprintRecognition-Pytorch)
![GitHub](https://img.shields.io/github/license/yeyupiaoling/VoiceprintRecognition-Pytorch)
![Supported systems](https://img.shields.io/badge/支持系统-Win/Linux/MAC-9cf)

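This branch is version 1.1. To use the previous 1.0 version, please use the [1.0 branch](https://github.com/yeyupiaoling/VoiceprintRecognition-Pytorch/tree/release/1.0.5). This project uses several advanced voiceprint recognition models, including EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and more models may be supported in the future. It also supports several data preprocessing methods, including MelSpectrogram, Spectrogram, MFCC, and Fbank. It uses ArcFace loss, i.e. Additive Angular Margin Loss, which corresponds to `AAMLoss` in the project: the feature vectors and the weights are normalized and an angular margin m is added to the angle θ. An angular margin affects the angle more directly than a cosine margin does. In addition, AMLoss, ARMLoss, CELoss, and other loss functions are supported.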
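To make the angular-margin idea concrete, here is a minimal pure-Python sketch of an ArcFace-style additive angular margin loss for a single sample. It is an illustration under assumptions, not the project's `AAMLoss`. The function name `aam_softmax_loss` is hypothetical. The defaults `margin=0.2` and `scale=32` are borrowed from the training log further down. The special handling for θ + m > π (the `easy_margin` option in that log) is omitted.

```python
import math


def aam_softmax_loss(embedding, class_weights, label, margin=0.2, scale=32.0):
    """ArcFace-style additive angular margin loss for one sample (hypothetical helper).

    loss = -log( e^{s*cos(theta_y + m)} / (e^{s*cos(theta_y + m)} + sum_{j != y} e^{s*cos(theta_j)}) )
    """

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = normalize(embedding)  # normalize the feature vector
    logits = []
    for j, w in enumerate(class_weights):
        # cos(theta_j) between the normalized embedding and the normalized class weight
        cos_theta = sum(a * b for a, b in zip(e, normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos inside its domain
        if j == label:
            # Add the margin m to the angle of the target class only
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    # Numerically stable softmax cross-entropy over the scaled logits
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]
```

Because cos(θ + m) < cos θ for θ in [0, π - m], the margin shrinks the target logit. To keep the loss low, the network must pull embeddings of the same speaker into a tighter angular region around their class weight.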

**If this project helps you, please give it a Star so you can easily find it again later.**

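**You are welcome to scan the QR codes below to join the Knowledge Planet (知识星球) or the QQ group for discussion. The Knowledge Planet provides the model files for this project and for the author's other related projects, along with other resources.**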

<div align="center">
  <img src="https://yeyupiaoling.cn/zsxq.png" alt="Knowledge Planet" width="400">
  <img src="https://yeyupiaoling.cn/qq.png" alt="QQ group" width="400">
</div>

Environment:

 - Anaconda 3
 - Python 3.11
 - Pytorch 2.4.0
 - Windows 11 or Ubuntu 22.04

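# Contents

- [Introduction](#voiceprint-recognition-system-based-on-pytorch)
- [Project Log](#project-log)
- [Features](#features)
- [Installation](#installation)
- [Creating the Data](#creating-the-data)
- [Changing the Preprocessing Method (Optional)](#changing-the-preprocessing-method-optional)
- [Extracting Features (Optional)](#extracting-features-optional)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference API](#inference-api)
- [Voiceprint Comparison](#voiceprint-comparison)
- [Voiceprint Recognition](#voiceprint-recognition)
- [Speaker Diarization (Separating Speakers)](#speaker-diarization-separating-speakers)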


# Project Log

1. 2024.10.12: Released version 1.1.

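# Features

1. Supported models: EcapaTdnn, TDNN, Res2Net, ResNetSE, ERes2Net, CAM++
2. Supported pooling layers: AttentiveStatsPool (ASP), SelfAttentivePooling (SAP), TemporalStatisticsPooling (TSP), TemporalAveragePooling (TAP), TemporalStatsPool (TSTP)
3. Supported loss functions: AAMLoss, SphereFace2, AMLoss, ARMLoss, CELoss, SubCenterLoss, TripletAngularMarginLoss
4. Supported preprocessing methods: MelSpectrogram, Spectrogram, MFCC, Fbank, Wav2vec2.0, WavLM
5. Supported data augmentation methods: speed perturbation, volume perturbation, noise augmentation, reverberation augmentation, SpecAugment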
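As a rough illustration of what the pooling layers do, the sketch below collapses a variable-length sequence of frame features into one fixed-size vector. The function names are hypothetical and this is not the project's code.

- TAP is taken here as the mean over time.
- Statistics pooling is taken as the concatenation of the per-dimension mean and standard deviation.
- Attentive pooling (ASP) computes the same statistics, but weights the frames with learned attention.

This matches the model summary in the training log below, where `AttentiveStatsPool` turns 1536 channels into a 3072-dimensional output.

```python
import math


def temporal_average_pooling(frames):
    """Mean over time of a [T][D] list of frame features -> [D] (hypothetical helper)."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]


def temporal_statistics_pooling(frames, eps=1e-10):
    """Concatenate the per-dimension mean and standard deviation -> [2 * D]."""
    mean = temporal_average_pooling(frames)
    num_frames = len(frames)
    std = [
        # eps keeps sqrt well-defined for constant dimensions
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / num_frames + eps)
        for d in range(len(mean))
    ]
    return mean + std
```

The output size depends only on the feature dimension, not on the number of frames. This is what lets utterances of any length map to a fixed-size speaker embedding.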


**Model papers:**

- EcapaTdnn:[ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](https://arxiv.org/abs/2005.07143v3)
- PANNS:[PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/abs/1912.10211v5)
- TDNN:[Prediction of speech intelligibility with DNN-based performance measures](https://arxiv.org/abs/2203.09148)
- Res2Net:[Res2Net: A New Multi-scale Backbone Architecture](https://arxiv.org/abs/1904.01169)
- ResNetSE:[Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507)
- CAMPPlus:[CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking](https://arxiv.org/abs/2303.00332v3)
- ERes2Net:[An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification](https://arxiv.org/abs/2305.12838v1)


# Model Downloads

### Trained on CN-Celeb (2796 speakers)

| Model | Params (M) | Dataset | Train speakers | Threshold | EER | MinDCF | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| ERes2NetV2 | 6.6 | [CN-Celeb](http://openslr.org/82/) | 2796 | 0.20089 | 0.08071 | 0.45705 | Via Knowledge Planet |
| ERes2Net | 6.6 | [CN-Celeb](http://openslr.org/82/) | 2796 | 0.20014 | 0.08132 | 0.45544 | Via Knowledge Planet |
| CAM++ | 6.8 | [CN-Celeb](http://openslr.org/82/) | 2796 | 0.23323 | 0.08332 | 0.48536 | Via Knowledge Planet |
| ResNetSE | 7.8 | [CN-Celeb](http://openslr.org/82/) | 2796 | 0.19066 | 0.08544 | 0.49142 | Via Knowledge Planet |
| EcapaTdnn | 6.1 | [CN-Celeb](http://openslr.org/82/) | 2796 | 0.23646 | 0.09259 | 0.51378 | Via Knowledge Planet |
| TDNN | 2.6 | [CN-Celeb](http://openslr.org/82/) | 2796 | 0.23858 | 0.10825 | 0.59545 | Via Knowledge Planet |
| Res2Net | 5.0 | [CN-Celeb](http://openslr.org/82/) | 2796 | 0.19526 | 0.12436 | 0.65347 | Via Knowledge Planet |
| CAM++ | 6.8 | Larger dataset | 20K+ | 0.33 | 0.07874 | 0.52524 | Via Knowledge Planet |
| ERes2Net | 55.1 | Other datasets | 200K+ | 0.36 | 0.02936 | 0.18355 | Via Knowledge Planet |
| ERes2NetV2 | 56.2 | Other datasets | 200K+ | 0.36 | 0.03847 | 0.24301 | Via Knowledge Planet |
| CAM++ | 6.8 | Other datasets | 200K+ | 0.29 | 0.04765 | 0.31436 | Via Knowledge Planet |

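Notes:
1. The evaluation test set is the [CN-Celeb test set](https://aistudio.baidu.com/aistudio/datasetdetail/233361), which contains 196 speakers.
2. Speed perturbation is used, which triples the number of classes (`speed_perturb_3_class: True`).
3. The preprocessing method is `Fbank` and the loss function is `AAMLoss`.
4. The parameter count does not include the classifier's parameters.
5. Noise augmentation and reverberation augmentation are used.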
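Note 2 says that speed perturbation triples the number of classes. A common scheme, used for example in WeSpeaker, treats the same speaker at speeds 0.9 and 1.1 as two additional speakers. We have not verified that this project does exactly the same.

The sketch below is a hypothetical illustration under two assumptions:

- `change_speed` resamples by plain linear interpolation. Like SoX's `speed` effect, this changes both tempo and pitch.
- The label offset `label + speed_idx * num_speakers` is assumed.

```python
import random

SPEEDS = [1.0, 0.9, 1.1]  # speed factors for 3-way speed perturbation (assumed)


def change_speed(samples, speed):
    """Resample a waveform by linear interpolation so it plays `speed` times as fast.

    Hypothetical helper: like SoX's `speed` effect, this changes tempo and pitch.
    """
    if speed == 1.0:
        return list(samples)
    new_len = int(round(len(samples) / speed))
    out = []
    for i in range(new_len):
        pos = i * speed  # read position in the original signal
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # past the last sample: hold the final value
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


def speed_perturb_3_class(samples, label, num_speakers, speed_idx=None):
    """Perturb the speed and give each speed its own block of speaker labels."""
    if speed_idx is None:
        speed_idx = random.randrange(len(SPEEDS))
    new_samples = change_speed(samples, SPEEDS[speed_idx])
    # Speaker k at speed SPEEDS[i] becomes class k + i * num_speakers,
    # so the classifier has 3 * num_speakers output classes.
    new_label = label + speed_idx * num_speakers
    return new_samples, new_label
```

With the 2796 CN-Celeb training speakers, this scheme gives a classifier with 8388 output classes.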


### Trained on VoxCeleb1&2 (7205 speakers)

| Model | Params (M) | Dataset | Train speakers | Threshold | EER | MinDCF | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| CAM++ | 6.8 | VoxCeleb1&2 | 7205 | 0.22504 | 0.02436 | 0.15543 | Via Knowledge Planet |
| EcapaTdnn | 6.1 | VoxCeleb1&2 | 7205 | 0.24877 | 0.02480 | 0.16188 | Via Knowledge Planet |
| ResNetSE | 7.8 | VoxCeleb1&2 | 7205 | 0.22567 | 0.03189 | 0.23040 | Via Knowledge Planet |
| TDNN | 2.6 | VoxCeleb1&2 | 7205 | 0.23834 | 0.03486 | 0.26792 | Via Knowledge Planet |
| Res2Net | 5.0 | VoxCeleb1&2 | 7205 | 0.19472 | 0.04370 | 0.40072 | Via Knowledge Planet |
| ERes2Net | 6.6 | VoxCeleb1&2 | 7205 | | | | Via Knowledge Planet |
| CAM++ | 6.8 | Larger dataset | 20K+ | 0.28 | 0.03182 | 0.23731 | Via Knowledge Planet |
| ERes2Net | 55.1 | Other datasets | 200K+ | 0.53 | 0.08904 | 0.62130 | Via Knowledge Planet |
| ERes2NetV2 | 56.2 | Other datasets | 200K+ | 0.52 | 0.08649 | 0.64193 | Via Knowledge Planet |
| CAM++ | 6.8 | Other datasets | 200K+ | 0.49 | 0.10334 | 0.71200 | Via Knowledge Planet |

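Notes:

1. The evaluation test set is the [VoxCeleb1&2 test set](https://aistudio.baidu.com/aistudio/datasetdetail/255977), which contains 158 speakers.
2. Speed perturbation is used, which triples the number of classes (`speed_perturb_3_class: True`).
3. The preprocessing method is `Fbank` and the loss function is `AAMLoss`.
4. The parameter count does not include the classifier's parameters.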


### Comparison of Preprocessing Methods

| Preprocessing method | Dataset | Train speakers | Threshold | EER | MinDCF | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Fbank | CN-Celeb | 2796 | 0.14574 | 0.10988 | 0.58955 | Via Knowledge Planet |
| MFCC | CN-Celeb | 2796 | 0.14868 | 0.11483 | 0.61275 | Via Knowledge Planet |
| Spectrogram | CN-Celeb | 2796 | 0.14962 | 0.11613 | 0.60057 | Via Knowledge Planet |
| MelSpectrogram | CN-Celeb | 2796 | 0.13458 | 0.12498 | 0.60741 | Via Knowledge Planet |
| [wavlm-base-plus](https://huggingface.co/microsoft/wavlm-base-plus) | CN-Celeb | 2796 | 0.14166 | 0.13247 | 0.62451 | Via Knowledge Planet |
| [w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) | CN-Celeb | 2796 | | | | Via Knowledge Planet |
| [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | CN-Celeb | 2796 | | | | Via Knowledge Planet |
| [wavlm-large](https://huggingface.co/microsoft/wavlm-large) | CN-Celeb | 2796 | | | | Via Knowledge Planet |

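Notes:

1. The evaluation test set is the [CN-Celeb test set](https://aistudio.baidu.com/aistudio/datasetdetail/233361), which contains 196 speakers.
2. The training data is [CN-Celeb](http://openslr.org/82/), the model is `CAM++`, and the loss function is `AAMLoss`.
3. Features were extracted in advance with `extract_features.py`, so no audio-level data augmentation was applied during training.
4. `w2v-bert-2.0` and `wav2vec2-large-xlsr-53` were pretrained on multilingual data, whereas `wavlm-base-plus` and `wavlm-large` were pretrained on English data only.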


### Comparison of Loss Functions

| Loss function | Dataset | Train speakers | Threshold | EER | MinDCF | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| AAMLoss | CN-Celeb | 2796 | 0.14574 | 0.10988 | 0.58955 | Via Knowledge Planet |
| SphereFace2 | CN-Celeb | 2796 | 0.20377 | 0.11309 | 0.61536 | Via Knowledge Planet |
| TripletAngularMarginLoss | CN-Celeb | 2796 | 0.28940 | 0.11749 | 0.63735 | Via Knowledge Planet |
| SubCenterLoss | CN-Celeb | 2796 | 0.13126 | 0.11775 | 0.56995 | Via Knowledge Planet |
| ARMLoss | CN-Celeb | 2796 | 0.14563 | 0.11805 | 0.57171 | Via Knowledge Planet |
| AMLoss | CN-Celeb | 2796 | 0.12870 | 0.12301 | 0.63263 | Via Knowledge Planet |
| CELoss | CN-Celeb | 2796 | 0.13607 | 0.12684 | 0.65176 | Via Knowledge Planet |

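Notes:

1. The evaluation test set is the [CN-Celeb test set](https://aistudio.baidu.com/aistudio/datasetdetail/233361), which contains 196 speakers.
2. The training data is [CN-Celeb](http://openslr.org/82/), the model is `CAM++`, and the preprocessing method is `Fbank`.
3. Features were extracted in advance with `extract_features.py`, so no audio-level data augmentation was applied during training.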


# Installation

 - First, install the GPU version of PyTorch. Skip this step if it is already installed.
```shell
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```

 - Install the mvector library.

Install it with pip:
```shell
python -m pip install mvector -U -i https://pypi.tuna.tsinghua.edu.cn/simple
```

**Installing from source is recommended**, because it guarantees that you get the latest code.
```shell
git clone https://github.com/yeyupiaoling/VoiceprintRecognition-Pytorch.git
cd VoiceprintRecognition-Pytorch/
pip install .
```

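# Creating the Data
This tutorial uses [CN-Celeb](https://openslr.elda.org/resources/82), a dataset with speech from about 3,000 speakers and more than 650,000 utterances. After downloading it, extract it into the `dataset` directory. To run evaluation, you also need to download the [CN-Celeb test set](https://aistudio.baidu.com/aistudio/datasetdetail/233361). If you have other, better datasets, you can mix them in. However, it is best to first process that audio with the Python module aukit to denoise it and remove silence.

First, create a data list in the format `<audio file path\tspeaker label>`. The list makes later loading easy, and it also makes it easy to bring in other speech datasets. The speaker label is the speaker's unique ID. For each additional dataset, you can write a function that generates its entries, so that all datasets end up in the same data list.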
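A minimal sketch of such a list-generating function follows. It assumes a hypothetical layout with one sub-directory per speaker, as in the CN-Celeb paths shown below (for example `data/id11999/`). The name `create_data_list` and its labeling rule (sorted directory order) are illustrative only; the project's `create_data.py` may assign labels differently.

```python
import os


def create_data_list(data_dir, list_path, exts=('.wav', '.flac')):
    """Write one '<audio path>\t<speaker label>' line per audio file (hypothetical helper).

    Assumes data_dir contains one sub-directory per speaker; labels are
    integers assigned in sorted directory order. Returns the speaker count.
    """
    speakers = sorted(d for d in os.listdir(data_dir)
                      if os.path.isdir(os.path.join(data_dir, d)))
    with open(list_path, 'w', encoding='utf-8') as f:
        for label, speaker in enumerate(speakers):
            speaker_dir = os.path.join(data_dir, speaker)
            for name in sorted(os.listdir(speaker_dir)):
                if name.lower().endswith(exts):  # skip non-audio files
                    path = os.path.join(speaker_dir, name).replace('\\', '/')
                    f.write(f'{path}\t{label}\n')
    return len(speakers)
```

To merge several datasets, call a function like this once per dataset in append mode and offset each dataset's labels by the number of speakers already written.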

Run `create_data.py` to prepare the data.
```shell
python create_data.py
```

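Running the program above generates data lists in the format shown below. To use custom data, follow this format: the relative path of the audio file comes first, followed by the label of its speaker, just as in a classification task. **Note for custom datasets:** the IDs in the test data list do not have to match the training IDs. In other words, speakers in the test data do not need to appear in the training set; just make sure that each person has a single, consistent ID within the test data list.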
```
dataset/CN-Celeb2_flac/data/id11999/recitation-03-019.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-10-023.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-06-025.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-04-014.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-06-030.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-10-032.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-06-028.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-10-031.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-05-003.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-04-017.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-10-016.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-09-001.flac      2795
dataset/CN-Celeb2_flac/data/id11999/recitation-05-010.flac      2795
```
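Test IDs only need to be consistent within the test list because, in a verification test, IDs are only compared with each other to decide whether a pair is same-speaker or different-speaker. They are never matched against training labels. The hypothetical `make_trials` helper below illustrates this by pairing every clip in a test list. The project's own evaluation uses enrollment and trial lists (`enroll_list` and `trials_list` in the training log further down) and may pair clips differently.

```python
from itertools import combinations


def make_trials(test_items):
    """Build every pair from a list of (audio_path, speaker_id) (hypothetical helper).

    Returns (path1, path2, is_same) tuples, where is_same is 1 when both
    clips carry the same ID and 0 otherwise.
    """
    return [
        (path1, path2, int(id1 == id2))
        for (path1, id1), (path2, id2) in combinations(test_items, 2)
    ]
```

Relabeling the test speakers (for example 7 → 0 and 42 → 1) leaves every `is_same` flag unchanged.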

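# Changing the Preprocessing Method (Optional)

The configuration file uses the Fbank preprocessing method by default. To use a different method, modify the configuration file as shown below and adjust the values to your needs. If you are unsure how to set the parameters, you can delete this section and use the defaults.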

```yaml
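# Data preprocessing parameters
preprocess_conf:
  # Whether to extract audio features with a Wav2Vec2-like model from Hugging Face
  use_hf_model: False
  # Audio preprocessing method, also called the feature extraction method
  # When use_hf_model is False, the supported values are: MelSpectrogram, Spectrogram, MFCC, Fbank
  # When use_hf_model is True, this is a Hugging Face model name or a local path, e.g. facebook/w2v-bert-2.0 or ./feature_models/w2v-bert-2.0
  feature_method: 'Fbank'
  # When use_hf_model is False, set the API parameters here (see the corresponding API for more); if unsure, delete this section to use the defaults
  # When use_hf_model is True, you can set use_gpu to choose whether features are extracted on the GPU
  method_args:
    sample_frequency: 16000
    num_mel_bins: 80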
```
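The sketch below builds intuition for what `num_mel_bins` controls by computing the center frequencies of a Kaldi-style mel filterbank. It uses the Kaldi mel scale mel(f) = 1127 ln(1 + f/700), with the filters evenly spaced on the mel axis between `low_freq` and the Nyquist frequency.

- The argument names `sample_frequency` and `num_mel_bins` match those of torchaudio's Kaldi-compatible `torchaudio.compliance.kaldi.fbank`. We have not confirmed that this project computes Fbank that way.
- The `low_freq=20.0` default is Kaldi's.
- The helper names are hypothetical.

```python
import math


def mel_scale(freq_hz):
    """Kaldi mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def inverse_mel_scale(mel):
    """Inverse of mel_scale, back to Hz."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_bin_centers(num_mel_bins=80, sample_frequency=16000, low_freq=20.0):
    """Center frequency (Hz) of each triangular mel filter (hypothetical helper)."""
    mel_low = mel_scale(low_freq)
    mel_high = mel_scale(sample_frequency / 2.0)  # filters extend up to Nyquist
    # num_mel_bins triangles need num_mel_bins + 2 evenly spaced mel points
    delta = (mel_high - mel_low) / (num_mel_bins + 1)
    return [inverse_mel_scale(mel_low + (i + 1) * delta) for i in range(num_mel_bins)]
```

Filters are packed densely at low frequencies and spread out at high frequencies. Raising `num_mel_bins` gives finer frequency resolution per frame, at the cost of a larger input dimension.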

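# Extracting Features (Optional)

During training, the audio is first read, features are then extracted, and only then does the model train on them. Reading audio and extracting features are fairly time-consuming. You can therefore extract the features in advance and have training load them directly, which makes training faster. This step is optional: if no pre-extracted features are available, training starts by reading the audio and extracting features. The steps are as follows:

1. Run `extract_features.py` to extract the features. They are saved in the `dataset/features` directory, and new data lists `train_list_features.txt`, `enroll_list_features.txt`, and `trials_list_features.txt` are generated.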

```shell
python extract_features.py --configs=configs/cam++.yml --save_dir=dataset/features
```

2. Modify the configuration file, changing `dataset_conf.train_list`, `dataset_conf.enroll_list`, and `dataset_conf.trials_list` to `train_list_features.txt`, `enroll_list_features.txt`, and `trials_list_features.txt`, respectively.
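The trade-off behind this step can be sketched as a caching pass: pay the decoding and feature-extraction cost once, then train from the saved features. The helper below is hypothetical. It stores features with `pickle` and takes a placeholder `extract_fn`, whereas `extract_features.py` uses its own storage format. It also shows why, as the experiment notes above point out, pre-extracted features rule out waveform-level augmentation: the audio is read only once, before training.

```python
import os
import pickle


def cache_features(list_path, save_dir, extract_fn, out_list_path):
    """Extract features once per audio file and write a '<feature path>\t<label>' list.

    Hypothetical helper: extract_fn is any callable mapping an audio path to a
    feature object (for example an Fbank matrix) and stands in for the real extractor.
    """
    os.makedirs(save_dir, exist_ok=True)
    with open(list_path, encoding='utf-8') as fin, \
            open(out_list_path, 'w', encoding='utf-8') as fout:
        for index, line in enumerate(fin):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            audio_path, label = line.split('\t')
            feature = extract_fn(audio_path)
            feature_path = os.path.join(save_dir, f'{index}.pkl')
            with open(feature_path, 'wb') as f:
                pickle.dump(feature, f)
            fout.write(f'{feature_path}\t{label}\n')
```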


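# Training
Use `train.py` to train the model. The project supports several audio preprocessing methods, selected with the `preprocess_conf.feature_method` parameter in the `configs/ecapa_tdnn.yml` configuration file. For example, `MelSpectrogram` is the mel spectrogram, `Spectrogram` is the linear spectrogram, and `MFCC` is mel-frequency cepstral coefficients. The data augmentation method can be specified with the `augment_conf_path` parameter. During training, logs are saved with VisualDL. You can view the training progress at any time by starting VisualDL with `visualdl --logdir=log --host 0.0.0.0`.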
```shell
# Single-GPU training
CUDA_VISIBLE_DEVICES=0 python train.py
# Multi-GPU training
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py
```

Training output log:
```
[2023-08-05 09:52:06.497988 INFO   ] utils:print_arguments:13 - ----------- 额外配置参数 -----------
[2023-08-05 09:52:06.498094 INFO   ] utils:print_arguments:15 - configs: configs/ecapa_tdnn.yml
[2023-08-05 09:52:06.498149 INFO   ] utils:print_arguments:15 - do_eval: True
[2023-08-05 09:52:06.498191 INFO   ] utils:print_arguments:15 - local_rank: 0
[2023-08-05 09:52:06.498230 INFO   ] utils:print_arguments:15 - pretrained_model: None
[2023-08-05 09:52:06.498269 INFO   ] utils:print_arguments:15 - resume_model: None
[2023-08-05 09:52:06.498306 INFO   ] utils:print_arguments:15 - save_model_path: models/
[2023-08-05 09:52:06.498342 INFO   ] utils:print_arguments:15 - use_gpu: True
[2023-08-05 09:52:06.498378 INFO   ] utils:print_arguments:16 - ------------------------------------------------
[2023-08-05 09:52:06.513761 INFO   ] utils:print_arguments:18 - ----------- 配置文件参数 -----------
[2023-08-05 09:52:06.513906 INFO   ] utils:print_arguments:21 - dataset_conf:
[2023-08-05 09:52:06.513957 INFO   ] utils:print_arguments:24 -         dataLoader:
[2023-08-05 09:52:06.513995 INFO   ] utils:print_arguments:26 -                 batch_size: 64
[2023-08-05 09:52:06.514031 INFO   ] utils:print_arguments:26 -                 num_workers: 4
[2023-08-05 09:52:06.514066 INFO   ] utils:print_arguments:28 -         do_vad: False
[2023-08-05 09:52:06.514101 INFO   ] utils:print_arguments:28 -         enroll_list: dataset/enroll_list.txt
[2023-08-05 09:52:06.514135 INFO   ] utils:print_arguments:24 -         eval_conf:
[2023-08-05 09:52:06.514169 INFO   ] utils:print_arguments:26 -                 batch_size: 1
[2023-08-05 09:52:06.514203 INFO   ] utils:print_arguments:26 -                 max_duration: 20
[2023-08-05 09:52:06.514237 INFO   ] utils:print_arguments:28 -         max_duration: 3
[2023-08-05 09:52:06.514274 INFO   ] utils:print_arguments:28 -         min_duration: 0.5
[2023-08-05 09:52:06.514308 INFO   ] utils:print_arguments:28 -         noise_aug_prob: 0.2
[2023-08-05 09:52:06.514342 INFO   ] utils:print_arguments:28 -         noise_dir: dataset/noise
[2023-08-05 09:52:06.514374 INFO   ] utils:print_arguments:28 -         num_speakers: 3242
[2023-08-05 09:52:06.514408 INFO   ] utils:print_arguments:28 -         sample_rate: 16000
[2023-08-05 09:52:06.514441 INFO   ] utils:print_arguments:28 -         speed_perturb: True
[2023-08-05 09:52:06.514475 INFO   ] utils:print_arguments:28 -         target_dB: -20
[2023-08-05 09:52:06.514508 INFO   ] utils:print_arguments:28 -         train_list: dataset/train_list.txt
[2023-08-05 09:52:06.514542 INFO   ] utils:print_arguments:28 -         trials_list: dataset/trials_list.txt
[2023-08-05 09:52:06.514575 INFO   ] utils:print_arguments:28 -         use_dB_normalization: True
[2023-08-05 09:52:06.514609 INFO   ] utils:print_arguments:21 - loss_conf:
[2023-08-05 09:52:06.514643 INFO   ] utils:print_arguments:24 -         args:
[2023-08-05 09:52:06.514678 INFO   ] utils:print_arguments:26 -                 easy_margin: False
[2023-08-05 09:52:06.514713 INFO   ] utils:print_arguments:26 -                 margin: 0.2
[2023-08-05 09:52:06.514746 INFO   ] utils:print_arguments:26 -                 scale: 32
[2023-08-05 09:52:06.514779 INFO   ] utils:print_arguments:24 -         margin_scheduler_args:
[2023-08-05 09:52:06.514814 INFO   ] utils:print_arguments:26 -                 final_margin: 0.3
[2023-08-05 09:52:06.514848 INFO   ] utils:print_arguments:28 -         use_loss: AAMLoss
[2023-08-05 09:52:06.514882 INFO   ] utils:print_arguments:28 -         use_margin_scheduler: True
[2023-08-05 09:52:06.514915 INFO   ] utils:print_arguments:21 - model_conf:
[2023-08-05 09:52:06.514950 INFO   ] utils:print_arguments:24 -         backbone:
[2023-08-05 09:52:06.514984 INFO   ] utils:print_arguments:26 -                 embd_dim: 192
[2023-08-05 09:52:06.515017 INFO   ] utils:print_arguments:26 -                 pooling_type: ASP
[2023-08-05 09:52:06.515050 INFO   ] utils:print_arguments:24 -         classifier:
[2023-08-05 09:52:06.515084 INFO   ] utils:print_arguments:26 -                 num_blocks: 0
[2023-08-05 09:52:06.515118 INFO   ] utils:print_arguments:21 - optimizer_conf:
[2023-08-05 09:52:06.515154 INFO   ] utils:print_arguments:28 -         learning_rate: 0.001
[2023-08-05 09:52:06.515188 INFO   ] utils:print_arguments:28 -         optimizer: Adam
[2023-08-05 09:52:06.515221 INFO   ] utils:print_arguments:28 -         scheduler: CosineAnnealingLR
[2023-08-05 09:52:06.515254 INFO   ] utils:print_arguments:28 -         scheduler_args: None
[2023-08-05 09:52:06.515289 INFO   ] utils:print_arguments:28 -         weight_decay: 1e-06
[2023-08-05 09:52:06.515323 INFO   ] utils:print_arguments:21 - preprocess_conf:
[2023-08-05 09:52:06.515357 INFO   ] utils:print_arguments:28 -         feature_method: MelSpectrogram
[2023-08-05 09:52:06.515390 INFO   ] utils:print_arguments:24 -         method_args:
[2023-08-05 09:52:06.515426 INFO   ] utils:print_arguments:26 -                 f_max: 14000.0
[2023-08-05 09:52:06.515460 INFO   ] utils:print_arguments:26 -                 f_min: 50.0
[2023-08-05 09:52:06.515493 INFO   ] utils:print_arguments:26 -                 hop_length: 320
[2023-08-05 09:52:06.515527 INFO   ] utils:print_arguments:26 -                 n_fft: 1024
[2023-08-05 09:52:06.515560 INFO   ] utils:print_arguments:26 -                 n_mels: 64
[2023-08-05 09:52:06.515593 INFO   ] utils:print_arguments:26 -                 sample_rate: 16000
[2023-08-05 09:52:06.515626 INFO   ] utils:print_arguments:26 -                 win_length: 1024
[2023-08-05 09:52:06.515660 INFO   ] utils:print_arguments:21 - train_conf:
[2023-08-05 09:52:06.515694 INFO   ] utils:print_arguments:28 -         log_interval: 100
[2023-08-05 09:52:06.515728 INFO   ] utils:print_arguments:28 -         max_epoch: 30
[2023-08-05 09:52:06.515761 INFO   ] utils:print_arguments:30 - use_model: EcapaTdnn
[2023-08-05 09:52:06.515794 INFO   ] utils:print_arguments:31 - ------------------------------------------------
······
===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
Sequential                                    [1, 9726]                 --
├─EcapaTdnn: 1-1                              [1, 192]                  --
│    └─Conv1dReluBn: 2-1                      [1, 512, 98]              --
│    │    └─Conv1d: 3-1                       [1, 512, 98]              163,840
│    │    └─BatchNorm1d: 3-2                  [1, 512, 98]              1,024
│    └─Sequential: 2-2                        [1, 512, 98]              --
│    │    └─Conv1dReluBn: 3-3                 [1, 512, 98]              263,168
│    │    └─Res2Conv1dReluBn: 3-4             [1, 512, 98]              86,912
│    │    └─Conv1dReluBn: 3-5                 [1, 512, 98]              263,168
│    │    └─SE_Connect: 3-6                   [1, 512, 98]              262,912
│    └─Sequential: 2-3                        [1, 512, 98]              --
│    │    └─Conv1dReluBn: 3-7                 [1, 512, 98]              263,168
│    │    └─Res2Conv1dReluBn: 3-8             [1, 512, 98]              86,912
│    │    └─Conv1dReluBn: 3-9                 [1, 512, 98]              263,168
│    │    └─SE_Connect: 3-10                  [1, 512, 98]              262,912
│    └─Sequential: 2-4                        [1, 512, 98]              --
│    │    └─Conv1dReluBn: 3-11                [1, 512, 98]              263,168
│    │    └─Res2Conv1dReluBn: 3-12            [1, 512, 98]              86,912
│    │    └─Conv1dReluBn: 3-13                [1, 512, 98]              263,168
│    │    └─SE_Connect: 3-14                  [1, 512, 98]              262,912
│    └─Conv1d: 2-5                            [1, 1536, 98]             2,360,832
│    └─AttentiveStatsPool: 2-6                [1, 3072]                 --
│    │    └─Conv1d: 3-15                      [1, 128, 98]              196,736
│    │    └─Conv1d: 3-16                      [1, 1536, 98]             198,144
│    └─BatchNorm1d: 2-7                       [1, 3072]                 6,144
│    └─Linear: 2-8                            [1, 192]                  590,016
│    └─BatchNorm1d: 2-9                       [1, 192]                  384
├─SpeakerIdentification: 1-2                  [1, 9726]                 1,867,392
===============================================================================================
Total params: 8,012,992
Trainable params: 8,012,992
Non-trainable params: 0
Total mult-adds (M): 468.81
===============================================================================================
Input size (MB): 0.03
Forward/backward pass size (MB): 10.36
Params size (MB): 32.05
Estimated Total Size (MB): 42.44
===============================================================================================
[2023-08-05 09:52:08.084231 INFO   ] trainer:train:388 - 训练数据:874175
[2023-08-05 09:52:09.186542 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [0/13659], loss: 11.95824, accuracy: 0.00000, learning rate: 0.00100000, speed: 58.09 data/sec, eta: 5 days, 5:24:08
[2023-08-05 09:52:22.477905 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [100/13659], loss: 10.35675, accuracy: 0.00278, learning rate: 0.00100000, speed: 481.65 data/sec, eta: 15:07:15
[2023-08-05 09:52:35.948581 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [200/13659], loss: 10.22089, accuracy: 0.00505, learning rate: 0.00100000, speed: 475.27 data/sec, eta: 15:19:12
[2023-08-05 09:52:49.249098 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [300/13659], loss: 10.00268, accuracy: 0.00706, learning rate: 0.00100000, speed: 481.45 data/sec, eta: 15:07:11
[2023-08-05 09:53:03.716015 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [400/13659], loss: 9.76052, accuracy: 0.00830, learning rate: 0.00100000, speed: 442.74 data/sec, eta: 16:26:16
[2023-08-05 09:53:18.258807 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [500/13659], loss: 9.50189, accuracy: 0.01060, learning rate: 0.00100000, speed: 440.46 data/sec, eta: 16:31:08
[2023-08-05 09:53:31.618354 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [600/13659], loss: 9.26083, accuracy: 0.01256, learning rate: 0.00100000, speed: 479.50 data/sec, eta: 15:10:12
[2023-08-05 09:53:45.439642 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [700/13659], loss: 9.03548, accuracy: 0.01449, learning rate: 0.00099999, speed: 463.63 data/sec, eta: 15:41:08
```

Start VisualDL with `visualdl --logdir=log --host 0.0.0.0`. The VisualDL page looks like this:

<div align="center">
<img src="./docs/images/log.jpg" alt="VisualDL page" width="600">
</div>
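The log above shows `optimizer: Adam` with `learning_rate: 0.001` and `scheduler: CosineAnnealingLR`, and the logged learning rate begins to decay slowly within the first epoch. The sketch below is the closed-form cosine annealing schedule without restarts, the formula PyTorch documents for `torch.optim.lr_scheduler.CosineAnnealingLR`.

The README does not say whether the project steps the scheduler per batch or per epoch, or what horizon it uses (`scheduler_args: None`). The total of `max_epoch × batches per epoch` used in the test is therefore an assumption, and the helper name is hypothetical.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr=0.001, min_lr=0.0):
    """Learning rate after `step` of `total_steps` under cosine annealing (hypothetical helper).

    lr = min_lr + (base_lr - min_lr) * (1 + cos(pi * step / total_steps)) / 2
    """
    step = min(max(step, 0), total_steps)  # clamp to the schedule's range
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * step / total_steps))
```

The schedule starts at `base_lr`, passes half of it at the midpoint, and reaches `min_lr` at the end. Its slope is near zero at both ends, which is why the logged rate barely moves during the first few hundred batches.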


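# Evaluation
After training, the inference model is saved. We use it to extract features from the audio in the test set, then compare the features pairwise to compute the EER and MinDCF.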
```shell
python eval.py
```

The output is similar to the following:
```
······
------------------------------------------------
W0425 08:27:32.057426 17654 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2
W0425 08:27:32.065165 17654 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2023-03-16 20:20:47.195908 INFO   ] trainer:evaluate:341 - 成功加载模型:models/EcapaTdnn_Fbank/best_model/model.pth
100%|███████████████████████████| 84/84 [00:28<00:00,  2.95it/s]
开始两两对比音频特征...
100%|███████████████████████████| 5332/5332 [00:05<00:00, 1027.83it/s]
评估消耗时间:65s,threshold:0.26,EER: 0.14739, MinDCF: 0.41999
```
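For reference, here is a straightforward, quadratic-time sketch of the two metrics.

- **EER** is the error rate at the threshold where the false rejection rate equals the false acceptance rate.
- **MinDCF** is the minimum weighted detection cost over all thresholds, normalized by the cost of a system that always accepts or always rejects.

The cost parameters `p_target=0.01`, `c_miss=1`, and `c_fa=1` are common choices in speaker verification, but here they are an assumption: the README does not state which values `eval.py` uses. The function name is hypothetical. It also returns the threshold at which the EER occurs, which is one plausible reading of the `threshold` value printed above.

```python
def compute_eer_and_min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Return (EER, threshold at the EER, normalized MinDCF) (hypothetical helper).

    scores: similarity score of each trial pair.
    labels: 1 for a same-speaker pair, 0 for a different-speaker pair
            (both kinds must be present).
    A pair is accepted as "same speaker" when score >= threshold.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    # Candidate thresholds: every observed score, plus one above the maximum (reject all)
    thresholds = sorted(set(scores)) + [max(scores) + 1e-6]
    best_gap, eer, eer_threshold = float('inf'), 1.0, None
    min_dcf = float('inf')
    for thr in thresholds:
        misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        frr = misses / n_target  # false rejection rate
        far = false_alarms / n_nontarget  # false acceptance rate
        # EER: the operating point where FRR and FAR are closest
        if abs(frr - far) < best_gap:
            best_gap, eer, eer_threshold = abs(frr - far), (frr + far) / 2, thr
        # Detection cost at this threshold
        dcf = c_miss * p_target * frr + c_fa * (1 - p_target) * far
        min_dcf = min(min_dcf, dcf)
    # Normalize by the cost of the best trivial system (accept all or reject all)
    min_dcf /= min(c_miss * p_target, c_fa * (1 - p_target))
    return eer, eer_threshold, min_dcf
```

Production toolkits compute the same curves in O(n log n) by sorting the scores once; the nested sums here favor readability.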

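# Inference API

Several commonly used interfaces are shown below. For more, see `mvector/predict.py`, or the examples in the `Voiceprint Comparison` and `Voiceprint Recognition` sections below.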

```python
from mvector.predict import MVectorPredictor

predictor = MVectorPredictor(configs='configs/cam++.yml',
                             model_path='models/CAMPPlus_Fbank/best_model/')
# Get the audio embedding (voiceprint feature)
embedding = predictor.predict(audio_data='dataset/a_1.wav')
# Get the similarity between two audio clips
similarity = predictor.contrast(audio_data1='dataset/a_1.wav', audio_data2='dataset/a_2.wav')

# Register a user's audio
predictor.register(user_name='夜雨飘零', audio_data='dataset/test.wav')
# Recognize the speaker of an audio clip
name, score = predictor.recognition(audio_data='dataset/test1.wav')
# Get all registered users
users_name = predictor.get_users()
# Remove a user's audio
predictor.remove_user(user_name='夜雨飘零')
```

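# Voiceprint Comparison
Next, we implement voiceprint comparison in the `infer_contrast.py` program. First, a few important functions:

- `predict()` gets the voiceprint feature of an audio clip.
- `predict_batch()` gets the voiceprint features of a batch of clips.
- `contrast()` computes the similarity between two clips.
- `register()` registers a clip in the voiceprint library.
- `recognition()` takes a clip and identifies its speaker by comparing it against the voiceprint library.
- `remove_user()` removes a registered speaker from the voiceprint library.

We input two audio clips, obtain their features with the prediction function, and compute the cosine of the angle between the two feature vectors; the result serves as their similarity. You can adjust the similarity threshold `threshold` to match the accuracy requirements of your project.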
```shell
python infer_contrast.py --audio_path1=audio/a_1.wav --audio_path2=audio/b_2.wav
```

The output is similar to the following:
```
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:13 - ----------- 额外配置参数 -----------
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - audio_path1: dataset/a_1.wav
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - audio_path2: dataset/b_2.wav
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - configs: configs/ecapa_tdnn.yml
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - model_path: models/EcapaTdnn_Fbank/best_model/
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - threshold: 0.6
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - use_gpu: True
[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:16 - ------------------------------------------------
······································································
W0425 08:29:10.006249 21121 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2
W0425 08:29:10.008555 21121 device_context.cc:465] device: 0, cuDNN Version: 7.6.
成功加载模型参数和优化方法参数:models/EcapaTdnn_Fbank/best_model/model.pth
audio/a_1.wav 和 audio/b_2.wav 不是同一个人,相似度为:-0.09565544128417969
```
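The comparison step described above boils down to a cosine similarity and a threshold test. The minimal sketch below uses plain Python lists in place of real embeddings.

- The helper names are hypothetical.
- `threshold=0.6` is the value shown in the log above.
- Whether the project compares with `>=` or `>` is an assumption.

A score such as the -0.0957 in the log means the two embeddings are close to orthogonal, i.e. dissimilar.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings, in [-1, 1] (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_speaker(emb1, emb2, threshold=0.6):
    """Accept the pair as one speaker when the similarity reaches the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold
```

Raising the threshold trades more false rejections for fewer false acceptances, which is why the README suggests tuning it to each project's accuracy requirements.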

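A voiceprint comparison program with a GUI is also provided. Run `infer_contrast_gui.py` to start it; the interface is shown below. Select two audio files and click the start button to determine whether they come from the same person.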

<div align="center">
<img src="./docs/images/contrast.jpg" alt="Voiceprint comparison interface">
</div>

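# Voiceprint Recognition

Voiceprint recognition mainly uses the `register()` and `recognition()` functions. First, use `register()` to register audio in the voiceprint library; you can also add the files directly to the `audio_db` folder. Then call `recognition()` to perform recognition: given an audio clip, it identifies the matching speaker in the voiceprint library.

With these functions, you can implement voiceprint recognition in whatever way your project requires. For example, the program below performs voiceprint recognition from a live recording:

1. It first loads the audio in the voiceprint library folder `audio_db`.
2. After the user presses Enter, it records 3 seconds of audio.
3. It runs voiceprint recognition on the recording, matching it against the audio in the library to retrieve the user's identity.

In the same way, you can turn this into a service. For example, you could provide an API for an app: when a user logs in with their voiceprint, the app sends the recording to the backend, the backend performs voiceprint recognition, and the result is returned to the app. This assumes the user has already registered by voice and their audio is stored in the `audio_db` folder.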
```shell
python infer_recognition.py
```

The output is similar to the following:
```
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:13 - ----------- 额外配置参数 -----------
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - audio_db_path: audio_db/
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - configs: configs/ecapa_tdnn.yml
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - model_path: models/EcapaTdnn_Fbank/best_model/
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - record_seconds: 3
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - threshold: 0.6
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - use_gpu: True
[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:16 - ------------------------------------------------
······································································
W0425 08:30:13.257884 23889 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2
W0425 08:30:13.260191 23889 device_context.cc:465] device: 0, cuDNN Version: 7.6.
成功加载模型参数和优化方法参数:models/ecapa_tdnn/model.pth
Loaded 沙瑞金 audio.
Loaded 李达康 audio.
请选择功能,0为注册音频到声纹库,1为执行声纹识别:0
按下回车键开机录音,录音3秒中:
开始录音......
录音已结束!
请输入该音频用户的名称:夜雨飘零
请选择功能,0为注册音频到声纹库,1为执行声纹识别:1
按下回车键开机录音,录音3秒中:
开始录音......
录音已结束!
识别说话的为:夜雨飘零,相似度为:0.920434
```
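Registration and recognition can be pictured as a nearest-neighbour search over stored embeddings. The toy class below is a hypothetical illustration, not `MVectorPredictor`. It keeps one L2-normalized embedding per user in memory and returns the best-scoring user only if the cosine similarity reaches `threshold` (0.6, as in the log above).

```python
import math


class VoiceprintLibrary:
    """Toy in-memory voiceprint library (hypothetical; not the project's API)."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.users = {}  # user name -> L2-normalized embedding

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def register(self, name, embedding):
        """Store (or overwrite) the reference embedding for a user."""
        self.users[name] = self._normalize(embedding)

    def remove(self, name):
        """Delete a user; unknown names are ignored."""
        self.users.pop(name, None)

    def recognize(self, embedding):
        """Return (name, score) of the best match, or (None, score) below the threshold."""
        query = self._normalize(embedding)
        best_name, best_score = None, -1.0
        for name, reference in self.users.items():
            # The dot product of unit vectors is their cosine similarity
            score = sum(q * r for q, r in zip(query, reference))
            if score > best_score:
                best_name, best_score = name, score
        if best_score < self.threshold:
            return None, best_score
        return best_name, best_score
```

Normalizing at registration time means each lookup costs only one dot product per user.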


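A voiceprint recognition program with a GUI is also provided; run `infer_recognition_gui.py` to start it. The buttons work as follows:

- `注册音频到声纹库` (register audio to the voiceprint library): start speaking immediately after clicking. After 3 seconds of recording, enter the name of the person being registered.
- `执行声纹识别` (perform voiceprint recognition): speak immediately after clicking, then wait for the result once the 3-second recording ends.
- `删除用户` (delete user): removes a user.
- `实时识别` (real-time recognition): records and recognizes continuously.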

<div align="center">
<img src="./docs/images/recognition.jpg" alt="Voiceprint recognition interface">
</div>

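# Speaker Diarization (Separating Speakers)

Run `infer_speaker_diarization.py` with an audio path to separate the speakers and display the result. The audio should preferably be at least 10 seconds long. See the program's arguments for more options.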
```shell
python infer_speaker_diarization.py --audio_path=dataset/test_long.wav
```

The output is similar to the following:
```
2024-10-10 19:30:40.768 | INFO     | mvector.predict:__init__:61 - 成功加载模型参数:models/CAMPPlus_Fbank/best_model/model.pth
2024-10-10 19:30:40.795 | INFO     | mvector.predict:__create_index:127 - 声纹特征索引创建完成,一共有3个用户,分别是:['沙瑞金', '夜雨飘零', '李达康']
2024-10-10 19:30:40.796 | INFO     | mvector.predict:__load_audio_db:142 - 正在加载声纹库数据...
100%|██████████| 3/3 [00:00<?, ?it/s]
2024-10-10 19:30:40.798 | INFO     | mvector.predict:__create_index:127 - 声纹特征索引创建完成,一共有3个用户,分别是:['沙瑞金', '夜雨飘零', '李达康']
2024-10-10 19:30:40.798 | INFO     | mvector.predict:__load_audio_db:172 - 声纹库数据加载完成!
识别结果:
{'speaker': '沙瑞金', 'start': 0.0, 'end': 2.0}
{'speaker': '陌生人1', 'start': 4.0, 'end': 7.0}
{'speaker': '李达康', 'start': 7.0, 'end': 8.0}
{'speaker': '沙瑞金', 'start': 9.0, 'end': 12.0}
{'speaker': '沙瑞金', 'start': 13.0, 'end': 14.0}
{'speaker': '陌生人1', 'start': 15.0, 'end': 19.0}
```
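The diarization output above is a list of labeled time segments. The sketch below shows one simple way such output can be produced from per-window embeddings, assuming the audio has already been cut into windows with one embedding each. The procedure is:

1. A window that matches a registered speaker takes that speaker's name.
2. Any other window is matched to a previously seen unknown speaker, or starts a new `stranger<N>` label. The project prints `陌生人N`, "stranger N".
3. Adjacent windows with the same label are merged.

This is a greedy, hypothetical procedure. The README does not describe the algorithm `infer_speaker_diarization.py` actually uses, for example whether it clusters embeddings.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def diarize(windows, known_speakers, threshold=0.6):
    """Greedy labeling of (start, end, embedding) windows, then merging (hypothetical helper).

    known_speakers maps registered names to reference embeddings.
    Returns a list of {'speaker', 'start', 'end'} dicts.
    """
    references = dict(known_speakers)  # grows as unknown speakers appear
    num_strangers = 0
    segments = []
    for start, end, embedding in windows:
        best_name, best_score = None, threshold
        for name, reference in references.items():
            score = _cosine(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        if best_name is None:
            # No reference is similar enough: start a new unknown speaker
            num_strangers += 1
            best_name = f'stranger{num_strangers}'
            references[best_name] = embedding
        previous = segments[-1] if segments else None
        if previous and previous['speaker'] == best_name and previous['end'] >= start:
            previous['end'] = end  # extend a contiguous segment of the same speaker
        else:
            segments.append({'speaker': best_name, 'start': start, 'end': end})
    return segments
```

Gaps between windows (for example silence removed before embedding) break a segment even when the speaker stays the same. The output above has a similar pattern, with separate consecutive segments for one speaker.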

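The result is displayed as shown below. Press the `Space` key to play or pause the audio, and click a position to jump to that point in the audio: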
<div align="center">
<img src="./docs/images/speaker_diarization.jpg" alt="Speaker diarization" width="800">
</div>

The project also provides a GUI program: run `infer_speaker_diarization_gui.py`. See the program's arguments for more options.
```shell
python infer_speaker_diarization_gui.py
```

This opens a page like the following for speaker diarization:

<div align="center">
<img src="./docs/images/speaker_diarization_gui.png" alt="Speaker diarization" width="800">
</div>


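Note: if speaker names are in Chinese, a suitable font must be installed for them to display correctly. Windows usually needs no extra installation, but Ubuntu does. If Windows is missing the font, download a `.ttf` file from the [font repository](https://github.com/tracyone/program_font) and copy it to `C:\Windows\Fonts`. On Ubuntu, proceed as follows.

1. Install the font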
```shell
git clone https://github.com/tracyone/program_font && cd program_font && ./install.sh
```

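2. Run the following Python code
```python
import os
import shutil

import matplotlib

# Directory of the TrueType fonts bundled with matplotlib (<mpl-data>/fonts/ttf)
font_dir = os.path.join(matplotlib.get_data_path(), 'fonts', 'ttf')
print(font_dir)
# Copy the SimHei font installed by program_font into matplotlib's font directory
shutil.copy('/usr/share/fonts/MyFonts/simhei.ttf', font_dir)
# Delete matplotlib's font cache so the new font is found the next time matplotlib starts
shutil.rmtree(matplotlib.get_cachedir(), ignore_errors=True)
```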


# Other Versions
 - Tensorflow:[VoiceprintRecognition-Tensorflow](https://github.com/yeyupiaoling/VoiceprintRecognition-Tensorflow)
 - PaddlePaddle:[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)
 - Keras:[VoiceprintRecognition-Keras](https://github.com/yeyupiaoling/VoiceprintRecognition-Keras)


# Tip the Author

<br/>
<div align="center">
<p>Support the author with a one-yuan tip</p>
<img src="https://yeyupiaoling.cn/reward.png" alt="Support the author" width="400">
</div>


# References
1. https://github.com/PaddlePaddle/PaddleSpeech
2. https://github.com/yeyupiaoling/PaddlePaddle-MobileFaceNets
3. https://github.com/yeyupiaoling/PPASR
4. https://github.com/alibaba-damo-academy/3D-Speaker
5. https://github.com/wenet-e2e/wespeaker

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yeyupiaoling/VoiceprintRecognition_Pytorch",
    "name": "mvector",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "Voice, Pytorch",
    "author": "yeyupiaoling",
    "author_email": null,
    "download_url": "https://github.com/yeyupiaoling/VoiceprintRecognition_Pytorch.git",
    "platform": null,
    "description": "\u7b80\u4f53\u4e2d\u6587 | [English](./README_en.md)\r\n\r\n# \u57fa\u4e8ePytorch\u5b9e\u73b0\u7684\u58f0\u7eb9\u8bc6\u522b\u7cfb\u7edf\r\n\r\n![python version](https://img.shields.io/badge/python-3.8+-orange.svg)\r\n![GitHub forks](https://img.shields.io/github/forks/yeyupiaoling/VoiceprintRecognition-Pytorch)\r\n![GitHub Repo stars](https://img.shields.io/github/stars/yeyupiaoling/VoiceprintRecognition-Pytorch)\r\n![GitHub](https://img.shields.io/github/license/yeyupiaoling/VoiceprintRecognition-Pytorch)\r\n![\u652f\u6301\u7cfb\u7edf](https://img.shields.io/badge/\u652f\u6301\u7cfb\u7edf-Win/Linux/MAC-9cf)\r\n\r\n\u672c\u5206\u652f\u4e3a1.1\u7248\u672c\uff0c\u5982\u679c\u8981\u4f7f\u7528\u4e4b\u524d\u76841.0\u7248\u672c\u8bf7\u5728[1.0\u5206\u652f](https://github.com/yeyupiaoling/VoiceprintRecognition-Pytorch/tree/release/1.0.5)\u4f7f\u7528\u3002\u672c\u9879\u76ee\u4f7f\u7528\u4e86EcapaTdnn\u3001ResNetSE\u3001ERes2Net\u3001CAM++\u7b49\u591a\u79cd\u5148\u8fdb\u7684\u58f0\u7eb9\u8bc6\u522b\u6a21\u578b\uff0c\u4e0d\u6392\u9664\u4ee5\u540e\u4f1a\u652f\u6301\u66f4\u591a\u6a21\u578b\uff0c\u540c\u65f6\u672c\u9879\u76ee\u4e5f\u652f\u6301\u4e86MelSpectrogram\u3001Spectrogram\u3001MFCC\u3001Fbank\u7b49\u591a\u79cd\u6570\u636e\u9884\u5904\u7406\u65b9\u6cd5\uff0c\u4f7f\u7528\u4e86ArcFace Loss\uff0cArcFace loss\uff1aAdditive Angular Margin 
Loss\uff08\u52a0\u6027\u89d2\u5ea6\u95f4\u9694\u635f\u5931\u51fd\u6570\uff09\uff0c\u5bf9\u5e94\u9879\u76ee\u4e2d\u7684AAMLoss\uff0c\u5bf9\u7279\u5f81\u5411\u91cf\u548c\u6743\u91cd\u5f52\u4e00\u5316\uff0c\u5bf9\u03b8\u52a0\u4e0a\u89d2\u5ea6\u95f4\u9694m\uff0c\u89d2\u5ea6\u95f4\u9694\u6bd4\u4f59\u5f26\u95f4\u9694\u5728\u5bf9\u89d2\u5ea6\u7684\u5f71\u54cd\u66f4\u52a0\u76f4\u63a5\uff0c\u9664\u6b64\u4e4b\u5916\uff0c\u8fd8\u652f\u6301AMLoss\u3001ARMLoss\u3001CELoss\u7b49\u591a\u79cd\u635f\u5931\u51fd\u6570\u3002\r\n\r\n**\u672c\u9879\u76ee\u662f\u5982\u679c\u5bf9\u4f60\u6709\u5e2e\u52a9\uff0c\u6b22\u8fceStar\uff0c\u907f\u514d\u4e4b\u540e\u9700\u8981\u627e\u4e0d\u5230\u4e86\u3002**\r\n\r\n**\u6b22\u8fce\u5927\u5bb6\u626b\u7801\u5165\u77e5\u8bc6\u661f\u7403\u6216\u8005QQ\u7fa4\u8ba8\u8bba\uff0c\u77e5\u8bc6\u661f\u7403\u91cc\u9762\u63d0\u4f9b\u9879\u76ee\u7684\u6a21\u578b\u6587\u4ef6\u548c\u535a\u4e3b\u5176\u4ed6\u76f8\u5173\u9879\u76ee\u7684\u6a21\u578b\u6587\u4ef6\uff0c\u4e5f\u5305\u62ec\u5176\u4ed6\u4e00\u4e9b\u8d44\u6e90\u3002**\r\n\r\n<div align=\"center\">\r\n  <img src=\"https://yeyupiaoling.cn/zsxq.png\" alt=\"\u77e5\u8bc6\u661f\u7403\" width=\"400\">\r\n  <img src=\"https://yeyupiaoling.cn/qq.png\" alt=\"QQ\u7fa4\" width=\"400\">\r\n</div>\r\n\r\n\u4f7f\u7528\u73af\u5883\uff1a\r\n\r\n - Anaconda 3\r\n - Python 3.11\r\n - Pytorch 2.4.0\r\n - Windows 11 or Ubuntu 22.04\r\n\r\n# \u76ee\u5f55\r\n\r\n- [\u9879\u76ee\u4ecb\u7ecd](#\u57fa\u4e8ePytorch\u5b9e\u73b0\u7684\u58f0\u7eb9\u8bc6\u522b\u7cfb\u7edf)\r\n- [\u9879\u76ee\u8bb0\u5f55](#\u9879\u76ee\u8bb0\u5f55)\r\n- [\u9879\u76ee\u7279\u6027](#\u9879\u76ee\u7279\u6027)\r\n- [\u5b89\u88c5\u73af\u5883](#\u5b89\u88c5\u73af\u5883)\r\n- [\u521b\u5efa\u6570\u636e](#\u521b\u5efa\u6570\u636e)\r\n- [\u4fee\u6539\u9884\u5904\u7406\u65b9\u6cd5\uff08\u53ef\u9009\uff09](#\u4fee\u6539\u9884\u5904\u7406\u65b9\u6cd5\u53ef\u9009)\r\n- [\u63d0\u53d6\u7279\u5f81\uff08\u53ef\u9009\uff09](#\u63d0\u53d6\u7279\u5f81\u53ef\u9009)\r\n- 
[\u8bad\u7ec3\u6a21\u578b](#\u8bad\u7ec3\u6a21\u578b)\r\n- [\u8bc4\u4f30\u6a21\u578b](#\u8bc4\u4f30\u6a21\u578b)\r\n- [\u63a8\u7406\u63a5\u53e3](#\u63a8\u7406\u63a5\u53e3)\r\n- [\u58f0\u7eb9\u5bf9\u6bd4](#\u58f0\u7eb9\u5bf9\u6bd4)\r\n- [\u58f0\u7eb9\u8bc6\u522b](#\u58f0\u7eb9\u8bc6\u522b)\r\n- [\u8bf4\u8bdd\u4eba\u65e5\u5fd7\uff08\u5206\u79bb\u8bf4\u8bdd\u4eba\uff09](#\u8bf4\u8bdd\u4eba\u65e5\u5fd7\u5206\u79bb\u8bf4\u8bdd\u4eba)\r\n\r\n\r\n# \u9879\u76ee\u8bb0\u5f55\r\n\r\n1. 2024.10.12\uff1a\u53d1\u5e031.1\u7248\u672c\u3002\r\n\r\n# \u9879\u76ee\u7279\u6027\r\n\r\n1. \u652f\u6301\u6a21\u578b\uff1aEcapaTdnn\u3001TDNN\u3001Res2Net\u3001ResNetSE\u3001ERes2Net\u3001CAM++\r\n2. \u652f\u6301\u6c60\u5316\u5c42\uff1aAttentiveStatsPool(ASP)\u3001SelfAttentivePooling(SAP)\u3001TemporalStatisticsPooling(TSP)\u3001TemporalAveragePooling(TAP)\u3001TemporalStatsPool(TSTP)\r\n3. \u652f\u6301\u635f\u5931\u51fd\u6570\uff1aAAMLoss\u3001SphereFace2\u3001AMLoss\u3001ARMLoss\u3001CELoss\u3001SubCenterLoss\u3001TripletAngularMarginLoss\r\n4. \u652f\u6301\u9884\u5904\u7406\u65b9\u6cd5\uff1aMelSpectrogram\u3001Spectrogram\u3001MFCC\u3001Fbank\u3001Wav2vec2.0\u3001WavLM\r\n5. 
\u652f\u6301\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\uff1a\u8bed\u901f\u589e\u5f3a\u3001\u97f3\u91cf\u589e\u5f3a\u3001\u566a\u58f0\u589e\u5f3a\u3001\u6df7\u54cd\u589e\u5f3a\u3001SpecAugment\r\n\r\n\r\n**\u6a21\u578b\u8bba\u6587\uff1a**\r\n\r\n- EcapaTdnn\uff1a[ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](https://arxiv.org/abs/2005.07143v3)\r\n- PANNS\uff1a[PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/abs/1912.10211v5)\r\n- TDNN\uff1a[Prediction of speech intelligibility with DNN-based performance measures](https://arxiv.org/abs/2203.09148)\r\n- Res2Net\uff1a[Res2Net: A New Multi-scale Backbone Architecture](https://arxiv.org/abs/1904.01169)\r\n- ResNetSE\uff1a[Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507)\r\n- CAMPPlus\uff1a[CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking](https://arxiv.org/abs/2303.00332v3)\r\n- ERes2Net\uff1a[An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification](https://arxiv.org/abs/2305.12838v1)\r\n\r\n\r\n# \u6a21\u578b\u4e0b\u8f7d\r\n\r\n### \u8bad\u7ec3CN-Celeb\u6570\u636e\uff0c\u5171\u67092796\u4e2a\u8bf4\u8bdd\u4eba\u3002\r\n\r\n|     \u6a21\u578b     | Params(M) |                \u6570\u636e\u96c6                 | train speakers | threshold |   EER   | MinDCF  |   \u6a21\u578b\u4e0b\u8f7d   |\r\n|:----------:|:---------:|:----------------------------------:|:--------------:|:---------:|:-------:|:-------:|:--------:|\r\n| ERes2NetV2 |    6.6    | [CN-Celeb](http://openslr.org/82/) |      2796      |  0.20089  | 0.08071 | 0.45705 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  ERes2Net  |    6.6    | [CN-Celeb](http://openslr.org/82/) |      2796      |  0.20014  | 0.08132 | 0.45544 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|   CAM++    |    6.8    | [CN-Celeb](http://openslr.org/82/) |      2796      |  0.23323  
| 0.08332 | 0.48536 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  ResNetSE  |    7.8    | [CN-Celeb](http://openslr.org/82/) |      2796      |  0.19066  | 0.08544 | 0.49142 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| EcapaTdnn  |    6.1    | [CN-Celeb](http://openslr.org/82/) |      2796      |  0.23646  | 0.09259 | 0.51378 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|    TDNN    |    2.6    | [CN-Celeb](http://openslr.org/82/) |      2796      |  0.23858  | 0.10825 | 0.59545 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  Res2Net   |    5.0    | [CN-Celeb](http://openslr.org/82/) |      2796      |  0.19526  | 0.12436 | 0.65347 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|   CAM++    |    6.8    |               \u66f4\u5927\u6570\u636e\u96c6                |      2W+       |   0.33    | 0.07874 | 0.52524 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  ERes2Net  |   55.1    |               \u5176\u4ed6\u6570\u636e\u96c6                |      20W+      |   0.36    | 0.02936 | 0.18355 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| ERes2NetV2 |   56.2    |               \u5176\u4ed6\u6570\u636e\u96c6                |      20W+      |   0.36    | 0.03847 | 0.24301 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|   CAM++    |    6.8    |               \u5176\u4ed6\u6570\u636e\u96c6                |      20W+      |   0.29    | 0.04765 | 0.31436 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n\r\n\u8bf4\u660e\uff1a\r\n1. \u8bc4\u4f30\u7684\u6d4b\u8bd5\u96c6\u4e3a[CN-Celeb\u7684\u6d4b\u8bd5\u96c6](https://aistudio.baidu.com/aistudio/datasetdetail/233361)\uff0c\u5305\u542b196\u4e2a\u8bf4\u8bdd\u4eba\u3002\r\n2. \u4f7f\u7528\u8bed\u901f\u589e\u5f3a\u5206\u7c7b\u5927\u5c0f\u7ffb\u4e09\u500d`speed_perturb_3_class: True`\u3002\r\n3. \u4f7f\u7528\u7684\u9884\u5904\u7406\u65b9\u6cd5\u4e3a`Fbank`\uff0c\u635f\u5931\u51fd\u6570\u4e3a`AAMLoss`\u3002\r\n4. 
\u53c2\u6570\u6570\u91cf\u4e0d\u5305\u542b\u4e86\u5206\u7c7b\u5668\u7684\u53c2\u6570\u6570\u91cf\u3002\r\n5. \u4f7f\u7528\u4e86\u566a\u58f0\u589e\u5f3a\u548c\u6df7\u54cd\u589e\u5f3a\u3002\r\n\r\n\r\n### \u8bad\u7ec3VoxCeleb1&2\u6570\u636e\uff0c\u5171\u67097205\u4e2a\u8bf4\u8bdd\u4eba\u3002\r\n\r\n|     \u6a21\u578b     | Params(M) |     \u6570\u636e\u96c6     | train speakers | threshold |   EER   | MinDCF  |   \u6a21\u578b\u4e0b\u8f7d   |\r\n|:----------:|:---------:|:-----------:|:--------------:|:---------:|:-------:|:-------:|:--------:|\r\n|   CAM++    |    6.8    | VoxCeleb1&2 |      7205      |  0.22504  | 0.02436 | 0.15543 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| EcapaTdnn  |    6.1    | VoxCeleb1&2 |      7205      |  0.24877  | 0.02480 | 0.16188 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  ResNetSE  |    7.8    | VoxCeleb1&2 |      7205      |  0.22567  | 0.03189 | 0.23040 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|    TDNN    |    2.6    | VoxCeleb1&2 |      7205      |  0.23834  | 0.03486 | 0.26792 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  Res2Net   |    5.0    | VoxCeleb1&2 |      7205      |  0.19472  | 0.04370 | 0.40072 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  ERes2Net  |    6.6    | VoxCeleb1&2 |      7205      |           |         |         | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|   CAM++    |    6.8    |    \u66f4\u5927\u6570\u636e\u96c6    |      2W+       |   0.28    | 0.03182 | 0.23731 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  ERes2Net  |   55.1    |    \u5176\u4ed6\u6570\u636e\u96c6    |      20W+      |   0.53    | 0.08904 | 0.62130 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| ERes2NetV2 |   56.2    |    \u5176\u4ed6\u6570\u636e\u96c6    |      20W+      |   0.52    | 0.08649 | 0.64193 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|   CAM++    |    6.8    |    \u5176\u4ed6\u6570\u636e\u96c6    |      20W+   
   |   0.49    | 0.10334 | 0.71200 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n\r\n\u8bf4\u660e\uff1a\r\n\r\n1. \u8bc4\u4f30\u7684\u6d4b\u8bd5\u96c6\u4e3a[VoxCeleb1&2\u7684\u6d4b\u8bd5\u96c6](https://aistudio.baidu.com/aistudio/datasetdetail/255977)\uff0c\u5305\u542b158\u4e2a\u8bf4\u8bdd\u4eba\u3002\r\n2. \u4f7f\u7528\u8bed\u901f\u589e\u5f3a\u5206\u7c7b\u5927\u5c0f\u7ffb\u4e09\u500d`speed_perturb_3_class: True`\u3002\r\n3. \u4f7f\u7528\u7684\u9884\u5904\u7406\u65b9\u6cd5\u4e3a`Fbank`\uff0c\u635f\u5931\u51fd\u6570\u4e3a`AAMLoss`\u3002\r\n4. \u53c2\u6570\u6570\u91cf\u4e0d\u5305\u542b\u4e86\u5206\u7c7b\u5668\u7684\u53c2\u6570\u6570\u91cf\u3002\r\n\r\n\r\n### \u9884\u5904\u7406\u65b9\u6cd5\u6548\u679c\u5bf9\u6bd4\u5b9e\u9a8c\r\n\r\n|                                      \u9884\u5904\u7406\u65b9\u6cd5                                       |   \u6570\u636e\u96c6    | train speakers | threshold |   EER   | MinDCF  |   \u6a21\u578b\u4e0b\u8f7d   |\r\n|:--------------------------------------------------------------------------------:|:--------:|:--------------:|:---------:|:-------:|:-------:|:--------:|\r\n|                                      Fbank                                       | CN-Celeb |      2796      |  0.14574  | 0.10988 | 0.58955 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|                                       MFCC                                       | CN-Celeb |      2796      |  0.14868  | 0.11483 | 0.61275 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|                                   Spectrogram                                    | CN-Celeb |      2796      |  0.14962  | 0.11613 | 0.60057 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|                                  MelSpectrogram                                  | CN-Celeb |      2796      |  0.13458  | 0.12498 | 0.60741 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|       [wavlm-base-plus](https://huggingface.co/microsoft/wavlm-base-plus) 
       | CN-Celeb |      2796      |  0.14166  | 0.13247 | 0.62451 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|           [w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0)           | CN-Celeb |      2796      |           |         |         | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | CN-Celeb |      2796      |           |         |         | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|           [wavlm-large](https://huggingface.co/microsoft/wavlm-large)            | CN-Celeb |      2796      |           |         |         | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n\r\n\u8bf4\u660e\uff1a\r\n\r\n1. \u8bc4\u4f30\u7684\u6d4b\u8bd5\u96c6\u4e3a[CN-Celeb\u7684\u6d4b\u8bd5\u96c6](https://aistudio.baidu.com/aistudio/datasetdetail/233361)\uff0c\u5305\u542b196\u4e2a\u8bf4\u8bdd\u4eba\u3002\r\n2. \u5b9e\u9a8c\u6570\u636e\u4e3a[CN-Celeb](http://openslr.org/82/)\uff0c\u5b9e\u9a8c\u6a21\u578b\u4e3a`CAM++`\uff0c\u635f\u5931\u51fd\u6570\u4e3a`AAMLoss`\u3002\r\n3. \u6570\u636e\u4f7f\u7528`extract_features.py`\u63d0\u524d\u63d0\u53d6\u7279\u5f81\uff0c\u4e5f\u5c31\u662f\u8bf4\u8bad\u7ec3\u4e2d\u6ca1\u6709\u4f7f\u7528\u5bf9\u97f3\u9891\u7684\u6570\u636e\u589e\u5f3a\u3002\r\n4. 
`w2v-bert-2.0`\u3001`wav2vec2-large-xlsr-53`\u662f\u591a\u8bed\u8a00\u6570\u636e\u9884\u8bad\u7ec3\u5f97\u5230\u7684\uff0c`wavlm-base-plus`\u3001`wavlm-large`\u7684\u9884\u8bad\u7ec3\u6570\u636e\u4ec5\u7528\u82f1\u6587\u3002\r\n\r\n\r\n### \u635f\u5931\u51fd\u6570\u6548\u679c\u5bf9\u6bd4\u5b9e\u9a8c\r\n\r\n|           \u635f\u5931\u51fd\u6570           |   \u6570\u636e\u96c6    | train speakers | threshold |   EER   | MinDCF  |   \u6a21\u578b\u4e0b\u8f7d   |\r\n|:------------------------:|:--------:|:--------------:|:---------:|:-------:|:-------:|:--------:|\r\n|         AAMLoss          | CN-Celeb |      2796      |  0.14574  | 0.10988 | 0.58955 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|       SphereFace2        | CN-Celeb |      2796      |  0.20377  | 0.11309 | 0.61536 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| TripletAngularMarginLoss | CN-Celeb |      2796      |  0.28940  | 0.11749 | 0.63735 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|      SubCenterLoss       | CN-Celeb |      2796      |  0.13126  | 0.11775 | 0.56995 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|         ARMLoss          | CN-Celeb |      2796      |  0.14563  | 0.11805 | 0.57171 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|          AMLoss          | CN-Celeb |      2796      |  0.12870  | 0.12301 | 0.63263 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|          CELoss          | CN-Celeb |      2796      |  0.13607  | 0.12684 | 0.65176 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n\r\n\u8bf4\u660e\uff1a\r\n\r\n1. \u8bc4\u4f30\u7684\u6d4b\u8bd5\u96c6\u4e3a[CN-Celeb\u7684\u6d4b\u8bd5\u96c6](https://aistudio.baidu.com/aistudio/datasetdetail/233361)\uff0c\u5305\u542b196\u4e2a\u8bf4\u8bdd\u4eba\u3002\r\n2. \u5b9e\u9a8c\u6570\u636e\u4e3a[CN-Celeb](http://openslr.org/82/)\uff0c\u5b9e\u9a8c\u6a21\u578b\u4e3a`CAM++`\uff0c\u9884\u5904\u7406\u65b9\u6cd5\u4e3a`Fbank`\u3002\r\n3. 
\u6570\u636e\u4f7f\u7528`extract_features.py`\u63d0\u524d\u63d0\u53d6\u7279\u5f81\uff0c\u4e5f\u5c31\u662f\u8bf4\u8bad\u7ec3\u4e2d\u6ca1\u6709\u4f7f\u7528\u5bf9\u97f3\u9891\u7684\u6570\u636e\u589e\u5f3a\u3002\r\n\r\n\r\n## \u5b89\u88c5\u73af\u5883\r\n\r\n - \u9996\u5148\u5b89\u88c5\u7684\u662fPytorch\u7684GPU\u7248\u672c\uff0c\u5982\u679c\u5df2\u7ecf\u5b89\u88c5\u8fc7\u4e86\uff0c\u8bf7\u8df3\u8fc7\u3002\r\n```shell\r\nconda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia\r\n```\r\n\r\n - \u5b89\u88c5ppvector\u5e93\u3002\r\n \r\n\u4f7f\u7528pip\u5b89\u88c5\uff0c\u547d\u4ee4\u5982\u4e0b\uff1a\r\n```shell\r\npython -m pip install mvector -U -i https://pypi.tuna.tsinghua.edu.cn/simple\r\n```\r\n\r\n**\u5efa\u8bae\u6e90\u7801\u5b89\u88c5**\uff0c\u6e90\u7801\u5b89\u88c5\u80fd\u4fdd\u8bc1\u4f7f\u7528\u6700\u65b0\u4ee3\u7801\u3002\r\n```shell\r\ngit clone https://github.com/yeyupiaoling/VoiceprintRecognition-Pytorch.git\r\ncd VoiceprintRecognition-Pytorch/\r\npip install .\r\n```\r\n\r\n# 
\u521b\u5efa\u6570\u636e\r\n\u672c\u6559\u7a0b\u7b14\u8005\u4f7f\u7528\u7684\u662f[CN-Celeb](https://openslr.elda.org/resources/82)\uff0c\u8fd9\u4e2a\u6570\u636e\u96c6\u4e00\u5171\u6709\u7ea63000\u4e2a\u4eba\u7684\u8bed\u97f3\u6570\u636e\uff0c\u670965W+\u6761\u8bed\u97f3\u6570\u636e\uff0c\u4e0b\u8f7d\u4e4b\u540e\u8981\u89e3\u538b\u6570\u636e\u96c6\u5230`dataset`\u76ee\u5f55\uff0c\u53e6\u5916\u5982\u679c\u8981\u8bc4\u4f30\uff0c\u8fd8\u9700\u8981\u4e0b\u8f7d[CN-Celeb\u7684\u6d4b\u8bd5\u96c6](https://aistudio.baidu.com/aistudio/datasetdetail/233361)\u3002\u5982\u679c\u8bfb\u8005\u6709\u5176\u4ed6\u66f4\u597d\u7684\u6570\u636e\u96c6\uff0c\u53ef\u4ee5\u6df7\u5408\u5728\u4e00\u8d77\u4f7f\u7528\uff0c\u4f46\u6700\u597d\u662f\u8981\u7528python\u7684\u5de5\u5177\u6a21\u5757aukit\u5904\u7406\u97f3\u9891\uff0c\u964d\u566a\u548c\u53bb\u9664\u9759\u97f3\u3002\r\n\r\n\u9996\u5148\u662f\u521b\u5efa\u4e00\u4e2a\u6570\u636e\u5217\u8868\uff0c\u6570\u636e\u5217\u8868\u7684\u683c\u5f0f\u4e3a`<\u8bed\u97f3\u6587\u4ef6\u8def\u5f84\\t\u8bed\u97f3\u5206\u7c7b\u6807\u7b7e>`\uff0c\u521b\u5efa\u8fd9\u4e2a\u5217\u8868\u4e3b\u8981\u662f\u65b9\u4fbf\u4e4b\u540e\u7684\u8bfb\u53d6\uff0c\u4e5f\u662f\u65b9\u4fbf\u8bfb\u53d6\u4f7f\u7528\u5176\u4ed6\u7684\u8bed\u97f3\u6570\u636e\u96c6\uff0c\u8bed\u97f3\u5206\u7c7b\u6807\u7b7e\u662f\u6307\u8bf4\u8bdd\u4eba\u7684\u552f\u4e00ID\uff0c\u4e0d\u540c\u7684\u8bed\u97f3\u6570\u636e\u96c6\uff0c\u53ef\u4ee5\u901a\u8fc7\u7f16\u5199\u5bf9\u5e94\u7684\u751f\u6210\u6570\u636e\u5217\u8868\u7684\u51fd\u6570\uff0c\u628a\u8fd9\u4e9b\u6570\u636e\u96c6\u90fd\u5199\u5728\u540c\u4e00\u4e2a\u6570\u636e\u5217\u8868\u4e2d\u3002\r\n\r\n\u6267\u884c`create_data.py`\u7a0b\u5e8f\u5b8c\u6210\u6570\u636e\u51c6\u5907\u3002\r\n```shell\r\npython 
create_data.py\r\n```\r\n\r\n\u6267\u884c\u4e0a\u9762\u7684\u7a0b\u5e8f\u4e4b\u540e\uff0c\u4f1a\u751f\u6210\u4ee5\u4e0b\u7684\u6570\u636e\u683c\u5f0f\uff0c\u5982\u679c\u8981\u81ea\u5b9a\u4e49\u6570\u636e\uff0c\u53c2\u8003\u5982\u4e0b\u6570\u636e\u5217\u8868\uff0c\u524d\u9762\u662f\u97f3\u9891\u7684\u76f8\u5bf9\u8def\u5f84\uff0c\u540e\u9762\u7684\u662f\u8be5\u97f3\u9891\u5bf9\u5e94\u7684\u8bf4\u8bdd\u4eba\u7684\u6807\u7b7e\uff0c\u5c31\u8ddf\u5206\u7c7b\u4e00\u6837\u3002**\u81ea\u5b9a\u4e49\u6570\u636e\u96c6\u7684\u6ce8\u610f**\uff0c\u6d4b\u8bd5\u6570\u636e\u5217\u8868\u7684ID\u53ef\u4ee5\u4e0d\u7528\u8ddf\u8bad\u7ec3\u7684ID\u4e00\u6837\uff0c\u4e5f\u5c31\u662f\u8bf4\u6d4b\u8bd5\u7684\u6570\u636e\u7684\u8bf4\u8bdd\u4eba\u53ef\u4ee5\u4e0d\u7528\u51fa\u73b0\u5728\u8bad\u7ec3\u96c6\uff0c\u53ea\u8981\u4fdd\u8bc1\u6d4b\u8bd5\u6570\u636e\u5217\u8868\u4e2d\u540c\u4e00\u4e2a\u4eba\u76f8\u540c\u7684ID\u5373\u53ef\u3002\r\n```\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-03-019.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-10-023.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-06-025.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-04-014.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-06-030.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-10-032.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-06-028.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-10-031.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-05-003.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-04-017.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-10-016.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-09-001.flac      2795\r\ndataset/CN-Celeb2_flac/data/id11999/recitation-05-010.flac      2795\r\n```\r\n\r\n# 
\u4fee\u6539\u9884\u5904\u7406\u65b9\u6cd5\uff08\u53ef\u9009\uff09\r\n\r\n\u914d\u7f6e\u6587\u4ef6\u4e2d\u9ed8\u8ba4\u4f7f\u7528\u7684\u662fFbank\u9884\u5904\u7406\u65b9\u6cd5\uff0c\u5982\u679c\u8981\u4f7f\u7528\u5176\u4ed6\u9884\u5904\u7406\u65b9\u6cd5\uff0c\u53ef\u4ee5\u4fee\u6539\u914d\u7f6e\u6587\u4ef6\u4e2d\u7684\u5b89\u88c5\u4e0b\u9762\u65b9\u5f0f\u4fee\u6539\uff0c\u5177\u4f53\u7684\u503c\u53ef\u4ee5\u6839\u636e\u81ea\u5df1\u60c5\u51b5\u4fee\u6539\u3002\u5982\u679c\u4e0d\u6e05\u695a\u5982\u4f55\u8bbe\u7f6e\u53c2\u6570\uff0c\u53ef\u4ee5\u76f4\u63a5\u5220\u9664\u8be5\u90e8\u5206\uff0c\u76f4\u63a5\u4f7f\u7528\u9ed8\u8ba4\u503c\u3002\r\n\r\n```yaml\r\n# \u6570\u636e\u9884\u5904\u7406\u53c2\u6570\r\npreprocess_conf:\r\n  # \u662f\u5426\u4f7f\u7528HF\u4e0a\u7684Wav2Vec2\u7c7b\u4f3c\u6a21\u578b\u63d0\u53d6\u97f3\u9891\u7279\u5f81\r\n  use_hf_model: False\r\n  # \u97f3\u9891\u9884\u5904\u7406\u65b9\u6cd5\uff0c\u4e5f\u53ef\u4ee5\u53eb\u7279\u5f81\u63d0\u53d6\u65b9\u6cd5\r\n  # \u5f53use_hf_model\u4e3aFalse\u65f6\uff0c\u652f\u6301\uff1aMelSpectrogram\u3001Spectrogram\u3001MFCC\u3001Fbank\r\n  # \u5f53use_hf_model\u4e3aTrue\u65f6\uff0c\u6307\u5b9a\u7684\u662fHuggingFace\u7684\u6a21\u578b\u6216\u8005\u672c\u5730\u8def\u5f84\uff0c\u6bd4\u5982facebook/w2v-bert-2.0\u6216\u8005./feature_models/w2v-bert-2.0\r\n  feature_method: 'Fbank'\r\n  # \u5f53use_hf_model\u4e3aFalse\u65f6\uff0c\u8bbe\u7f6eAPI\u53c2\u6570\uff0c\u66f4\u53c2\u6570\u67e5\u770b\u5bf9\u5e94API\uff0c\u4e0d\u6e05\u695a\u7684\u53ef\u4ee5\u76f4\u63a5\u5220\u9664\u8be5\u90e8\u5206\uff0c\u76f4\u63a5\u4f7f\u7528\u9ed8\u8ba4\u503c\u3002\r\n  # \u5f53use_hf_model\u4e3aTrue\u65f6\uff0c\u53ef\u4ee5\u8bbe\u7f6e\u53c2\u6570use_gpu\uff0c\u6307\u5b9a\u662f\u5426\u4f7f\u7528GPU\u63d0\u53d6\u7279\u5f81\r\n  method_args:\r\n    sample_frequency: 16000\r\n    num_mel_bins: 80\r\n```\r\n\r\n# 
\u63d0\u53d6\u7279\u5f81\uff08\u53ef\u9009\uff09\r\n\r\n\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u9996\u5148\u662f\u8981\u8bfb\u53d6\u97f3\u9891\u6570\u636e\uff0c\u7136\u540e\u63d0\u53d6\u7279\u5f81\uff0c\u6700\u540e\u518d\u8fdb\u884c\u8bad\u7ec3\u3002\u5176\u4e2d\u8bfb\u53d6\u97f3\u9891\u6570\u636e\u3001\u63d0\u53d6\u7279\u5f81\u4e5f\u662f\u6bd4\u8f83\u6d88\u8017\u65f6\u95f4\u7684\uff0c\u6240\u4ee5\u6211\u4eec\u53ef\u4ee5\u9009\u62e9\u63d0\u524d\u63d0\u53d6\u597d\u53d6\u7279\u5f81\uff0c\u8bad\u7ec3\u6a21\u578b\u7684\u662f\u5c31\u53ef\u4ee5\u76f4\u63a5\u52a0\u8f7d\u63d0\u53d6\u597d\u7684\u7279\u5f81\uff0c\u8fd9\u6837\u8bad\u7ec3\u901f\u5ea6\u4f1a\u66f4\u5feb\u3002\u8fd9\u4e2a\u63d0\u53d6\u7279\u5f81\u662f\u53ef\u9009\u62e9\uff0c\u5982\u679c\u6ca1\u6709\u63d0\u53d6\u597d\u7684\u7279\u5f81\uff0c\u8bad\u7ec3\u6a21\u578b\u7684\u65f6\u5019\u5c31\u4f1a\u4ece\u8bfb\u53d6\u97f3\u9891\u6570\u636e\uff0c\u7136\u540e\u63d0\u53d6\u7279\u5f81\u5f00\u59cb\u3002\u63d0\u53d6\u7279\u5f81\u6b65\u9aa4\u5982\u4e0b\uff1a\r\n\r\n1. \u6267\u884c`extract_features.py`\uff0c\u63d0\u53d6\u7279\u5f81\uff0c\u7279\u5f81\u4f1a\u4fdd\u5b58\u5728`dataset/features`\u76ee\u5f55\u4e0b\uff0c\u5e76\u751f\u6210\u65b0\u7684\u6570\u636e\u5217\u8868`train_list_features.txt`\u3001`enroll_list_features.txt`\u548c`trials_list_features.txt`\u3002\r\n\r\n```shell\r\npython extract_features.py --configs=configs/cam++.yml --save_dir=dataset/features\r\n```\r\n\r\n2. 
\u4fee\u6539\u914d\u7f6e\u6587\u4ef6\uff0c\u5c06`dataset_conf.train_list`\u3001`dataset_conf.enroll_list`\u548c`dataset_conf.trials_list`\u4fee\u6539\u4e3a`train_list_features.txt`\u3001`enroll_list_features.txt`\u548c`trials_list_features.txt`\u3002\r\n\r\n\r\n# \u8bad\u7ec3\u6a21\u578b\r\n\u4f7f\u7528`train.py`\u8bad\u7ec3\u6a21\u578b\uff0c\u672c\u9879\u76ee\u652f\u6301\u591a\u4e2a\u97f3\u9891\u9884\u5904\u7406\u65b9\u5f0f\uff0c\u901a\u8fc7`configs/ecapa_tdnn.yml`\u914d\u7f6e\u6587\u4ef6\u7684\u53c2\u6570`preprocess_conf.feature_method`\u53ef\u4ee5\u6307\u5b9a\uff0c`MelSpectrogram`\u4e3a\u6885\u5c14\u9891\u8c31\uff0c`Spectrogram`\u4e3a\u8bed\u8c31\u56fe\uff0c`MFCC`\u6885\u5c14\u9891\u8c31\u5012\u8c31\u7cfb\u6570\u7b49\u7b49\u3002\u901a\u8fc7\u53c2\u6570`augment_conf_path`\u53ef\u4ee5\u6307\u5b9a\u6570\u636e\u589e\u5f3a\u65b9\u5f0f\u3002\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u4f1a\u4f7f\u7528VisualDL\u4fdd\u5b58\u8bad\u7ec3\u65e5\u5fd7\uff0c\u901a\u8fc7\u542f\u52a8VisualDL\u53ef\u4ee5\u968f\u65f6\u67e5\u770b\u8bad\u7ec3\u7ed3\u679c\uff0c\u542f\u52a8\u547d\u4ee4`visualdl --logdir=log --host 0.0.0.0`\r\n```shell\r\n# \u5355\u5361\u8bad\u7ec3\r\nCUDA_VISIBLE_DEVICES=0 python train.py\r\n# \u591a\u5361\u8bad\u7ec3\r\nCUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py\r\n```\r\n\r\n\u8bad\u7ec3\u8f93\u51fa\u65e5\u5fd7\uff1a\r\n```\r\n[2023-08-05 09:52:06.497988 INFO   ] utils:print_arguments:13 - ----------- \u989d\u5916\u914d\u7f6e\u53c2\u6570 -----------\r\n[2023-08-05 09:52:06.498094 INFO   ] utils:print_arguments:15 - configs: configs/ecapa_tdnn.yml\r\n[2023-08-05 09:52:06.498149 INFO   ] utils:print_arguments:15 - do_eval: True\r\n[2023-08-05 09:52:06.498191 INFO   ] utils:print_arguments:15 - local_rank: 0\r\n[2023-08-05 09:52:06.498230 INFO   ] utils:print_arguments:15 - pretrained_model: None\r\n[2023-08-05 09:52:06.498269 INFO   ] utils:print_arguments:15 - resume_model: None\r\n[2023-08-05 09:52:06.498306 INFO   ] 
utils:print_arguments:15 - save_model_path: models/\r\n[2023-08-05 09:52:06.498342 INFO   ] utils:print_arguments:15 - use_gpu: True\r\n[2023-08-05 09:52:06.498378 INFO   ] utils:print_arguments:16 - ------------------------------------------------\r\n[2023-08-05 09:52:06.513761 INFO   ] utils:print_arguments:18 - ----------- \u914d\u7f6e\u6587\u4ef6\u53c2\u6570 -----------\r\n[2023-08-05 09:52:06.513906 INFO   ] utils:print_arguments:21 - dataset_conf:\r\n[2023-08-05 09:52:06.513957 INFO   ] utils:print_arguments:24 -         dataLoader:\r\n[2023-08-05 09:52:06.513995 INFO   ] utils:print_arguments:26 -                 batch_size: 64\r\n[2023-08-05 09:52:06.514031 INFO   ] utils:print_arguments:26 -                 num_workers: 4\r\n[2023-08-05 09:52:06.514066 INFO   ] utils:print_arguments:28 -         do_vad: False\r\n[2023-08-05 09:52:06.514101 INFO   ] utils:print_arguments:28 -         enroll_list: dataset/enroll_list.txt\r\n[2023-08-05 09:52:06.514135 INFO   ] utils:print_arguments:24 -         eval_conf:\r\n[2023-08-05 09:52:06.514169 INFO   ] utils:print_arguments:26 -                 batch_size: 1\r\n[2023-08-05 09:52:06.514203 INFO   ] utils:print_arguments:26 -                 max_duration: 20\r\n[2023-08-05 09:52:06.514237 INFO   ] utils:print_arguments:28 -         max_duration: 3\r\n[2023-08-05 09:52:06.514274 INFO   ] utils:print_arguments:28 -         min_duration: 0.5\r\n[2023-08-05 09:52:06.514308 INFO   ] utils:print_arguments:28 -         noise_aug_prob: 0.2\r\n[2023-08-05 09:52:06.514342 INFO   ] utils:print_arguments:28 -         noise_dir: dataset/noise\r\n[2023-08-05 09:52:06.514374 INFO   ] utils:print_arguments:28 -         num_speakers: 3242\r\n[2023-08-05 09:52:06.514408 INFO   ] utils:print_arguments:28 -         sample_rate: 16000\r\n[2023-08-05 09:52:06.514441 INFO   ] utils:print_arguments:28 -         speed_perturb: True\r\n[2023-08-05 09:52:06.514475 INFO   ] utils:print_arguments:28 -         target_dB: -20\r\n[2023-08-05 
09:52:06.514508 INFO   ] utils:print_arguments:28 -         train_list: dataset/train_list.txt\r\n[2023-08-05 09:52:06.514542 INFO   ] utils:print_arguments:28 -         trials_list: dataset/trials_list.txt\r\n[2023-08-05 09:52:06.514575 INFO   ] utils:print_arguments:28 -         use_dB_normalization: True\r\n[2023-08-05 09:52:06.514609 INFO   ] utils:print_arguments:21 - loss_conf:\r\n[2023-08-05 09:52:06.514643 INFO   ] utils:print_arguments:24 -         args:\r\n[2023-08-05 09:52:06.514678 INFO   ] utils:print_arguments:26 -                 easy_margin: False\r\n[2023-08-05 09:52:06.514713 INFO   ] utils:print_arguments:26 -                 margin: 0.2\r\n[2023-08-05 09:52:06.514746 INFO   ] utils:print_arguments:26 -                 scale: 32\r\n[2023-08-05 09:52:06.514779 INFO   ] utils:print_arguments:24 -         margin_scheduler_args:\r\n[2023-08-05 09:52:06.514814 INFO   ] utils:print_arguments:26 -                 final_margin: 0.3\r\n[2023-08-05 09:52:06.514848 INFO   ] utils:print_arguments:28 -         use_loss: AAMLoss\r\n[2023-08-05 09:52:06.514882 INFO   ] utils:print_arguments:28 -         use_margin_scheduler: True\r\n[2023-08-05 09:52:06.514915 INFO   ] utils:print_arguments:21 - model_conf:\r\n[2023-08-05 09:52:06.514950 INFO   ] utils:print_arguments:24 -         backbone:\r\n[2023-08-05 09:52:06.514984 INFO   ] utils:print_arguments:26 -                 embd_dim: 192\r\n[2023-08-05 09:52:06.515017 INFO   ] utils:print_arguments:26 -                 pooling_type: ASP\r\n[2023-08-05 09:52:06.515050 INFO   ] utils:print_arguments:24 -         classifier:\r\n[2023-08-05 09:52:06.515084 INFO   ] utils:print_arguments:26 -                 num_blocks: 0\r\n[2023-08-05 09:52:06.515118 INFO   ] utils:print_arguments:21 - optimizer_conf:\r\n[2023-08-05 09:52:06.515154 INFO   ] utils:print_arguments:28 -         learning_rate: 0.001\r\n[2023-08-05 09:52:06.515188 INFO   ] utils:print_arguments:28 -         optimizer: Adam\r\n[2023-08-05 09:52:06.515221 
INFO   ] utils:print_arguments:28 -         scheduler: CosineAnnealingLR\r\n[2023-08-05 09:52:06.515254 INFO   ] utils:print_arguments:28 -         scheduler_args: None\r\n[2023-08-05 09:52:06.515289 INFO   ] utils:print_arguments:28 -         weight_decay: 1e-06\r\n[2023-08-05 09:52:06.515323 INFO   ] utils:print_arguments:21 - preprocess_conf:\r\n[2023-08-05 09:52:06.515357 INFO   ] utils:print_arguments:28 -         feature_method: MelSpectrogram\r\n[2023-08-05 09:52:06.515390 INFO   ] utils:print_arguments:24 -         method_args:\r\n[2023-08-05 09:52:06.515426 INFO   ] utils:print_arguments:26 -                 f_max: 14000.0\r\n[2023-08-05 09:52:06.515460 INFO   ] utils:print_arguments:26 -                 f_min: 50.0\r\n[2023-08-05 09:52:06.515493 INFO   ] utils:print_arguments:26 -                 hop_length: 320\r\n[2023-08-05 09:52:06.515527 INFO   ] utils:print_arguments:26 -                 n_fft: 1024\r\n[2023-08-05 09:52:06.515560 INFO   ] utils:print_arguments:26 -                 n_mels: 64\r\n[2023-08-05 09:52:06.515593 INFO   ] utils:print_arguments:26 -                 sample_rate: 16000\r\n[2023-08-05 09:52:06.515626 INFO   ] utils:print_arguments:26 -                 win_length: 1024\r\n[2023-08-05 09:52:06.515660 INFO   ] utils:print_arguments:21 - train_conf:\r\n[2023-08-05 09:52:06.515694 INFO   ] utils:print_arguments:28 -         log_interval: 100\r\n[2023-08-05 09:52:06.515728 INFO   ] utils:print_arguments:28 -         max_epoch: 30\r\n[2023-08-05 09:52:06.515761 INFO   ] utils:print_arguments:30 - use_model: EcapaTdnn\r\n[2023-08-05 09:52:06.515794 INFO   ] utils:print_arguments:31 - ------------------------------------------------\r\n\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\r\n===============================================================================================\r\nLayer (type:depth-idx)                        Output Shape              Param 
#\r\n===============================================================================================\r\nSequential                                    [1, 9726]                 --\r\n\u251c\u2500EcapaTdnn: 1-1                              [1, 192]                  --\r\n\u2502    \u2514\u2500Conv1dReluBn: 2-1                      [1, 512, 98]              --\r\n\u2502    \u2502    \u2514\u2500Conv1d: 3-1                       [1, 512, 98]              163,840\r\n\u2502    \u2502    \u2514\u2500BatchNorm1d: 3-2                  [1, 512, 98]              1,024\r\n\u2502    \u2514\u2500Sequential: 2-2                        [1, 512, 98]              --\r\n\u2502    \u2502    \u2514\u2500Conv1dReluBn: 3-3                 [1, 512, 98]              263,168\r\n\u2502    \u2502    \u2514\u2500Res2Conv1dReluBn: 3-4             [1, 512, 98]              86,912\r\n\u2502    \u2502    \u2514\u2500Conv1dReluBn: 3-5                 [1, 512, 98]              263,168\r\n\u2502    \u2502    \u2514\u2500SE_Connect: 3-6                   [1, 512, 98]              262,912\r\n\u2502    \u2514\u2500Sequential: 2-3                        [1, 512, 98]              --\r\n\u2502    \u2502    \u2514\u2500Conv1dReluBn: 3-7                 [1, 512, 98]              263,168\r\n\u2502    \u2502    \u2514\u2500Res2Conv1dReluBn: 3-8             [1, 512, 98]              86,912\r\n\u2502    \u2502    \u2514\u2500Conv1dReluBn: 3-9                 [1, 512, 98]              263,168\r\n\u2502    \u2502    \u2514\u2500SE_Connect: 3-10                  [1, 512, 98]              262,912\r\n\u2502    \u2514\u2500Sequential: 2-4                        [1, 512, 98]              --\r\n\u2502    \u2502    \u2514\u2500Conv1dReluBn: 3-11                [1, 512, 98]              263,168\r\n\u2502    \u2502    \u2514\u2500Res2Conv1dReluBn: 3-12            [1, 512, 98]              86,912\r\n\u2502    \u2502    \u2514\u2500Conv1dReluBn: 3-13                [1, 512, 98]              263,168\r\n\u2502    \u2502    
\u2514\u2500SE_Connect: 3-14                  [1, 512, 98]              262,912\r\n\u2502    \u2514\u2500Conv1d: 2-5                            [1, 1536, 98]             2,360,832\r\n\u2502    \u2514\u2500AttentiveStatsPool: 2-6                [1, 3072]                 --\r\n\u2502    \u2502    \u2514\u2500Conv1d: 3-15                      [1, 128, 98]              196,736\r\n\u2502    \u2502    \u2514\u2500Conv1d: 3-16                      [1, 1536, 98]             198,144\r\n\u2502    \u2514\u2500BatchNorm1d: 2-7                       [1, 3072]                 6,144\r\n\u2502    \u2514\u2500Linear: 2-8                            [1, 192]                  590,016\r\n\u2502    \u2514\u2500BatchNorm1d: 2-9                       [1, 192]                  384\r\n\u251c\u2500SpeakerIdentification: 1-2                  [1, 9726]                 1,867,392\r\n===============================================================================================\r\nTotal params: 8,012,992\r\nTrainable params: 8,012,992\r\nNon-trainable params: 0\r\nTotal mult-adds (M): 468.81\r\n===============================================================================================\r\nInput size (MB): 0.03\r\nForward/backward pass size (MB): 10.36\r\nParams size (MB): 32.05\r\nEstimated Total Size (MB): 42.44\r\n===============================================================================================\r\n[2023-08-05 09:52:08.084231 INFO   ] trainer:train:388 - \u8bad\u7ec3\u6570\u636e\uff1a874175\r\n[2023-08-05 09:52:09.186542 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [0/13659], loss: 11.95824, accuracy: 0.00000, learning rate: 0.00100000, speed: 58.09 data/sec, eta: 5 days, 5:24:08\r\n[2023-08-05 09:52:22.477905 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [100/13659], loss: 10.35675, accuracy: 0.00278, learning rate: 0.00100000, speed: 481.65 data/sec, eta: 15:07:15\r\n[2023-08-05 09:52:35.948581 INFO   ] trainer:__train_epoch:334 - Train 
epoch: [1/30], batch: [200/13659], loss: 10.22089, accuracy: 0.00505, learning rate: 0.00100000, speed: 475.27 data/sec, eta: 15:19:12\r\n[2023-08-05 09:52:49.249098 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [300/13659], loss: 10.00268, accuracy: 0.00706, learning rate: 0.00100000, speed: 481.45 data/sec, eta: 15:07:11\r\n[2023-08-05 09:53:03.716015 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [400/13659], loss: 9.76052, accuracy: 0.00830, learning rate: 0.00100000, speed: 442.74 data/sec, eta: 16:26:16\r\n[2023-08-05 09:53:18.258807 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [500/13659], loss: 9.50189, accuracy: 0.01060, learning rate: 0.00100000, speed: 440.46 data/sec, eta: 16:31:08\r\n[2023-08-05 09:53:31.618354 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [600/13659], loss: 9.26083, accuracy: 0.01256, learning rate: 0.00100000, speed: 479.50 data/sec, eta: 15:10:12\r\n[2023-08-05 09:53:45.439642 INFO   ] trainer:__train_epoch:334 - Train epoch: [1/30], batch: [700/13659], loss: 9.03548, accuracy: 0.01449, learning rate: 0.00099999, speed: 463.63 data/sec, eta: 15:41:08\r\n```\r\n\r\nStart VisualDL with `visualdl --logdir=log --host 0.0.0.0`. The VisualDL page looks like this:\r\n\r\n<div align=\"center\">\r\n<img src=\"./docs/images/log.jpg\" alt=\"VisualDL page\" width=\"600\">\r\n</div>\r\n\r\n\r\n# Evaluating the Model\r\nTraining saves an inference model when it finishes. We use this model to extract audio features (speaker embeddings) for the test set. We then compare those features pairwise and compute the EER and MinDCF.\r\n```shell\r\npython eval.py\r\n```\r\n\r\nThe output looks similar to the following:\r\n```\r\n\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\r\n------------------------------------------------\r\nW0425 08:27:32.057426 17654 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2\r\nW0425 08:27:32.065165 17654 device_context.cc:465] device: 0, cuDNN Version: 7.6.\r\n[2023-03-16 20:20:47.195908 INFO   ] trainer:evaluate:341 - \u6210\u529f\u52a0\u8f7d\u6a21\u578b\uff1amodels/EcapaTdnn_Fbank/best_model/model.pth\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 84/84 [00:28<00:00,  2.95it/s]\r\n\u5f00\u59cb\u4e24\u4e24\u5bf9\u6bd4\u97f3\u9891\u7279\u5f81...\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 5332/5332 [00:05<00:00, 1027.83it/s]\r\n\u8bc4\u4f30\u6d88\u8017\u65f6\u95f4\uff1a65s\uff0cthreshold\uff1a0.26\uff0cEER: 0.14739, MinDCF: 0.41999\r\n```\r\n\r\n# Inference API\r\n\r\nSeveral commonly used interfaces are shown below. For the full set of interfaces, see `mvector/predict.py`. The voiceprint comparison and voiceprint recognition examples further down also show them in use.\r\n\r\n```python\r\nfrom mvector.predict import MVectorPredictor\r\n\r\npredictor = MVectorPredictor(configs='configs/cam++.yml',\r\n                             model_path='models/CAMPPlus_Fbank/best_model/')\r\n# Extract the audio feature (voiceprint embedding)\r\nembedding = predictor.predict(audio_data='dataset/a_1.wav')\r\n# Compute the similarity between two audio clips\r\nsimilarity = predictor.contrast(audio_data1='dataset/a_1.wav', audio_data2='dataset/a_2.wav')\r\n\r\n# Register a user's audio\r\npredictor.register(user_name='\u591c\u96e8\u98d8\u96f6', audio_data='dataset/test.wav')\r\n# Identify the speaker of an audio clip\r\nname, score = predictor.recognition(audio_data='dataset/test1.wav')\r\n# List all registered users\r\nusers_name = predictor.get_users()\r\n# Remove a user's audio\r\npredictor.remove_user(user_name='\u591c\u96e8\u98d8\u96f6')\r\n```\r\n\r\n# Voiceprint Comparison\r\nNext, we implement voiceprint comparison in the `infer_contrast.py` program. It relies on a few important functions:\r\n\r\n- `predict()` extracts the voiceprint feature of one audio clip.\r\n- `predict_batch()` extracts voiceprint features for a batch of clips.\r\n- `contrast()` computes the similarity between two clips.\r\n- `register()` adds a clip to the voiceprint library.\r\n- `recognition()` takes a clip and identifies the speaker by comparing it against the voiceprint library.\r\n- `remove_user()` removes a registered speaker from the voiceprint library.\r\n\r\nWe feed in two recordings and extract their features with the prediction function. We then compute the cosine of the angle between the two feature vectors, and the result serves as their similarity score. Adjust the similarity threshold `threshold` to suit the accuracy requirements of your own project.\r\n```shell\r\npython infer_contrast.py --audio_path1=audio/a_1.wav --audio_path2=audio/b_2.wav\r\n```\r\n\r\nThe output looks similar to the following:\r\n```\r\n[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:13 - ----------- \u989d\u5916\u914d\u7f6e\u53c2\u6570 -----------\r\n[2023-04-02 18:30:48.009149 
INFO   ] utils:print_arguments:15 - audio_path1: dataset/a_1.wav\r\n[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - audio_path2: dataset/b_2.wav\r\n[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - configs: configs/ecapa_tdnn.yml\r\n[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - model_path: models/EcapaTdnn_Fbank/best_model/\r\n[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - threshold: 0.6\r\n[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:15 - use_gpu: True\r\n[2023-04-02 18:30:48.009149 INFO   ] utils:print_arguments:16 - ------------------------------------------------\r\n\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\r\nW0425 08:29:10.006249 21121 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2\r\nW0425 08:29:10.008555 21121 device_context.cc:465] device: 0, cuDNN Version: 7.6.\r\n\u6210\u529f\u52a0\u8f7d\u6a21\u578b\u53c2\u6570\u548c\u4f18\u5316\u65b9\u6cd5\u53c2\u6570\uff1amodels/EcapaTdnn_Fbank/best_model/model.pth\r\naudio/a_1.wav \u548c audio/b_2.wav 
\u4e0d\u662f\u540c\u4e00\u4e2a\u4eba\uff0c\u76f8\u4f3c\u5ea6\u4e3a\uff1a-0.09565544128417969\r\n```\r\n\r\nThe project also provides a voiceprint comparison program with a GUI. Run `infer_contrast_gui.py` to launch it; the interface is shown below. Select the two audio files and click the start button to check whether they come from the same person.\r\n\r\n<div align=\"center\">\r\n<img src=\"./docs/images/contrast.jpg\" alt=\"Voiceprint comparison interface\">\r\n</div>\r\n\r\n# Voiceprint Recognition\r\n\r\nVoiceprint recognition mainly uses the `register()` and `recognition()` functions. First, call `register()` to add audio to the voiceprint library, or simply copy the audio files into the `audio_db` folder. Then call `recognition()` with an audio clip to identify the matching speaker in the voiceprint library.\r\n\r\nWith these functions, you can build voiceprint recognition in whatever form your project needs. The example below, for instance, performs recognition from a live recording.\r\n\r\n1. The program loads the voices in the voice library folder `audio_db`.\r\n2. After the user presses Enter, it records 3 seconds of audio.\r\n3. It runs voiceprint recognition on the recording and matches it against the voices in the library to retrieve the user's identity.\r\n\r\nThe same approach can be adapted into a service. For example, you could expose an API for an app: when a user logs in by voiceprint, the app sends the recorded audio to the backend, which performs recognition and returns the result to the app. This assumes the user has already registered by voice and that their audio is stored in the `audio_db` folder.\r\n```shell\r\npython infer_recognition.py\r\n```\r\n\r\nThe output looks similar to the following:\r\n```\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:13 - ----------- \u989d\u5916\u914d\u7f6e\u53c2\u6570 -----------\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - audio_db_path: audio_db/\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - configs: configs/ecapa_tdnn.yml\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - model_path: models/EcapaTdnn_Fbank/best_model/\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - record_seconds: 3\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - threshold: 0.6\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:15 - use_gpu: True\r\n[2023-04-02 18:31:20.521040 INFO   ] utils:print_arguments:16 - 
------------------------------------------------\r\n\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\r\nW0425 08:30:13.257884 23889 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2\r\nW0425 08:30:13.260191 23889 device_context.cc:465] device: 0, cuDNN Version: 7.6.\r\n\u6210\u529f\u52a0\u8f7d\u6a21\u578b\u53c2\u6570\u548c\u4f18\u5316\u65b9\u6cd5\u53c2\u6570\uff1amodels/ecapa_tdnn/model.pth\r\nLoaded \u6c99\u745e\u91d1 audio.\r\nLoaded \u674e\u8fbe\u5eb7 audio.\r\n\u8bf7\u9009\u62e9\u529f\u80fd\uff0c0\u4e3a\u6ce8\u518c\u97f3\u9891\u5230\u58f0\u7eb9\u5e93\uff0c1\u4e3a\u6267\u884c\u58f0\u7eb9\u8bc6\u522b\uff1a0\r\n\u6309\u4e0b\u56de\u8f66\u952e\u5f00\u673a\u5f55\u97f3\uff0c\u5f55\u97f33\u79d2\u4e2d\uff1a\r\n\u5f00\u59cb\u5f55\u97f3......\r\n\u5f55\u97f3\u5df2\u7ed3\u675f!\r\n\u8bf7\u8f93\u5165\u8be5\u97f3\u9891\u7528\u6237\u7684\u540d\u79f0\uff1a\u591c\u96e8\u98d8\u96f6\r\n\u8bf7\u9009\u62e9\u529f\u80fd\uff0c0\u4e3a\u6ce8\u518c\u97f3\u9891\u5230\u58f0\u7eb9\u5e93\uff0c1\u4e3a\u6267\u884c\u58f0\u7eb9\u8bc6\u522b\uff1a1\r\n\u6309\u4e0b\u56de\u8f66\u952e\u5f00\u673a\u5f55\u97f3\uff0c\u5f55\u97f33\u79d2\u4e2d\uff1a\r\n\u5f00\u59cb\u5f55\u97f3......\r\n\u5f55\u97f3\u5df2\u7ed3\u675f!\r\n\u8bc6\u522b\u8bf4\u8bdd\u7684\u4e3a\uff1a\u591c\u96e8\u98d8\u96f6\uff0c\u76f8\u4f3c\u5ea6\u4e3a\uff1a0.920434\r\n```\r\n\r\n\r\nThe project also provides a voiceprint recognition program with a GUI; launch it with `infer_recognition_gui.py`. The buttons work as follows:\r\n\r\n- `\u6ce8\u518c\u97f3\u9891\u5230\u58f0\u7eb9\u5e93` (register audio to the voiceprint library): start speaking immediately, record for 3 seconds, and then enter the name of the person being registered.\r\n- `\u6267\u884c\u58f0\u7eb9\u8bc6\u522b` (run voiceprint recognition): speak immediately, and after the 3-second recording, wait for the recognition result.\r\n- `\u5220\u9664\u7528\u6237` (delete user): removes a user.\r\n- `\u5b9e\u65f6\u8bc6\u522b` (real-time recognition): records and recognizes continuously.\r\n\r\n<div align=\"center\">\r\n<img src=\"./docs/images/recognition.jpg\" alt=\"Voiceprint recognition interface\">\r\n</div>\r\n\r\n# Speaker Diarization (Separating Speakers)\r\n\r\nRun `infer_speaker_diarization.py` with an audio path to separate the speakers and display the result. The audio should preferably be at least 10 seconds long. See the program's arguments for more options.\r\n```shell\r\npython infer_speaker_diarization.py --audio_path=dataset/test_long.wav\r\n```\r\n\r\nThe output looks similar to the following:\r\n```\r\n2024-10-10 19:30:40.768 | INFO     | mvector.predict:__init__:61 - \u6210\u529f\u52a0\u8f7d\u6a21\u578b\u53c2\u6570\uff1amodels/CAMPPlus_Fbank/best_model/model.pth\r\n2024-10-10 19:30:40.795 | INFO     | mvector.predict:__create_index:127 - \u58f0\u7eb9\u7279\u5f81\u7d22\u5f15\u521b\u5efa\u5b8c\u6210\uff0c\u4e00\u5171\u67093\u4e2a\u7528\u6237\uff0c\u5206\u522b\u662f\uff1a['\u6c99\u745e\u91d1', '\u591c\u96e8\u98d8\u96f6', '\u674e\u8fbe\u5eb7']\r\n2024-10-10 19:30:40.796 | INFO     | mvector.predict:__load_audio_db:142 - \u6b63\u5728\u52a0\u8f7d\u58f0\u7eb9\u5e93\u6570\u636e...\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<?, ?it/s]\r\n2024-10-10 19:30:40.798 | INFO     | mvector.predict:__create_index:127 - \u58f0\u7eb9\u7279\u5f81\u7d22\u5f15\u521b\u5efa\u5b8c\u6210\uff0c\u4e00\u5171\u67093\u4e2a\u7528\u6237\uff0c\u5206\u522b\u662f\uff1a['\u6c99\u745e\u91d1', '\u591c\u96e8\u98d8\u96f6', '\u674e\u8fbe\u5eb7']\r\n2024-10-10 19:30:40.798 | INFO     | mvector.predict:__load_audio_db:172 - \u58f0\u7eb9\u5e93\u6570\u636e\u52a0\u8f7d\u5b8c\u6210\uff01\r\n\u8bc6\u522b\u7ed3\u679c\uff1a\r\n{'speaker': '\u6c99\u745e\u91d1', 'start': 0.0, 'end': 2.0}\r\n{'speaker': '\u964c\u751f\u4eba1', 'start': 4.0, 'end': 7.0}\r\n{'speaker': '\u674e\u8fbe\u5eb7', 'start': 7.0, 'end': 8.0}\r\n{'speaker': '\u6c99\u745e\u91d1', 'start': 9.0, 'end': 12.0}\r\n{'speaker': '\u6c99\u745e\u91d1', 'start': 13.0, 'end': 14.0}\r\n{'speaker': '\u964c\u751f\u4eba1', 'start': 15.0, 'end': 19.0}\r\n```\r\n\r\nThe result is displayed as the image below. Press the `Space` key to play or pause the audio, and click a position to jump to that point in the audio:\r\n<div align=\"center\">\r\n<img src=\"./docs/images/speaker_diarization.jpg\" alt=\"Speaker diarization\" width=\"800\">\r\n</div>\r\n\r\nThe project also provides a GUI version: run `infer_speaker_diarization_gui.py`. See the program's arguments for more options.\r\n```shell\r\npython infer_speaker_diarization_gui.py\r\n```\r\n\r\nThis opens a page like the following, where you can perform speaker identification:\r\n\r\n<div align=\"center\">\r\n<img src=\"./docs/images/speaker_diarization_gui.png\" alt=\"Speaker diarization\" width=\"800\">\r\n</div>\r\n\r\n\r\nNote: Chinese speaker names display correctly only if a Chinese font is installed. Windows usually already has one; Ubuntu needs it installed. If Windows does lack the font, download a `.ttf` file from the [font files](https://github.com/tracyone/program_font) repository and copy it to `C:\\Windows\\Fonts`. On Ubuntu, proceed as follows.\r\n\r\n1. Install the fonts\r\n```shell\r\ngit clone https://github.com/tracyone/program_font && cd program_font && ./install.sh\r\n```\r\n\r\n2. Run the following Python code\r\n```python\r\nimport matplotlib\r\nimport shutil\r\nimport os\r\n\r\n# Locate matplotlib's bundled TTF font directory next to its matplotlibrc file\r\npath = matplotlib.matplotlib_fname()\r\npath = path.replace('matplotlibrc', 'fonts/ttf/')\r\nprint(path)\r\n# Copy the SimHei font from the location used after step 1 into that directory\r\nshutil.copy('/usr/share/fonts/MyFonts/simhei.ttf', path)\r\n# Delete matplotlib's font cache so the new font is picked up on the next import\r\nuser_dir = os.path.expanduser('~')\r\nshutil.rmtree(f'{user_dir}/.cache/matplotlib', ignore_errors=True)\r\n```\r\n\r\n\r\n# Other Versions\r\n - Tensorflow: [VoiceprintRecognition-Tensorflow](https://github.com/yeyupiaoling/VoiceprintRecognition-Tensorflow)\r\n - PaddlePaddle: [VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)\r\n - Keras: [VoiceprintRecognition-Keras](https://github.com/yeyupiaoling/VoiceprintRecognition-Keras)\r\n\r\n\r\n## Support the Author\r\n\r\n<br/>\r\n<div align=\"center\">\r\n<p>Tip one yuan to support the author</p>\r\n<img src=\"https://yeyupiaoling.cn/reward.png\" alt=\"Tip the author\" width=\"400\">\r\n</div>\r\n\r\n\r\n# References\r\n1. https://github.com/PaddlePaddle/PaddleSpeech\r\n2. 
https://github.com/yeyupiaoling/PaddlePaddle-MobileFaceNets\r\n3. https://github.com/yeyupiaoling/PPASR\r\n4. https://github.com/alibaba-damo-academy/3D-Speaker\r\n5. https://github.com/wenet-e2e/wespeaker\r\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "Voice Print Recognition toolkit on Pytorch",
    "version": "1.1.1",
    "project_urls": {
        "Download": "https://github.com/yeyupiaoling/VoiceprintRecognition_Pytorch.git",
        "Homepage": "https://github.com/yeyupiaoling/VoiceprintRecognition_Pytorch"
    },
    "split_keywords": [
        "voice",
        " pytorch"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8777fc8719b1fd31309c2f8d34f12d7f39e73a4cf60e78ee373c8313ef6dd292",
                "md5": "6ad7f72bd39e5afe363ac5bfd08025de",
                "sha256": "2914fac7dba838321ee4c0cd164cf83a0154bc20dfdc9430fb5bdc26abc967b1"
            },
            "downloads": -1,
            "filename": "mvector-1.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6ad7f72bd39e5afe363ac5bfd08025de",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 73944,
            "upload_time": "2024-10-13T12:33:29",
            "upload_time_iso_8601": "2024-10-13T12:33:29.121033Z",
            "url": "https://files.pythonhosted.org/packages/87/77/fc8719b1fd31309c2f8d34f12d7f39e73a4cf60e78ee373c8313ef6dd292/mvector-1.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-13 12:33:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yeyupiaoling",
    "github_project": "VoiceprintRecognition_Pytorch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.19.2"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.59.0"
                ]
            ]
        },
        {
            "name": "visualdl",
            "specs": [
                [
                    ">=",
                    "2.5.3"
                ]
            ]
        },
        {
            "name": "resampy",
            "specs": [
                [
                    ">=",
                    "0.2.2"
                ]
            ]
        },
        {
            "name": "soundfile",
            "specs": [
                [
                    ">=",
                    "0.12.1"
                ]
            ]
        },
        {
            "name": "soundcard",
            "specs": [
                [
                    ">=",
                    "0.4.2"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "5.4.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.5.2"
                ]
            ]
        },
        {
            "name": "pydub",
            "specs": [
                [
                    ">=",
                    "0.25.1"
                ]
            ]
        },
        {
            "name": "torchinfo",
            "specs": [
                [
                    ">=",
                    "1.7.2"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    ">=",
                    "0.7.2"
                ]
            ]
        },
        {
            "name": "yeaudio",
            "specs": [
                [
                    ">=",
                    "0.0.6"
                ]
            ]
        }
    ],
    "lcname": "mvector"
}
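The evaluation step in the README embedded in the `description` field above (`python eval.py`) compares test-set embeddings pairwise and reports EER and MinDCF. To illustrate what EER measures, the sketch below tries every observed score as a threshold. It returns the point where the false-acceptance rate (FAR) and false-rejection rate (FRR) are closest. This brute-force approximation is written for clarity and is not mvector's own implementation. It does not compute MinDCF, and the trial scores are made up.

```python
from typing import List, Tuple


def compute_eer(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Estimate the equal error rate with a brute-force threshold sweep.

    scores: similarity score of each trial (higher means more alike).
    labels: 1 for a same-speaker (target) trial, 0 for a different-speaker trial.
    A trial is accepted when score >= threshold.
    Returns (eer, threshold) at the swept threshold where FAR and FRR are closest.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for label in labels if label == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(scores)):
        # False acceptance rate: different-speaker trials that pass the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr) / n_nontarget
        # False rejection rate: same-speaker trials that fall below the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr) / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    trial_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
    trial_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    eer, thr = compute_eer(trial_scores, trial_labels)
    print(f"EER: {eer:.5f}, threshold: {thr:.2f}")
```

This version rescans every trial for each candidate threshold, so it is quadratic in the number of trials. For large trial lists, the usual optimization is to sort the scores once and accumulate the counts.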
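The voiceprint comparison section of the README says two recordings are compared by the cosine of the angle between their embeddings. The `threshold` value then decides whether they belong to the same speaker; the logged `infer_contrast.py` run uses `threshold: 0.6`. A minimal pure-Python sketch of that decision rule follows. `same_speaker` is a hypothetical helper, and the vectors are toy stand-ins for the 192-dimensional embeddings in the logged EcapaTdnn config. The `>=` comparison is an assumption, not a confirmed detail of mvector's `contrast()`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare an all-zero embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.6) -> bool:
    """Hypothetical decision rule: same speaker when similarity >= threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real speaker embeddings.
    emb_a1 = [0.2, 0.9, 0.1]
    emb_a2 = [0.25, 0.85, 0.05]
    emb_b2 = [-0.9, 0.1, 0.4]
    print(cosine_similarity(emb_a1, emb_a2), same_speaker(emb_a1, emb_a2))
    print(cosine_similarity(emb_a1, emb_b2), same_speaker(emb_a1, emb_b2))
```

Cosine similarity ignores vector length, so only the direction of the embedding matters. The threshold trades false acceptances against false rejections, which is why the README leaves it configurable.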
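The README's voiceprint recognition section compares a new recording with every registered voiceprint. It reports the best match only when that match clears the threshold (`threshold: 0.6` in the logged `infer_recognition.py` run). The hypothetical `VoiceprintLibrary` class below sketches that top-1 search over in-memory embeddings. Its method names echo the README's `register()`, `recognition()` and `remove_user()`, but it is not mvector's implementation. Returning `None` for an unknown speaker is an illustrative choice.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero embedding")
    return [x / norm for x in v]


class VoiceprintLibrary:
    """Hypothetical in-memory stand-in for a voiceprint library such as audio_db."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self._users: Dict[str, List[float]] = {}

    def register(self, name: str, embedding: Sequence[float]) -> None:
        # Store unit-length vectors so each comparison is a plain dot product.
        self._users[name] = _normalize(embedding)

    def remove_user(self, name: str) -> None:
        self._users.pop(name, None)

    def recognize(self, embedding: Sequence[float]) -> Tuple[Optional[str], float]:
        """Return (best_name, best_score); best_name is None below the threshold."""
        query = _normalize(embedding)
        best_name: Optional[str] = None
        best_score = -1.0
        for name, ref in self._users.items():
            score = sum(q * r for q, r in zip(query, ref))  # cosine similarity
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None or best_score < self.threshold:
            return None, best_score
        return best_name, best_score


if __name__ == "__main__":
    library = VoiceprintLibrary(threshold=0.6)
    library.register("speaker_a", [1.0, 0.0, 0.0])
    library.register("speaker_b", [0.0, 1.0, 0.0])
    print(library.recognize([0.9, 0.1, 0.0]))
    print(library.recognize([0.0, 0.0, 1.0]))
```

Normalizing at registration time makes each lookup one dot product per registered user. With many users, a vectorized matrix product or a nearest-neighbor index would be the natural next step.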
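The `digests` entry above publishes the SHA-256 of `mvector-1.1.1-py3-none-any.whl`. Below is a minimal sketch for checking a downloaded copy against that value with Python's standard `hashlib`. The local file name in the `__main__` block is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path

# SHA-256 published for mvector-1.1.1-py3-none-any.whl in the digests field above.
EXPECTED_SHA256 = "2914fac7dba838321ee4c0cd164cf83a0154bc20dfdc9430fb5bdc26abc967b1"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("mvector-1.1.1-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("OK" if verify_wheel(wheel) else "MISMATCH")
    else:
        print(f"{wheel} not found")
```

pip can enforce the same check at install time through hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).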
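The `requirements` array above stores each dependency as a `name` plus a list of `[operator, version]` pairs. You might need them as pip-style requirement strings, for example to write a `requirements.txt`; a small conversion is enough. The sketch parses a trimmed, hand-copied excerpt of that array, and `to_requirement_lines` is a hypothetical helper name.

```python
import json
from typing import List

# Trimmed copy of the "requirements" array from the metadata above.
METADATA_JSON = """
{"requirements": [
    {"name": "numpy", "specs": [[">=", "1.19.2"]]},
    {"name": "scikit-learn", "specs": [[">=", "1.5.2"]]},
    {"name": "yeaudio", "specs": [[">=", "0.0.6"]]}
]}
"""


def to_requirement_lines(metadata: dict) -> List[str]:
    """Join each package name with its comma-separated operator+version specs."""
    lines = []
    for req in metadata.get("requirements", []):
        spec = ",".join(op + version for op, version in req.get("specs", []))
        lines.append(req["name"] + spec)
    return lines


if __name__ == "__main__":
    print("\n".join(to_requirement_lines(json.loads(METADATA_JSON))))
```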
        