# Sound Classification System Based on PaddlePaddle
![python version](https://img.shields.io/badge/python-3.8+-orange.svg)
![GitHub forks](https://img.shields.io/github/forks/yeyupiaoling/AudioClassification-PaddlePaddle)
![GitHub Repo stars](https://img.shields.io/github/stars/yeyupiaoling/AudioClassification-PaddlePaddle)
![GitHub](https://img.shields.io/github/license/yeyupiaoling/AudioClassification-PaddlePaddle)
![Supported systems](https://img.shields.io/badge/支持系统-Win/Linux/MAC-9cf)
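# Introduction
This project is a sound classification system built on PaddlePaddle. It aims to recognize a wide range of environmental sounds, animal calls and spoken languages. It provides several sound classification models, including EcapaTdnn, PANNS, ResNetSE, CAMPPlus and ERes2Net, to cover different application scenarios. It also includes a test report on the widely used UrbanSound8K dataset, plus examples of downloading and using several dialect datasets. You can pick the model and dataset that fit your needs to get more accurate classification. Typical applications include outdoor environmental monitoring, wildlife protection and speech recognition, and you are encouraged to explore further use cases.
**Feel free to scan the QR codes below to join the Knowledge Planet (知识星球) community or the QQ group for discussion. Knowledge Planet provides the model files for this project and for the author's other related projects, along with other resources.**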
<div align="center">
<img src="https://yeyupiaoling.cn/zsxq.png" alt="Knowledge Planet" width="400">
<img src="https://yeyupiaoling.cn/qq.png" alt="QQ group" width="400">
</div>
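# Table of Contents
- [Introduction](#introduction)
- [Features](#features)
- [Model Benchmark](#model-benchmark)
- [Installation](#installation)
- [Creating the Data Lists](#creating-the-data-lists)
- [Changing the Preprocessing Method (Optional)](#changing-the-preprocessing-method-optional)
- [Extracting Features (Optional)](#extracting-features-optional)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Other Features](#other-features)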
# Prerequisites
- Anaconda 3
- Python 3.8
- PaddlePaddle 2.6.1
- Windows 10 or Ubuntu 18.04
# Features
1. Supported models: EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE, CAMPPlus, ERes2Net, ERes2NetV2
2. Supported pooling layers: AttentiveStatisticsPooling (ASP), SelfAttentivePooling (SAP), TemporalStatisticsPooling (TSP), TemporalAveragePooling (TAP)
3. Supported preprocessing methods: MelSpectrogram, LogMelSpectrogram, Spectrogram, MFCC, Fbank
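The statistics-style pooling layers turn a variable-length sequence of frame features into one fixed-length vector, so clips of different durations map to the same classifier input. The plain-Python sketch below illustrates TAP (per-channel mean over time) and TSP (per-channel mean concatenated with per-channel standard deviation, which doubles the channel count; this is consistent with `AttentiveStatisticsPooling` mapping `[1, 1536, 98]` to `[1, 3072, 1]` in the training log). It is not the project's Paddle implementation: the function names are hypothetical, and the use of the population standard deviation is an assumption. ASP computes the same two statistics but weights each frame with learned attention scores instead of uniformly.

```python
import statistics
from typing import List


def temporal_average_pooling(features: List[List[float]]) -> List[float]:
    """TAP: average every channel over time; features is indexed [channel][frame]."""
    return [statistics.fmean(channel) for channel in features]


def temporal_statistics_pooling(features: List[List[float]]) -> List[float]:
    """TSP: per-channel means followed by per-channel standard deviations (C channels -> 2C values)."""
    means = [statistics.fmean(channel) for channel in features]
    # Population standard deviation (assumption; an implementation may use the unbiased estimate)
    stds = [statistics.pstdev(channel) for channel in features]
    return means + stds


# Two channels observed over four frames
feats = [[1.0, 2.0, 3.0, 4.0],
         [0.0, 0.0, 2.0, 2.0]]
print(temporal_average_pooling(feats))          # [2.5, 1.0]
print(len(temporal_statistics_pooling(feats)))  # 4
```

Because both statistics are computed over the time axis, the output size depends only on the number of channels, never on the clip length.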
**Model papers:**
- EcapaTdnn:[ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](https://arxiv.org/abs/2005.07143v3)
- PANNS:[PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/abs/1912.10211v5)
- TDNN:[Prediction of speech intelligibility with DNN-based performance measures](https://arxiv.org/abs/2203.09148)
- Res2Net:[Res2Net: A New Multi-scale Backbone Architecture](https://arxiv.org/abs/1904.01169)
- ResNetSE:[Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507)
- CAMPPlus:[CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking](https://arxiv.org/abs/2303.00332v3)
- ERes2Net:[An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification](https://arxiv.org/abs/2305.12838v1)
# Model Benchmark
| Model | Params (M) | Preprocessing | Dataset | Classes | Accuracy | Get the model |
|:------------:|:---------:|:-----:|:------------:|:----:|:-------:|:--------:|
| ResNetSE | 9.1 | Fbank | UrbanSound8K | 10 | 0.95568 | Join Knowledge Planet |
| CAMPPlus | 7.2 | Fbank | UrbanSound8K | 10 | 0.95000 | Join Knowledge Planet |
| ERes2NetV2 | 5.4 | Fbank | UrbanSound8K | 10 | 0.94545 | Join Knowledge Planet |
| ERes2Net | 6.6 | Fbank | UrbanSound8K | 10 | 0.93977 | Join Knowledge Planet |
| PANNS (CNN10) | 4.9 | Fbank | UrbanSound8K | 10 | 0.92841 | Join Knowledge Planet |
| EcapaTdnn | 6.2 | Fbank | UrbanSound8K | 10 | 0.92727 | Join Knowledge Planet |
| TDNN | 2.7 | Fbank | UrbanSound8K | 10 | 0.92727 | Join Knowledge Planet |
| Res2Net | 5.6 | Fbank | UrbanSound8K | 10 | 0.88750 | Join Knowledge Planet |
**Notes:**
1. The test set takes one out of every 10 clips in the dataset, 874 clips in total.
# Installation
- First, install PaddlePaddle 2.6.1 or later. Skip this step if it is already installed.
```shell
conda install paddlepaddle-gpu==2.6.1 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
- Install the ppacls library.
To install it with pip, run:
```shell
python -m pip install ppacls -U -i https://pypi.tuna.tsinghua.edu.cn/simple
```
**Installing from source is recommended**, since it guarantees you are using the latest code.
```shell
git clone https://github.com/yeyupiaoling/AudioClassification_PaddlePaddle.git
cd AudioClassification_PaddlePaddle
pip install .
```
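# Creating the Data Lists
Generate the data lists that training reads in the next step. Place your audio dataset under the `dataset/audio` directory beforehand, with one folder per class and each clip longer than 3 seconds, e.g. `dataset/audio/bird_calls/...`. Each line of a generated list has the format `audio_path\tlabel`: the audio file path and its class label, separated by a tab (`\t`). You can also adapt the list-generation functions to the way your data is stored.
Take UrbanSound8K as an example. UrbanSound8K is a widely used public dataset for research on automatic urban sound classification. It contains 10 classes: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music. Download link: [UrbanSound8K.tar.gz](https://aistudio.baidu.com/aistudio/datasetdetail/36625). To use this dataset, download it and extract it into the `dataset` directory.
Run `create_data.py` to generate the data lists. It provides two ways of generating them, one for custom data and one for UrbanSound8K; see the code for details.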
```shell
python create_data.py
```
The generated list looks like this: each line holds the audio path followed by its label (labels start from 0), separated by a tab.
```
dataset/UrbanSound8K/audio/fold2/104817-4-0-2.wav	4
dataset/UrbanSound8K/audio/fold9/105029-7-2-5.wav	7
dataset/UrbanSound8K/audio/fold3/107228-5-0-0.wav	5
dataset/UrbanSound8K/audio/fold4/109711-3-2-4.wav	3
```
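A minimal sketch of generating such lists from a one-folder-per-class layout, assuming sorted folder names define the labels (starting from 0) and that every 10th clip goes to the test list, mirroring the "one out of every 10 clips" split noted under the model benchmark. The function name `create_data_list`, the output file names and the one-class-name-per-line format of `label_list.txt` are hypothetical; the actual `create_data.py` may differ.

```python
import os
from typing import List


def create_data_list(audio_dir: str, list_dir: str, test_every: int = 10) -> List[str]:
    """Hypothetical sketch for the layout audio_dir/<class name>/<clip>.

    Writes train_list.txt and test_list.txt (lines "audio_path<TAB>label") plus
    label_list.txt (one class name per line, line index == label) into list_dir.
    """
    # Sorted folder names give reproducible labels starting from 0
    class_names = sorted(d for d in os.listdir(audio_dir)
                         if os.path.isdir(os.path.join(audio_dir, d)))
    os.makedirs(list_dir, exist_ok=True)
    index = 0
    with open(os.path.join(list_dir, 'train_list.txt'), 'w', encoding='utf-8') as f_train, \
            open(os.path.join(list_dir, 'test_list.txt'), 'w', encoding='utf-8') as f_test:
        for label, class_name in enumerate(class_names):
            class_dir = os.path.join(audio_dir, class_name)
            for file_name in sorted(os.listdir(class_dir)):
                # Forward slashes keep the list usable on both Windows and Linux
                audio_path = os.path.join(class_dir, file_name).replace('\\', '/')
                line = f'{audio_path}\t{label}\n'
                # Every test_every-th clip (counted over the whole dataset) goes to the test list
                if index % test_every == 0:
                    f_test.write(line)
                else:
                    f_train.write(line)
                index += 1
    with open(os.path.join(list_dir, 'label_list.txt'), 'w', encoding='utf-8') as f_label:
        f_label.write('\n'.join(class_names) + '\n')
    return class_names
```

For example, `create_data_list('dataset/audio', 'dataset')` would write the three list files next to the audio folder. Sorting both folders and files keeps the labels and the train/test split identical across runs and operating systems.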
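# Changing the Preprocessing Method (Optional)
The config file uses the MelSpectrogram preprocessing method by default. To use another method, edit the config file as shown below and adjust the values to your situation. If you are unsure how to set the parameters, you can delete the `method_args` section and use the default values.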
```yaml
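preprocess_conf:
  # Audio preprocessing method; supported: MelSpectrogram, Spectrogram, MFCC, Fbank
  feature_method: 'MelSpectrogram'
  # Arguments passed to the corresponding API; see that API for more parameters.
  # If unsure, delete this section to use the default values
  method_args:
    sample_rate: 16000
    n_fft: 1024
    hop_length: 320
    win_length: 1024
    f_min: 50.0
    # Must not exceed sample_rate / 2 (the Nyquist frequency, 8000 Hz at 16 kHz)
    f_max: 8000.0
    n_mels: 64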
```
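# Extracting Features (Optional)
Training first reads the audio data, then extracts features, and only then trains on them. Reading audio and extracting features take a significant amount of time, so you can extract the features in advance and have training load them directly, which speeds training up. This step is optional: without precomputed features, training simply starts from reading the audio and extracting features. The steps are as follows:
1. Run `extract_features.py` to extract the features. They are saved under `dataset/features`, and new data lists `train_list_features.txt` and `test_list_features.txt` are generated.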
```shell
python extract_features.py --configs=configs/cam++.yml --save_dir=dataset/features
```
2. In the config file, set `dataset_conf.train_list` and `dataset_conf.test_list` to `train_list_features.txt` and `test_list_features.txt` respectively.
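The idea behind this step is a cache: pay for decoding and feature extraction once, then reuse the result every epoch. A minimal sketch of that pattern with the standard library follows; `load_or_extract` and the pickle-based cache files are hypothetical, since the file format written by `extract_features.py` is not specified here.

```python
import hashlib
import os
import pickle
from typing import Any, Callable


def load_or_extract(audio_path: str, save_dir: str,
                    extract_fn: Callable[[str], Any]) -> Any:
    """Return the cached feature of audio_path, extracting and saving it on the first call."""
    os.makedirs(save_dir, exist_ok=True)
    # Name the cache file after a hash of the full path so equal file names in
    # different class folders do not overwrite each other
    key = hashlib.sha1(audio_path.encode('utf-8')).hexdigest()
    cache_path = os.path.join(save_dir, key + '.pkl')
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    feature = extract_fn(audio_path)  # expensive part: decode audio and compute features
    with open(cache_path, 'wb') as f:
        pickle.dump(feature, f)
    return feature
```

One trade-off of any such cache: stored features only match the `preprocess_conf` used when they were extracted, so they must be regenerated after changing it, and augmentation applied to the raw waveform can no longer be re-randomized each epoch.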
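# Training
Next, start training with `train.py`. The parameters in the config file generally do not need to be changed, but a few must be adjusted to your dataset. The most important is the number of classes, `model_conf.num_class`, which differs from dataset to dataset and must match your data. The next is the batch size, `dataset_conf.dataLoader.batch_size`, which you can reduce if you run out of GPU memory.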
```shell
# Single-GPU training
CUDA_VISIBLE_DEVICES=0 python train.py
# Multi-GPU training
python -m paddle.distributed.launch --gpus '0,1' train.py
```
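How `batch_size` translates into training steps can be checked against the log below: 8644 training samples with `batch_size: 64` give `ceil(8644 / 64) = 136` batches per epoch, matching `batch: [0/136]`, and 60 epochs make 8160 steps. The sketch assumes the last partial batch is kept (no `drop_last`), and `count_classes` assumes `label_list.txt` holds one class name per line so its result can be compared with `num_class`. Both helpers are hypothetical.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches per epoch on one device; the last partial batch counts unless drop_last is set."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def count_classes(label_list_path: str) -> int:
    """Number of classes, assuming label_list.txt stores one class name per line."""
    with open(label_list_path, encoding='utf-8') as f:
        return sum(1 for line in f if line.strip())


print(steps_per_epoch(8644, 64))       # 136, as in "batch: [0/136]"
print(60 * steps_per_epoch(8644, 64))  # 8160 steps for max_epoch: 60
print(steps_per_epoch(8644, 32))       # 271 after halving batch_size
```

Halving the batch size to save GPU memory roughly doubles the number of steps per epoch, so each epoch takes longer even though each step is cheaper.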
Training log output:
```
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:14 - ----------- 额外配置参数 -----------
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - configs: configs/ecapa_tdnn.yml
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - pretrained_model: None
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - resume_model: None
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - save_model_path: models/
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - use_gpu: True
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:17 - ------------------------------------------------
[2023-08-07 23:02:08.811036 INFO ] utils:print_arguments:19 - ----------- 配置文件参数 -----------
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:22 - dataset_conf:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 -     aug_conf:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         noise_aug_prob: 0.2
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         noise_dir: dataset/noise
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         speed_perturb: True
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         volume_aug_prob: 0.2
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         volume_perturb: False
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 -     dataLoader:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         batch_size: 64
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         num_workers: 4
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     do_vad: False
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 -     eval_conf:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         batch_size: 1
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         max_duration: 20
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     label_list_path: dataset/label_list.txt
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     max_duration: 3
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     min_duration: 0.5
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     sample_rate: 16000
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 -     spec_aug_args:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         freq_mask_width: [0, 8]
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 -         time_mask_width: [0, 10]
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     target_dB: -20
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     test_list: dataset/test_list.txt
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     train_list: dataset/train_list.txt
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     use_dB_normalization: True
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 -     use_spec_aug: True
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:22 - model_conf:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     num_class: 10
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     pooling_type: ASP
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:22 - optimizer_conf:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     optimizer: Adam
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     scheduler: WarmupCosineSchedulerLR
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:25 -     scheduler_args:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 -         learning_rate: 0.001
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 -         min_lr: 1e-05
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 -         warmup_epoch: 5
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     weight_decay: 1e-06
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:22 - preprocess_conf:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     feature_method: Fbank
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:25 -     method_args:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 -         n_mels: 80
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 -         sr: 16000
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:22 - train_conf:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     log_interval: 10
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 -     max_epoch: 60
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:31 - use_model: EcapaTdnn
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:32 - ------------------------------------------------
[2023-08-07 23:02:08.817077 WARNING] trainer:__init__:69 - Windows系统不支持多线程读取数据,已自动关闭!
W0807 23:02:08.822477 3192 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.7, Runtime API Version: 11.6
W0807 23:02:08.826478 3192 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
----------------------------------------------------------------------------------------
  Layer (type)                  Input Shape         Output Shape             Param #
========================================================================================
  Conv1D-2                      [[1, 80, 102]]      [1, 512, 98]             205,312
  Conv1d-1                      [[1, 80, 98]]       [1, 512, 98]                   0
  ReLU-1                        [[1, 512, 98]]      [1, 512, 98]                   0
  BatchNorm1D-2                 [[1, 512, 98]]      [1, 512, 98]               2,048
  BatchNorm1d-1                 [[1, 512, 98]]      [1, 512, 98]                   0
  TDNNBlock-1                   [[1, 80, 98]]       [1, 512, 98]                   0
  Conv1D-4                      [[1, 512, 98]]      [1, 512, 98]             262,656
  Conv1d-3                      [[1, 512, 98]]      [1, 512, 98]                   0
  ReLU-2                        [[1, 512, 98]]      [1, 512, 98]                   0
  BatchNorm1D-4                 [[1, 512, 98]]      [1, 512, 98]               2,048
  BatchNorm1d-3                 [[1, 512, 98]]      [1, 512, 98]                   0
  TDNNBlock-2                   [[1, 512, 98]]      [1, 512, 98]                   0
  Conv1D-6                      [[1, 64, 102]]      [1, 64, 98]               12,352
  Conv1d-5                      [[1, 64, 98]]       [1, 64, 98]                    0
  ReLU-3                        [[1, 64, 98]]       [1, 64, 98]                    0
  BatchNorm1D-6                 [[1, 64, 98]]       [1, 64, 98]                  256
  BatchNorm1d-5                 [[1, 64, 98]]       [1, 64, 98]                    0
  TDNNBlock-3                   [[1, 64, 98]]       [1, 64, 98]                    0
  Conv1D-8                      [[1, 64, 102]]      [1, 64, 98]               12,352
  Conv1d-7                      [[1, 64, 98]]       [1, 64, 98]                    0
  ReLU-4                        [[1, 64, 98]]       [1, 64, 98]                    0
  BatchNorm1D-8                 [[1, 64, 98]]       [1, 64, 98]                  256
  BatchNorm1d-7                 [[1, 64, 98]]       [1, 64, 98]                    0
  TDNNBlock-4                   [[1, 64, 98]]       [1, 64, 98]                    0
  Conv1D-10                     [[1, 64, 102]]      [1, 64, 98]               12,352
  Conv1d-9                      [[1, 64, 98]]       [1, 64, 98]                    0
  ReLU-5                        [[1, 64, 98]]       [1, 64, 98]                    0
  BatchNorm1D-10                [[1, 64, 98]]       [1, 64, 98]                  256
  BatchNorm1d-9                 [[1, 64, 98]]       [1, 64, 98]                    0
  TDNNBlock-5                   [[1, 64, 98]]       [1, 64, 98]                    0
  Conv1D-12                     [[1, 64, 102]]      [1, 64, 98]               12,352
  Conv1d-11                     [[1, 64, 98]]       [1, 64, 98]                    0
  ReLU-6                        [[1, 64, 98]]       [1, 64, 98]                    0
  BatchNorm1D-12                [[1, 64, 98]]       [1, 64, 98]                  256
  BatchNorm1d-11                [[1, 64, 98]]       [1, 64, 98]                    0
  TDNNBlock-6                   [[1, 64, 98]]       [1, 64, 98]                    0
······················································
  BatchNorm1d-59                [[1, 128, 98]]      [1, 128, 98]                   0
  TDNNBlock-30                  [[1, 4608, 98]]     [1, 128, 98]                   0
  Tanh-1                        [[1, 128, 98]]      [1, 128, 98]                   0
  Conv1D-74                     [[1, 128, 98]]      [1, 1536, 98]            198,144
  Conv1d-73                     [[1, 128, 98]]      [1, 1536, 98]                  0
  AttentiveStatisticsPooling-1  [[1, 1536, 98]]     [1, 3072, 1]                   0
  BatchNorm1D-62                [[1, 3072, 1]]      [1, 3072, 1]              12,288
  BatchNorm1d-61                [[1, 3072, 1]]      [1, 3072, 1]                   0
  Conv1D-76                     [[1, 3072, 1]]      [1, 192, 1]              590,016
  Conv1d-75                     [[1, 3072, 1]]      [1, 192, 1]                    0
  Linear-1                      [[1, 192]]          [1, 10]                    1,930
========================================================================================
Total params: 6,215,306
Trainable params: 6,195,978
Non-trainable params: 19,328
----------------------------------------------------------------------------------------
Input size (MB): 0.03
Forward/backward pass size (MB): 35.53
Params size (MB): 23.71
Estimated Total Size (MB): 59.27
----------------------------------------------------------------------------------------
[2023-08-07 23:02:11.081835 INFO ] trainer:train:317 - 训练数据:8644
[2023-08-07 23:02:15.428326 INFO ] trainer:__train_epoch:269 - Train epoch: [1/60], batch: [0/136], loss: 2.99582, accuracy: 0.04688, learning rate: 0.00000000, speed: 14.72 data/sec, eta: 9:51:07
```
# Evaluation
Run the following command to evaluate the model:
```shell
python eval.py --configs=configs/cam++.yml
```
The evaluation output looks like this:
```shell
[2024-02-03 15:13:25.469242 INFO ] trainer:evaluate:461 - 成功加载模型:models/CAMPPlus_Fbank/best_model/model.pth
100%|██████████████████████████████| 150/150 [00:00<00:00, 1281.96it/s]
评估消耗时间:1s,loss:0.61840,accuracy:0.87333
```
Evaluation prints the accuracy and also saves a confusion matrix image to `output/images/`, as shown below.
<br/>
<div align="center">
<img src="docs/images/image1.png" alt="Confusion matrix" width="600">
</div>
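Note: if the class labels are in Chinese, a font with Chinese glyphs must be installed for them to display correctly. Windows usually needs nothing extra, but Ubuntu does. If your Windows system does lack the font, download a `.ttf` file from the [font files](https://github.com/tracyone/program_font) repository and copy it to `C:\Windows\Fonts`. On Ubuntu, proceed as follows:
1. Install the fonts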
```shell
git clone https://github.com/tracyone/program_font && cd program_font && ./install.sh
```
2. Run the following Python code
```python
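import os
import shutil

import matplotlib

# matplotlib's bundled TrueType fonts live in <mpl-data>/fonts/ttf/
font_dir = os.path.join(matplotlib.get_data_path(), 'fonts', 'ttf')
print(font_dir)
# Copy the SimHei font installed by install.sh into matplotlib's font directory
shutil.copy('/usr/share/fonts/MyFonts/simhei.ttf', font_dir)
# Delete matplotlib's font cache so the new font is found on the next run
shutil.rmtree(matplotlib.get_cachedir(), ignore_errors=True)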
```
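The accuracy and confusion matrix reported by the evaluation can be sketched without any plotting: entry `[i][j]` counts clips of true class `i` that were predicted as class `j`, and accuracy is the diagonal sum divided by the total. The function names below are hypothetical, and how the project normalizes and draws the matrix is not covered here.

```python
from typing import List, Sequence


def confusion_matrix(labels: Sequence[int], preds: Sequence[int], num_class: int) -> List[List[int]]:
    """matrix[i][j] counts samples whose true label is i and predicted label is j."""
    matrix = [[0] * num_class for _ in range(num_class)]
    for true_label, pred_label in zip(labels, preds):
        matrix[true_label][pred_label] += 1
    return matrix


def accuracy_from_matrix(matrix: List[List[int]]) -> float:
    """Correct predictions lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


m = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_class=3)
print(m)                        # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy_from_matrix(m))  # 4 correct out of 6
```

Off-diagonal cells show which classes get confused with each other, which a single accuracy number hides.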
# Inference
Once training finishes, you have a model parameter file that can be used to classify audio:
```shell
python infer.py --audio_path=dataset/UrbanSound8K/audio/fold5/156634-5-2-5.wav
```
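The model outputs one score per class; turning those scores into a readable prediction amounts to a softmax, an argmax and a lookup in the class-name list. A minimal sketch, assuming `class_names` is ordered by label index (for example read from `dataset/label_list.txt`, one name per line, which is an assumption); `top_prediction` is a hypothetical helper, not the `infer.py` API.

```python
import math
from typing import Sequence, Tuple


def top_prediction(logits: Sequence[float], class_names: Sequence[str]) -> Tuple[str, float]:
    """Softmax over raw model scores, then return (class name, probability) of the best class."""
    # Subtracting the maximum keeps math.exp from overflowing; the probabilities are unchanged
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


print(top_prediction([0.1, 2.0, -1.0], ['dog_bark', 'siren', 'gun_shot']))
```

The probability can serve as a confidence threshold, e.g. ignoring predictions below a value you choose for your application.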
# Other Features
- To make it easy to record data and build your own dataset, a recording program `record_audio.py` is provided. It records audio at a 16000 Hz sample rate, mono, 16-bit.
```shell
python record_audio.py
```
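The 16000 Hz / mono / 16-bit format can be produced with Python's standard `wave` module. The sketch below only writes samples that are already in memory; capturing from the microphone needs a third-party audio I/O library, and which one `record_audio.py` uses is not shown here. `write_wav_16k_mono` is a hypothetical helper.

```python
import struct
import wave
from typing import Iterable


def write_wav_16k_mono(path: str, samples: Iterable[float], sample_rate: int = 16000) -> None:
    """Write float samples in [-1, 1] as a 16-bit PCM, single-channel WAV file."""
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16 bit = 2 bytes per sample
        wf.setframerate(sample_rate)
        # Clip to [-1, 1], scale to the int16 range and pack as little-endian shorts
        frames = b''.join(struct.pack('<h', int(max(-1.0, min(1.0, s)) * 32767))
                          for s in samples)
        wf.writeframes(frames)
```

For example, `write_wav_16k_mono('clip.wav', samples)` with `samples` as floats in [-1, 1] stores one clip; saving each recording into its class folder yields the layout expected when creating the data lists.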
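- `infer_record.py` records continuously and classifies each recording, which you can think of as real-time recognition. This enables some interesting applications. For example, place the microphone where birds often visit and classify the audio in real time; if your dataset is comprehensive enough to include recordings of each bird species, you can even identify which species is calling. When the target species is recognized, another action can be triggered, such as taking a photo.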
```shell
python infer_record.py --record_seconds=3
```
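Conceptually, real-time recognition slices the incoming audio into consecutive windows of `--record_seconds` and classifies each one. The sketch below shows that loop with a stand-in `classify` callable in place of the model; the non-overlapping windowing and the helper name `classify_stream` are assumptions, not a description of `infer_record.py`.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def classify_stream(chunks: Iterable[List[float]], sample_rate: int, record_seconds: float,
                    classify: Callable[[List[float]], str]) -> Iterator[Tuple[int, str]]:
    """Group incoming sample chunks into fixed windows of record_seconds and classify each window."""
    window_size = int(sample_rate * record_seconds)
    buffer: List[float] = []
    window_index = 0
    for chunk in chunks:
        buffer.extend(chunk)
        # A single large chunk may complete several windows at once
        while len(buffer) >= window_size:
            window, buffer = buffer[:window_size], buffer[window_size:]
            yield window_index, classify(window)
            window_index += 1
```

A consumer of these results would compare each label with the target class and trigger the follow-up action (such as taking a photo). Overlapping windows would reduce the chance of cutting a call in half at a window boundary, at the cost of running the model more often.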
# Support the Author
<br/>
<div align="center">
<p>Tip the author 1 yuan to show your support</p>
<img src="https://yeyupiaoling.cn/reward.png" alt="Support the author" width="400">
</div>
# References
1. https://github.com/PaddlePaddle/PaddleSpeech
2. https://github.com/yeyupiaoling/PaddlePaddle-MobileFaceNets
3. https://github.com/yeyupiaoling/PPASR
4. https://github.com/alibaba-damo-academy/3D-Speaker
98]] [1, 64, 98] 0 \r\n ReLU-5 [[1, 64, 98]] [1, 64, 98] 0 \r\n BatchNorm1D-10 [[1, 64, 98]] [1, 64, 98] 256 \r\n BatchNorm1d-9 [[1, 64, 98]] [1, 64, 98] 0 \r\n TDNNBlock-5 [[1, 64, 98]] [1, 64, 98] 0 \r\n Conv1D-12 [[1, 64, 102]] [1, 64, 98] 12,352 \r\n Conv1d-11 [[1, 64, 98]] [1, 64, 98] 0 \r\n ReLU-6 [[1, 64, 98]] [1, 64, 98] 0 \r\n BatchNorm1D-12 [[1, 64, 98]] [1, 64, 98] 256 \r\n BatchNorm1d-11 [[1, 64, 98]] [1, 64, 98] 0 \r\n TDNNBlock-6 [[1, 64, 98]] [1, 64, 98] 0 \r\n\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7 \r\n BatchNorm1d-59 [[1, 128, 98]] [1, 128, 98] 0 \r\n TDNNBlock-30 [[1, 4608, 98]] [1, 128, 98] 0 \r\n Tanh-1 [[1, 128, 98]] [1, 128, 98] 0 \r\n Conv1D-74 [[1, 128, 98]] [1, 1536, 98] 198,144 \r\n Conv1d-73 [[1, 128, 98]] [1, 1536, 98] 0 \r\nAttentiveStatisticsPooling-1 [[1, 1536, 98]] [1, 3072, 1] 0 \r\n BatchNorm1D-62 [[1, 3072, 1]] [1, 3072, 1] 12,288 \r\n BatchNorm1d-61 [[1, 3072, 1]] [1, 3072, 1] 0 \r\n Conv1D-76 [[1, 3072, 1]] [1, 192, 1] 590,016 \r\n Conv1d-75 [[1, 3072, 1]] [1, 192, 1] 0 \r\n Linear-1 [[1, 192]] [1, 10] 1,930 \r\n========================================================================================\r\nTotal params: 6,215,306\r\nTrainable params: 6,195,978\r\nNon-trainable params: 19,328\r\n----------------------------------------------------------------------------------------\r\nInput size (MB): 0.03\r\nForward/backward pass size (MB): 35.53\r\nParams size (MB): 23.71\r\nEstimated Total Size (MB): 59.27\r\n----------------------------------------------------------------------------------------\r\n\r\n[2023-08-07 23:02:11.081835 INFO ] trainer:train:317 - \u8bad\u7ec3\u6570\u636e\uff1a8644\r\n[2023-08-07 23:02:15.428326 INFO ] 
trainer:__train_epoch:269 - Train epoch: [1/60], batch: [0/136], loss: 2.99582, accuracy: 0.04688, learning rate: 0.00000000, speed: 14.72 data/sec, eta: 9:51:07\r\n```\r\n\r\n\r\n# Evaluate the model\r\n\r\nRun the following command to evaluate the model.\r\n\r\n```shell\r\npython eval.py --configs=configs/bi_lstm.yml\r\n```\r\n\r\nThe evaluation output looks like this:\r\n```shell\r\n[2024-02-03 15:13:25.469242 INFO ] trainer:evaluate:461 - \u6210\u529f\u52a0\u8f7d\u6a21\u578b\uff1amodels/CAMPPlus_Fbank/best_model/model.pth\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 150/150 [00:00<00:00, 1281.96it/s]\r\n\u8bc4\u4f30\u6d88\u8017\u65f6\u95f4\uff1a1s\uff0closs\uff1a0.61840\uff0caccuracy\uff1a0.87333\r\n```\r\n\r\nEvaluation prints the accuracy and also saves a confusion matrix image to `output/images/`, as shown below.\r\n\r\n<br/>\r\n<div align=\"center\">\r\n<img src=\"docs/images/image1.png\" alt=\"Confusion matrix\" width=\"600\">\r\n</div>\r\n\r\n\r\nNote: if the class labels are in Chinese, a Chinese font must be installed for them to render correctly in the image. Windows usually needs no extra installation, but Ubuntu usually does. If Windows is in fact missing the font, download a `.ttf` file from the [font repository](https://github.com/tracyone/program_font) and copy it to `C:\\Windows\\Fonts`. On Ubuntu, follow these steps.\r\n\r\n1. Install the fonts\r\n```shell\r\ngit clone https://github.com/tracyone/program_font && cd program_font && ./install.sh\r\n```\r\n\r\n2. 
Run the following Python code:\r\n```python\r\nimport matplotlib\r\nimport shutil\r\nimport os\r\n\r\n# matplotlib_fname() returns the path of the active matplotlibrc; in a default install it sits in mpl-data/, next to fonts/ttf/\r\npath = matplotlib.matplotlib_fname()\r\npath = path.replace('matplotlibrc', 'fonts/ttf/')\r\nprint(path)\r\n# Copy the SimHei font (assumed to be at this path after step 1) into matplotlib's font directory\r\nshutil.copy('/usr/share/fonts/MyFonts/simhei.ttf', path)\r\n# Clear matplotlib's font cache so the new font is picked up on the next run\r\nuser_dir = os.path.expanduser('~')\r\nshutil.rmtree(f'{user_dir}/.cache/matplotlib', ignore_errors=True)\r\n```\r\n\r\n# Prediction\r\n\r\nTraining produces a model parameter file, which we then use to classify audio.\r\n\r\n```shell\r\npython infer.py --audio_path=dataset/UrbanSound8K/audio/fold5/156634-5-2-5.wav\r\n```\r\n\r\n# Other features\r\n\r\n - The recording script `record_audio.py` makes it easy to record audio and build datasets. It records at a 16000 Hz sample rate, mono, 16-bit.\r\n\r\n```shell\r\npython record_audio.py\r\n```\r\n\r\n - 
`infer_record.py` records and classifies audio continuously, which amounts to roughly real-time recognition. This enables some interesting applications. For example, place a microphone where birds often visit and let the program listen in real time. When it detects birdsong, it can also tell which species is singing, provided your dataset is rich enough to contain recordings of each species. Once a target species is recognized, it can trigger another program, for example to take a photo.\r\n\r\n```shell\r\npython infer_record.py --record_seconds=3\r\n```\r\n\r\n## Support the author\r\n\r\n<br/>\r\n<div align=\"center\">\r\n<p>Tip the author a yuan to show your support</p>\r\n<img src=\"https://yeyupiaoling.cn/reward.png\" alt=\"Support the author\" width=\"400\">\r\n</div>\r\n\r\n\r\n# References\r\n\r\n1. https://github.com/PaddlePaddle/PaddleSpeech\r\n2. https://github.com/yeyupiaoling/PaddlePaddle-MobileFaceNets\r\n3. https://github.com/yeyupiaoling/PPASR\r\n4. https://github.com/alibaba-damo-academy/3D-Speaker\r\n",
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Audio Classification toolkit on PaddlePaddle",
"version": "1.0.3",
"project_urls": {
"Download": "https://github.com/yeyupiaoling/AudioClassification_PaddlePaddle.git",
"Homepage": "https://github.com/yeyupiaoling/AudioClassification_PaddlePaddle"
},
"split_keywords": [
"audio",
" paddle"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d066e7949f373c32b68e6e2c4692d565a1ddadea3ae6f21cba5bbd47e8227a9d",
"md5": "50efb9e743725a5aeba36cbf77e0dfac",
"sha256": "a9abc9d98639857b0aa000a3b9b51e73a8dc1f995090b0f4c83c1317375031fd"
},
"downloads": -1,
"filename": "ppacls-1.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "50efb9e743725a5aeba36cbf77e0dfac",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 48309,
"upload_time": "2024-10-24T12:52:16",
"upload_time_iso_8601": "2024-10-24T12:52:16.013022Z",
"url": "https://files.pythonhosted.org/packages/d0/66/e7949f373c32b68e6e2c4692d565a1ddadea3ae6f21cba5bbd47e8227a9d/ppacls-1.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-24 12:52:16",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yeyupiaoling",
"github_project": "AudioClassification_PaddlePaddle",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.19.2"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.6.3"
]
]
},
{
"name": "librosa",
"specs": [
[
">=",
"0.9.1"
]
]
},
{
"name": "soundfile",
"specs": [
[
">=",
"0.12.1"
]
]
},
{
"name": "soundcard",
"specs": [
[
">=",
"0.4.2"
]
]
},
{
"name": "resampy",
"specs": [
[
">=",
"0.2.2"
]
]
},
{
"name": "numba",
"specs": [
[
">=",
"0.53.0"
]
]
},
{
"name": "pydub",
"specs": [
[
"~=",
"0.25.1"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.5.2"
]
]
},
{
"name": "typeguard",
"specs": [
[
"==",
"2.13.3"
]
]
},
{
"name": "pillow",
"specs": [
[
">=",
"10.0.1"
]
]
},
{
"name": "tqdm",
"specs": [
[
">=",
"4.64.1"
]
]
},
{
"name": "visualdl",
"specs": [
[
">=",
"2.2.3"
]
]
},
{
"name": "pyyaml",
"specs": [
[
">=",
"5.4.1"
]
]
},
{
"name": "paddleaudio",
"specs": [
[
">=",
"1.0.1"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.0.2"
]
]
},
{
"name": "av",
"specs": [
[
">=",
"10.0.0"
]
]
},
{
"name": "loguru",
"specs": [
[
">=",
"0.7.2"
]
]
},
{
"name": "yeaudio",
"specs": [
[
">=",
"0.0.2"
]
]
}
],
"lcname": "ppacls"
}