mser


 - Name: mser
 - Version: 0.0.7
 - Home page: https://github.com/yeyupiaoling/SpeechEmotionRecognition-Pytorch
 - Summary: Speech Emotion Recognition toolkit on Pytorch
 - Upload time: 2024-09-02 12:20:18
 - Author: yeyupiaoling
 - License: Apache License 2.0
 - Keywords: audio, pytorch, emotion

# Speech Emotion Recognition System Implemented in Pytorch

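This project performs speech emotion recognition. It offers several preprocessing (feature extraction) methods and several models to choose from.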

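**You are welcome to scan the QR codes below to join the Knowledge Planet (知识星球) community or the QQ group for discussion. Knowledge Planet provides this project's model files, the model files of the author's other related projects, and some additional resources.**

<div align="center">
  <img src="https://yeyupiaoling.cn/zsxq.png" alt="Knowledge Planet" width="400">
  <img src="https://yeyupiaoling.cn/qq.png" alt="QQ group" width="400">
</div>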


# Prerequisites

 - Anaconda 3
 - Python 3.8
 - Pytorch 1.13.1
 - Windows 10 or Ubuntu 18.04

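# Model Test Results

|   Model   | Params (M) | Preprocessing method |    Dataset     | Classes | Accuracy |         Model download         |
|:---------:|:----------:|:--------------------:|:--------------:|:-------:|:--------:|:------------------------------:|
|  BiLSTM   |    2.10    |     Emotion2Vec      |    RAVDESS     |    8    | 0.85333  | Available via Knowledge Planet |
|  BiLSTM   |    1.87    |    CustomFeature     |    RAVDESS     |    8    | 0.68666  | Available via Knowledge Planet |
| BaseModel |    0.19    |     Emotion2Vec      |    RAVDESS     |    8    | 0.85333  | Available via Knowledge Planet |
| BaseModel |    0.08    |    CustomFeature     |    RAVDESS     |    8    | 0.68000  | Available via Knowledge Planet |
|  BiLSTM   |    2.10    |     Emotion2Vec      | Larger dataset |    9    | 0.91826  | Available via Knowledge Planet |
|  BiLSTM   |    1.87    |    CustomFeature     | Larger dataset |    9    | 0.90817  | Available via Knowledge Planet |
| BaseModel |    0.19    |     Emotion2Vec      | Larger dataset |    9    | 0.92870  | Available via Knowledge Planet |
| BaseModel |    0.08    |    CustomFeature     | Larger dataset |    9    | 0.91026  | Available via Knowledge Planet |

Notes:
1. For RAVDESS, only `Audio_Speech_Actors_01-24.zip` is used.
2. The larger dataset contains nearly 25,000 samples and has been balanced across classes. Its extracted feature data is also available on Knowledge Planet.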

## Installation

 - First install the GPU version of Pytorch. Skip this step if it is already installed.
```shell
conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
```

 - Install the mser library.
 
Install it with pip using the following command:
```shell
python -m pip install mser -U -i https://pypi.tuna.tsinghua.edu.cn/simple
```

**Installing from source is recommended**, because it guarantees that you are running the latest code.
```shell
git clone https://github.com/yeyupiaoling/SpeechEmotionRecognition-Pytorch.git
cd SpeechEmotionRecognition-Pytorch/
pip install .
```
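After either installation route, a quick check can confirm that the packages are importable. The snippet below is a minimal sketch rather than part of the project: it assumes the import name matches the PyPI name `mser`, and it only tests whether each package can be located, without importing it.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    # "mser" as the import name is an assumption based on the PyPI package name.
    for name, found in check_packages(["torch", "torchaudio", "mser"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A package reported as `MISSING` usually means the install command ran in a different Python environment from the one running this check.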

## Quick Start

To get a prediction, you only need to pass `--use_ms_model=iic/emotion2vec_plus_base` and the path of the audio file.

```shell
python infer.py --audio_path=dataset/test.wav --use_ms_model=iic/emotion2vec_plus_base
```

The output looks like this (the last line reports the predicted label 生气, "angry", with a score of 1.0):

```
[2024-07-02 19:45:36.154355 INFO   ] emotion2vec_predict:__init__:27 - 成功加载模型:models/iic/emotion2vec_plus_base
音频:dataset/test.wav 的预测结果标签为:生气,得分:1.0
```

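## Preparing the Data

First generate the data lists that the later steps read. By default the project works with the [RAVDESS](https://zenodo.org/record/1188976/files/Audio_Speech_Actors_01-24.zip?download=1) dataset (see its [introduction page](https://zenodo.org/record/1188976#.XsAXemgzaUk)). The dataset covers eight emotions: neutral, calm, happy, sad, angry, fearful, disgust, and surprised. This project uses only `Audio_Speech_Actors_01-24.zip`, in which the only spoken sentences are `Kids are talking by the door` and `Dogs are sitting by the door`, so it is a very simple training set. Download the archive and extract it into the `dataset` directory.

Then run the `create_ravdess_list('dataset/Audio_Speech_Actors_01-24', 'dataset')` function in `create_data.py` to generate the data lists. The normalization file is generated at the same time; see the code for details.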

```shell
python create_data.py
```
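To make the labels easier to interpret, here is a minimal sketch of how a RAVDESS file name maps to an emotion. It relies on the dataset's published naming convention, where the third hyphen-separated field is the emotion code. The zero-based label (`code - 1`) is an assumption inferred from the sample list lines in this README, not taken from `create_data.py`.

```python
from pathlib import Path

# Emotion codes from the RAVDESS file-naming convention (third field of the name).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def ravdess_label(path):
    """Return (zero_based_label, emotion_name) for a RAVDESS file name."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {path}")
    code = fields[2]
    # Assumption: labels start at 0, so emotion code "01" becomes label 0.
    return int(code) - 1, RAVDESS_EMOTIONS[code]


print(ravdess_label("dataset/Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-01-01-01.wav"))
# (2, 'happy')
```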

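For a custom dataset, follow the format below, where `audio_path` is the path of an audio file. Place the audio files under `dataset/audio` in advance, one folder per class, with each clip about 3 seconds long, for example `dataset/audio/angry/······`. The generated data lists are written to `dataset`, and each line has the form `audio path\tclass label`, with the path and the label separated by a tab character `\t`. You can also modify the function below to match the way your own data is stored.

Run the `get_data_list('dataset/audios', 'dataset')` function in `create_data.py` to generate the data lists (make sure the first argument points at the directory that actually holds your class folders). The normalization file is generated at the same time; see the code for details.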
```shell
python create_data.py
```
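For readers who store their data differently, the folder-per-class layout described above can be turned into list files with a few lines of Python. This is a hedged sketch, not the project's `get_data_list` implementation: the function name `build_data_lists`, the `test_ratio` and `seed` parameters, and the accepted file extensions are all hypothetical, while the output file names follow the ones that appear in this README's logs.

```python
import os
import random


def build_data_lists(audio_dir, list_dir, test_ratio=0.1, seed=0):
    """Write train_list.txt, test_list.txt and label_list.txt for a folder-per-class dataset."""
    # Each sub-folder is one class; sorting makes the label numbering reproducible.
    classes = sorted(d for d in os.listdir(audio_dir)
                     if os.path.isdir(os.path.join(audio_dir, d)))
    rows = []
    for label, name in enumerate(classes):
        class_dir = os.path.join(audio_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith((".wav", ".mp3", ".flac")):
                path = os.path.join(class_dir, fname).replace("\\", "/")
                rows.append(f"{path}\t{label}")  # path<TAB>label, labels start at 0
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    os.makedirs(list_dir, exist_ok=True)
    outputs = {"test_list.txt": rows[:n_test],
               "train_list.txt": rows[n_test:],
               "label_list.txt": classes}
    for fname, lines in outputs.items():
        with open(os.path.join(list_dir, fname), "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in lines)
    return classes
```

Calling `build_data_lists('dataset/audio', 'dataset')` would write the three files into `dataset`. The normalization file still has to be produced by the project's own code.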

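The generated list looks like this: the audio path comes first, followed by the label of that audio. Labels start from 0, and the path and the label are separated by `\t`.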
```shell
dataset/Audio_Speech_Actors_01-24/Actor_13/03-01-01-01-02-01-13.wav	0
dataset/Audio_Speech_Actors_01-24/Actor_01/03-01-02-01-01-01-01.wav	1
dataset/Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-01-01-01.wav	2
```
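Before training on a custom list, it can help to verify that every line really has the `path<TAB>label` shape. The parser below is an illustrative sketch; the function name `parse_data_list` is hypothetical and the project's own loader may behave differently.

```python
from collections import Counter


def parse_data_list(lines):
    """Parse 'path<TAB>label' lines into (path, int_label) pairs, skipping blank lines."""
    samples = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        # Split on the last tab so that the label is always the final field.
        path, sep, label = line.rpartition("\t")
        if not sep or not label.strip().isdigit():
            raise ValueError(f"line {line_no}: expected 'path<TAB>label', got {line!r}")
        samples.append((path, int(label)))
    return samples


example = [
    "dataset/Audio_Speech_Actors_01-24/Actor_13/03-01-01-01-02-01-13.wav\t0",
    "dataset/Audio_Speech_Actors_01-24/Actor_01/03-01-02-01-01-01-01.wav\t1",
    "dataset/Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-01-01-01.wav\t2",
]
samples = parse_data_list(example)
# Per-class counts make an unbalanced or mislabeled list easy to spot.
print(Counter(label for _, label in samples))
```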

**Note:** the `create_standard('configs/bi_lstm.yml')` function in `create_data.py` must be run, because it generates the normalization file.


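# Feature Extraction (Optional)

Training normally reads the audio data, extracts features, and only then trains on them. Reading audio and extracting features are both time-consuming, so you can extract the features ahead of time and let training load them directly, which makes training faster. This step is optional: if no pre-extracted features exist, training starts from reading the audio and extracting features on the fly. The steps are as follows:

1. Run `extract_features.py` to extract the features. They are saved under `dataset/features`, and the new data lists `train_list_features.txt` and `test_list_features.txt` are generated.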

```shell
python extract_features.py --configs=configs/bi_lstm.yml --save_dir=dataset/features
```

2. Edit the configuration file and change `dataset_conf.train_list` and `dataset_conf.test_list` to the new lists `train_list_features.txt` and `test_list_features.txt`.
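Step 2 can also be scripted. The sketch below rewrites the two list entries with a regular expression so that comments and ordering in the file are left untouched. It assumes the YAML stores each entry on its own `train_list:` or `test_list:` line, as the training log in this README suggests; the function name `use_feature_lists` is hypothetical.

```python
import re


def use_feature_lists(config_text, train_list, test_list):
    """Return config_text with the values of train_list and test_list replaced."""
    for key, value in (("train_list", train_list), ("test_list", test_list)):
        # Keep the indentation and the key, replace only the value on that line.
        pattern = re.compile(rf"^([ \t]*{key}[ \t]*:[ \t]*)[^\r\n]*", re.MULTILINE)
        config_text, count = pattern.subn(lambda m, v=value: m.group(1) + v, config_text)
        if count != 1:
            raise ValueError(f"expected exactly one '{key}:' line, found {count}")
    return config_text


# Assumed fragment of configs/bi_lstm.yml, reconstructed from the training log.
sample = (
    "dataset_conf:\n"
    "  train_list: dataset/train_list.txt\n"
    "  test_list: dataset/test_list.txt\n"
)
print(use_feature_lists(sample,
                        "dataset/train_list_features.txt",
                        "dataset/test_list_features.txt"))
```

To apply it to a real file, read `configs/bi_lstm.yml` as text, pass it through the function, and write the result back. Keeping a backup copy first is advisable.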


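## Training

There are two ways to train. The first is to extract the features in advance, save them locally, and then train. Its advantage is that training is very fast, because this project's feature extraction is fairly slow and extracting features during training slows everything down. Its drawback is that random data augmentation cannot be used. The second is to extract features during training, which allows random data augmentation but makes training slower.

 - Feature extraction (optional): run `extract_features.py`. When extraction finishes, change `train_list` and `test_list` in `configs/bi_lstm.yml` to the paths of the newly generated data lists.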

```shell
python extract_features.py --configs=configs/bi_lstm.yml
```

Log output:
```
·······
100%████████████████████████████| 1290/1290 [01:39<00:00, 12.99it/s]
[2024-02-03 14:57:00.699338 INFO   ] trainer:get_standard_file:136 - 归一化文件保存在:dataset/standard.m
[2024-02-03 14:57:00.700046 INFO   ] featurizer:__init__:23 - 使用的特征方法为 Emotion2Vec
100%|████████████████████████████| 1290/1290 [01:36<00:00, 13.40it/s]
[2024-02-03 14:58:36.941253 INFO   ] trainer:extract_features:162 - dataset/train_list.txt列表中的数据已提取特征完成,新列表为:dataset/train_list_features.txt
100%|██████████████████████████████| 150/150 [00:11<00:00, 13.52it/s]
[2024-02-03 14:58:48.036661 INFO   ] trainer:extract_features:162 - dataset/test_list.txt列表中的数据已提取特征完成,新列表为:dataset/test_list_features.txt
```

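Whether or not you extracted features in advance, you can now start training by running `train.py`. Most parameters in the configuration file do not need to be changed, but a few should be adjusted to your own dataset. The most important one is the number of classes, `dataset_conf.num_class` (the training log prints `num_class` under `model_conf`), which can differ from dataset to dataset, so set it to match your data. The next is `dataset_conf.batch_size` (printed as `dataset_conf.dataLoader.batch_size` in the training log): reduce it if you run out of GPU memory.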

```shell
# Single-GPU training
CUDA_VISIBLE_DEVICES=0 python train.py --configs=configs/bi_lstm.yml
# Multi-GPU training
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py --configs=configs/bi_lstm.yml
```


Training log output:
```
[2024-02-03 15:09:26.166181 INFO   ] utils:print_arguments:14 - ----------- 额外配置参数 -----------
[2024-02-03 15:09:26.166281 INFO   ] utils:print_arguments:16 - configs: configs/bi_lstm.yml
[2024-02-03 15:09:26.166358 INFO   ] utils:print_arguments:16 - local_rank: 0
[2024-02-03 15:09:26.166427 INFO   ] utils:print_arguments:16 - pretrained_model: None
[2024-02-03 15:09:26.166494 INFO   ] utils:print_arguments:16 - resume_model: None
[2024-02-03 15:09:26.166550 INFO   ] utils:print_arguments:16 - save_model_path: models/
[2024-02-03 15:09:26.166613 INFO   ] utils:print_arguments:16 - use_gpu: True
[2024-02-03 15:09:26.166676 INFO   ] utils:print_arguments:17 - ------------------------------------------------
[2024-02-03 15:09:26.176508 INFO   ] utils:print_arguments:19 - ----------- 配置文件参数 -----------
[2024-02-03 15:09:26.176604 INFO   ] utils:print_arguments:22 - dataset_conf:
[2024-02-03 15:09:26.176673 INFO   ] utils:print_arguments:25 -         aug_conf:
[2024-02-03 15:09:26.176736 INFO   ] utils:print_arguments:27 -                 noise_aug_prob: 0.2
[2024-02-03 15:09:26.176792 INFO   ] utils:print_arguments:27 -                 noise_dir: dataset/noise
[2024-02-03 15:09:26.176861 INFO   ] utils:print_arguments:27 -                 speed_perturb: True
[2024-02-03 15:09:26.176914 INFO   ] utils:print_arguments:27 -                 volume_aug_prob: 0.2
[2024-02-03 15:09:26.176966 INFO   ] utils:print_arguments:27 -                 volume_perturb: False
[2024-02-03 15:09:26.177017 INFO   ] utils:print_arguments:25 -         dataLoader:
[2024-02-03 15:09:26.177070 INFO   ] utils:print_arguments:27 -                 batch_size: 32
[2024-02-03 15:09:26.177151 INFO   ] utils:print_arguments:27 -                 num_workers: 4
[2024-02-03 15:09:26.177224 INFO   ] utils:print_arguments:29 -         do_vad: False
[2024-02-03 15:09:26.177275 INFO   ] utils:print_arguments:25 -         eval_conf:
[2024-02-03 15:09:26.177328 INFO   ] utils:print_arguments:27 -                 batch_size: 1
[2024-02-03 15:09:26.177387 INFO   ] utils:print_arguments:27 -                 max_duration: 3
[2024-02-03 15:09:26.177438 INFO   ] utils:print_arguments:29 -         label_list_path: dataset/label_list.txt
[2024-02-03 15:09:26.177489 INFO   ] utils:print_arguments:29 -         max_duration: 3
[2024-02-03 15:09:26.177542 INFO   ] utils:print_arguments:29 -         min_duration: 0.5
[2024-02-03 15:09:26.177593 INFO   ] utils:print_arguments:29 -         sample_rate: 16000
[2024-02-03 15:09:26.177647 INFO   ] utils:print_arguments:29 -         scaler_path: dataset/standard.m
[2024-02-03 15:09:26.177699 INFO   ] utils:print_arguments:29 -         target_dB: -20
[2024-02-03 15:09:26.177749 INFO   ] utils:print_arguments:29 -         test_list: dataset/test_list.txt
[2024-02-03 15:09:26.177800 INFO   ] utils:print_arguments:29 -         train_list: dataset/train_list.txt
[2024-02-03 15:09:26.177851 INFO   ] utils:print_arguments:29 -         use_dB_normalization: False
[2024-02-03 15:09:26.177905 INFO   ] utils:print_arguments:22 - model_conf:
[2024-02-03 15:09:26.177959 INFO   ] utils:print_arguments:29 -         num_class: None
[2024-02-03 15:09:26.178011 INFO   ] utils:print_arguments:22 - optimizer_conf:
[2024-02-03 15:09:26.178066 INFO   ] utils:print_arguments:29 -         learning_rate: 0.001
[2024-02-03 15:09:26.178118 INFO   ] utils:print_arguments:29 -         optimizer: Adam
[2024-02-03 15:09:26.178173 INFO   ] utils:print_arguments:29 -         scheduler: WarmupCosineSchedulerLR
[2024-02-03 15:09:26.178224 INFO   ] utils:print_arguments:25 -         scheduler_args:
[2024-02-03 15:09:26.178277 INFO   ] utils:print_arguments:27 -                 max_lr: 0.001
[2024-02-03 15:09:26.178330 INFO   ] utils:print_arguments:27 -                 min_lr: 1e-05
[2024-02-03 15:09:26.178381 INFO   ] utils:print_arguments:27 -                 warmup_epoch: 5
[2024-02-03 15:09:26.178434 INFO   ] utils:print_arguments:29 -         weight_decay: 1e-06
[2024-02-03 15:09:26.178485 INFO   ] utils:print_arguments:22 - preprocess_conf:
[2024-02-03 15:09:26.178537 INFO   ] utils:print_arguments:29 -         feature_method: Emotion2Vec
[2024-02-03 15:09:26.178588 INFO   ] utils:print_arguments:25 -         method_args:
[2024-02-03 15:09:26.178644 INFO   ] utils:print_arguments:27 -                 granularity: utterance
[2024-02-03 15:09:26.178695 INFO   ] utils:print_arguments:22 - train_conf:
[2024-02-03 15:09:26.178748 INFO   ] utils:print_arguments:29 -         enable_amp: False
[2024-02-03 15:09:26.178800 INFO   ] utils:print_arguments:29 -         log_interval: 10
[2024-02-03 15:09:26.178852 INFO   ] utils:print_arguments:29 -         loss_weight: None
[2024-02-03 15:09:26.178906 INFO   ] utils:print_arguments:29 -         max_epoch: 60
[2024-02-03 15:09:26.178957 INFO   ] utils:print_arguments:29 -         use_compile: False
[2024-02-03 15:09:26.179008 INFO   ] utils:print_arguments:31 - use_model: BiLSTM
[2024-02-03 15:09:26.179059 INFO   ] utils:print_arguments:32 - ------------------------------------------------
[2024-02-03 15:09:26.179184 WARNING] trainer:__init__:69 - Emotion2Vec特征提取方法不支持多线程,已自动使用单线程提取特征!
[2024-02-03 15:09:26.198994 INFO   ] featurizer:__init__:23 - 使用的特征方法为 Emotion2Vec
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
BiLSTM                                   [1, 8]                    --
├─Linear: 1-1                            [1, 512]                  393,728
├─LSTM: 1-2                              [1, 1, 512]               1,576,960
├─Tanh: 1-3                              [1, 512]                  --
├─Dropout: 1-4                           [1, 512]                  --
├─Linear: 1-5                            [1, 256]                  131,328
├─ReLU: 1-6                              [1, 256]                  --
├─Linear: 1-7                            [1, 8]                    2,056
==========================================================================================
Total params: 2,104,072
Trainable params: 2,104,072
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 2.10
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.01
Params size (MB): 8.42
Estimated Total Size (MB): 8.43
==========================================================================================
[2024-02-05 15:09:31.551738 INFO   ] trainer:train:378 - 训练数据:4407
[2024-02-05 15:09:32.951738 INFO   ] trainer:__train_epoch:362 - Train epoch: [1/60], batch: [0/41], loss: 2.07688, accuracy: 0.15625, learning rate: 0.00001000, speed: 5.35 data/sec, eta: 4:05:18
[2024-02-05 15:09:56.525906 INFO   ] trainer:__train_epoch:362 - Train epoch: [1/60], batch: [10/41], loss: 2.05963, accuracy: 0.22187, learning rate: 0.00005829, speed: 13.57 data/sec, eta: 1:36:15
····················
```
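Several numbers in the log above can be reproduced by hand, which is a useful sanity check when you change the model or the schedule. The sketch below is not the project's code. The input feature size of 768 and the hidden size of 256 per direction are inferred from the parameter counts in the layer summary. The linear warmup is inferred from the two logged learning rates and covers only the warmup phase, not the cosine part of `WarmupCosineSchedulerLR`.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def bilstm_params(n_in, hidden):
    """One bidirectional LSTM layer: 4 gates, input and recurrent weights, two bias vectors."""
    per_direction = 4 * (n_in * hidden + hidden * hidden + 2 * hidden)
    return 2 * per_direction


def warmup_lr(step, steps_per_epoch, warmup_epoch=5, min_lr=1e-5, max_lr=1e-3):
    """Linear warmup from min_lr to max_lr over warmup_epoch epochs (warmup phase only)."""
    warmup_steps = warmup_epoch * steps_per_epoch
    return min_lr + (max_lr - min_lr) * min(step, warmup_steps) / warmup_steps


# Layer sizes below are assumptions inferred from the summary table in the log.
total = (linear_params(768, 512) + bilstm_params(512, 256)
         + linear_params(512, 256) + linear_params(256, 8))
print(total)                       # 2104072, the "Total params" line
print(round(total * 4 / 1e6, 2))   # 8.42, "Params size (MB)" at 4 bytes per float32
print(f"{warmup_lr(0, 41):.8f}")   # 0.00001000, batch 0 of epoch 1
print(f"{warmup_lr(10, 41):.8f}")  # 0.00005829, batch 10 of epoch 1
```

The 41 steps per epoch come from the `batch: [0/41]` field of the log, and the warmup settings from the `scheduler_args` it prints.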

# Evaluation

Run the following command to evaluate the model.

```shell
python eval.py --configs=configs/bi_lstm.yml
```

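The evaluation output looks like this (the last line reports the evaluation time, the loss, and the accuracy):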
```shell
[2024-02-03 15:13:25.469242 INFO   ] trainer:evaluate:461 - 成功加载模型:models/BiLSTM_Emotion2Vec/best_model/model.pth
100%|██████████████████████████████| 150/150 [00:00<00:00, 1281.96it/s]
评估消耗时间:1s,loss:0.61840,accuracy:0.87333
```

Evaluation prints the accuracy and also saves a confusion-matrix image under `output/images/`, as shown below.
<br/>
<div align="center">
<img src="docs/images/image1.png" alt="Confusion matrix" width="600">
</div>
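To show what the confusion matrix and the accuracy figure measure, here is a small dependency-free sketch. It is illustrative only: the project's evaluation code computes and plots these itself, and the labels below are made-up toy values.

```python
def confusion_matrix(y_true, y_pred, num_class):
    """matrix[t][p] counts samples whose true label is t and predicted label is p."""
    matrix = [[0] * num_class for _ in range(num_class)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for three classes, not results from this project.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_class=3)
print(cm)            # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(cm))  # 0.6666666666666666
```

Off-diagonal cells show which emotions are confused with each other, which the single accuracy number hides.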


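Note: if the class labels are in Chinese, a suitable font must be installed for them to display correctly. Windows usually needs nothing extra, while Ubuntu does. If Windows really is missing the font, download a `.ttf` file from [font files](https://github.com/tracyone/program_font) and copy it to `C:\Windows\Fonts`. On Ubuntu, proceed as follows.

1. Install the fonts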
```shell
git clone https://github.com/tracyone/program_font && cd program_font && ./install.sh
```

2. Run the following Python code
```python
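import os
import shutil

import matplotlib

# Locate matplotlib's bundled TTF font directory, derived from the matplotlibrc path
path = matplotlib.matplotlib_fname()
path = path.replace('matplotlibrc', 'fonts/ttf/')
print(path)
# Copy the SimHei font (expected at this path after step 1) into that directory
shutil.copy('/usr/share/fonts/MyFonts/simhei.ttf', path)
# Delete matplotlib's font cache so that the new font is picked up on the next run
user_dir = os.path.expanduser('~')
shutil.rmtree(f'{user_dir}/.cache/matplotlib', ignore_errors=True)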
```


# Prediction

Once training has finished, we have a model parameter file, which we use to predict the emotion of an audio clip.

```shell
python infer.py --audio_path=dataset/test.wav
```

The output looks like this:
```
成功加载模型参数:models/BiLSTM_Emotion2Vec/best_model/model.pth
[2024-07-02 19:48:42.864262 INFO   ] emotion2vec_predict:__init__:27 - 成功加载模型:models/iic/emotion2vec_base
音频:dataset/test.wav 的预测结果标签为:angry,得分:0.99995
```
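When `infer.py` is called from another script, the result line can be parsed to recover the label and the score. This is a hedged sketch that assumes the line keeps the shape shown above. The pattern accepts both full-width and ASCII punctuation, since the exact characters depend on how the console output was captured, and the name `parse_result` is hypothetical.

```python
import re

# Accept full-width (U+FF1A, U+FF0C) and ASCII colons/commas in the result line.
RESULT_RE = re.compile(
    r"音频[::]\s*(?P<path>.+?)\s*的预测结果标签为[::]\s*(?P<label>.+?)"
    r"\s*[,,]\s*得分[::]\s*(?P<score>[0-9.]+)"
)


def parse_result(line):
    """Return (path, label, score) from a result line, or None if it does not match."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    return match["path"], match["label"], float(match["score"])


print(parse_result("音频:dataset/test.wav 的预测结果标签为:angry,得分:0.99995"))
# ('dataset/test.wav', 'angry', 0.99995)
```

If the project offers a Python API for prediction, calling that directly would be more robust than parsing console text.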

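# Prediction with the Emotion2vec Models

The project also supports the Emotion2vec models published on ModelScope for predicting audio. Just set the `--use_ms_model` argument; no extra configuration file or model path is needed, and the model files are downloaded automatically on first use. Three models are supported: `iic/emotion2vec_plus_seed`, `iic/emotion2vec_plus_base`, and `iic/emotion2vec_plus_large`.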

```shell
python infer.py --audio_path=dataset/test.wav --use_ms_model=iic/emotion2vec_plus_base
```

The output looks like this (the last line reports the predicted label 生气, "angry", with a score of 1.0):
```
[2024-07-02 19:45:36.154355 INFO   ] emotion2vec_predict:__init__:27 - 成功加载模型:models/iic/emotion2vec_plus_base
音频:dataset/test.wav 的预测结果标签为:生气,得分:1.0
```
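If you want to run this command over many files from Python, a small helper can build the argument list and reject unsupported model names early. This is a sketch under stated assumptions: it only uses the two flags and three model names shown in this section, the helper name `build_infer_command` is hypothetical, and it expects to be run from the repository root where `infer.py` lives.

```python
import shlex
import sys

# The three ModelScope models named in this section.
SUPPORTED_MS_MODELS = (
    "iic/emotion2vec_plus_seed",
    "iic/emotion2vec_plus_base",
    "iic/emotion2vec_plus_large",
)


def build_infer_command(audio_path, ms_model="iic/emotion2vec_plus_base"):
    """Return the argv list for one infer.py call with a ModelScope model."""
    if ms_model not in SUPPORTED_MS_MODELS:
        raise ValueError(f"unsupported model {ms_model!r}; choose one of {SUPPORTED_MS_MODELS}")
    return [sys.executable, "infer.py",
            f"--audio_path={audio_path}", f"--use_ms_model={ms_model}"]


cmd = build_infer_command("dataset/test.wav")
print(shlex.join(cmd[1:]))
# To actually run it from the repository root: subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with audio paths that contain spaces.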

## Support the Author
<br/>
<div align="center">
<p>Tip one yuan to support the author</p>
<img src="https://yeyupiaoling.cn/reward.png" alt="Support the author" width="400">
</div>

# References

1. https://github.com/yeyupiaoling/AudioClassification-Pytorch
2. https://github.com/alibaba-damo-academy/FunASR

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yeyupiaoling/SpeechEmotionRecognition-Pytorch",
    "name": "mser",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "audio, pytorch, emotion",
    "author": "yeyupiaoling",
    "author_email": null,
    "download_url": "https://github.com/yeyupiaoling/SpeechEmotionRecognition-Pytorch.git",
    "platform": null,
    "description": "# \u57fa\u4e8ePytorch\u5b9e\u73b0\u7684\u8bed\u97f3\u60c5\u611f\u8bc6\u522b\u7cfb\u7edf\r\n\r\n\u672c\u9879\u76ee\u662f\u4e00\u4e2a\u8bed\u97f3\u60c5\u611f\u8bc6\u522b\u9879\u76ee\uff0c\u4f7f\u7528\u591a\u79cd\u7684\u9884\u5904\u7406\u65b9\u6cd5\uff0c\u4f7f\u7528\u591a\u79cd\u6a21\u578b\uff0c\u5b9e\u73b0\u4e86\u8bed\u97f3\u60c5\u611f\u8bc6\u522b\u3002\r\n\r\n**\u6b22\u8fce\u5927\u5bb6\u626b\u7801\u5165\u77e5\u8bc6\u661f\u7403\u6216\u8005QQ\u7fa4\u8ba8\u8bba\uff0c\u77e5\u8bc6\u661f\u7403\u91cc\u9762\u63d0\u4f9b\u9879\u76ee\u7684\u6a21\u578b\u6587\u4ef6\u548c\u535a\u4e3b\u5176\u4ed6\u76f8\u5173\u9879\u76ee\u7684\u6a21\u578b\u6587\u4ef6\uff0c\u4e5f\u5305\u62ec\u5176\u4ed6\u4e00\u4e9b\u8d44\u6e90\u3002**\r\n\r\n<div align=\"center\">\r\n  <img src=\"https://yeyupiaoling.cn/zsxq.png\" alt=\"\u77e5\u8bc6\u661f\u7403\" width=\"400\">\r\n  <img src=\"https://yeyupiaoling.cn/qq.png\" alt=\"QQ\u7fa4\" width=\"400\">\r\n</div>\r\n\r\n\r\n# \u4f7f\u7528\u51c6\u5907\r\n\r\n - Anaconda 3\r\n - Python 3.8\r\n - Pytorch 1.13.1\r\n - Windows 10 or Ubuntu 18.04\r\n\r\n# \u6a21\u578b\u6d4b\u8bd5\u8868\r\n\r\n|    \u6a21\u578b     | Params(M) |     \u9884\u5904\u7406\u65b9\u6cd5     |   \u6570\u636e\u96c6   | \u7c7b\u522b\u6570\u91cf |   \u51c6\u786e\u7387   |   \u83b7\u53d6\u6a21\u578b   |\r\n|:---------:|:---------:|:-------------:|:-------:|:----:|:-------:|:--------:|\r\n|  BiLSTM   |   2.10    |  Emotion2Vec  | RAVDESS |  8   | 0.85333 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  BiLSTM   |   1.87    | CustomFeature | RAVDESS |  8   | 0.68666 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| BaseModel |   0.19    |  Emotion2Vec  | RAVDESS |  8   | 0.85333 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| BaseModel |   0.08    | CustomFeature | RAVDESS |  8   | 0.68000 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  BiLSTM   |   2.10    |  Emotion2Vec  |  \u66f4\u5927\u6570\u636e\u96c6  |  9   | 0.91826 | 
\u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n|  BiLSTM   |   1.87    | CustomFeature |  \u66f4\u5927\u6570\u636e\u96c6  |  9   | 0.90817 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| BaseModel |   0.19    |  Emotion2Vec  |  \u66f4\u5927\u6570\u636e\u96c6  |  9   | 0.92870 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n| BaseModel |   0.08    | CustomFeature |  \u66f4\u5927\u6570\u636e\u96c6  |  9   | 0.91026 | \u52a0\u5165\u77e5\u8bc6\u661f\u7403\u83b7\u53d6 |\r\n\r\n\u8bf4\u660e\uff1a\r\n1. RAVDESS\u6570\u636e\u96c6\u53ea\u4f7f\u7528`Audio_Speech_Actors_01-24.zip`\r\n2. \u66f4\u5927\u6570\u636e\u96c6\u6570\u636e\u96c6\u6709\u8fd12.5\u4e07\u6761\u6570\u636e\uff0c\u505a\u4e86\u6570\u636e\u91cf\u5747\u8861\u7684\uff0c\u77e5\u8bc6\u661f\u7403\u4e5f\u63d0\u4f9b\u4e86\u8be5\u6570\u636e\u96c6\u7684\u7279\u5f81\u6570\u636e\u3002\r\n\r\n## \u5b89\u88c5\u73af\u5883\r\n\r\n - \u9996\u5148\u5b89\u88c5\u7684\u662fPytorch\u7684GPU\u7248\u672c\uff0c\u5982\u679c\u5df2\u7ecf\u5b89\u88c5\u8fc7\u4e86\uff0c\u8bf7\u8df3\u8fc7\u3002\r\n```shell\r\nconda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia\r\n```\r\n\r\n - \u5b89\u88c5mser\u5e93\u3002\r\n \r\n\u4f7f\u7528pip\u5b89\u88c5\uff0c\u547d\u4ee4\u5982\u4e0b\uff1a\r\n```shell\r\npython -m pip install mser -U -i https://pypi.tuna.tsinghua.edu.cn/simple\r\n```\r\n\r\n**\u5efa\u8bae\u6e90\u7801\u5b89\u88c5**\uff0c\u6e90\u7801\u5b89\u88c5\u80fd\u4fdd\u8bc1\u4f7f\u7528\u6700\u65b0\u4ee3\u7801\u3002\r\n```shell\r\ngit clone https://github.com/yeyupiaoling/SpeechEmotionRecognition-Pytorch.git\r\ncd SpeechEmotionRecognition-Pytorch/\r\npip install .\r\n```\r\n\r\n## \u5feb\u901f\u4f7f\u7528\r\n\r\n\u5728\u4f7f\u7528\u65f6\u53ea\u9700\u8981\u8bbe\u7f6e`--use_ms_model=iic/emotion2vec_plus_base`\u53c2\u6570\u548c\u97f3\u9891\u8def\u5f84\u5373\u53ef\u3002\r\n\r\n```shell\r\npython infer.py --audio_path=dataset/test.wav 
--use_ms_model=iic/emotion2vec_plus_base\r\n```\r\n\r\n\u8f93\u51fa\u5982\u4e0b\uff1a\r\n\r\n```\r\n[2024-07-02 19:45:36.154355 INFO   ] emotion2vec_predict:__init__:27 - \u6210\u529f\u52a0\u8f7d\u6a21\u578b\uff1amodels/iic/emotion2vec_plus_base\r\n\u97f3\u9891\uff1adataset/test.wav \u7684\u9884\u6d4b\u7ed3\u679c\u6807\u7b7e\u4e3a\uff1a\u751f\u6c14\uff0c\u5f97\u5206\uff1a1.0\r\n```\r\n\r\n## \u51c6\u5907\u6570\u636e\r\n\r\n\u751f\u6210\u6570\u636e\u5217\u8868\uff0c\u7528\u4e8e\u4e0b\u4e00\u6b65\u7684\u8bfb\u53d6\u9700\u8981\uff0c\u9879\u76ee\u9ed8\u8ba4\u63d0\u4f9b\u4e00\u4e2a\u6570\u636e\u96c6[RAVDESS](https://zenodo.org/record/1188976/files/Audio_Speech_Actors_01-24.zip?download=1)\uff0c\u8fd9\u4e2a\u6570\u636e\u96c6\u7684[\u4ecb\u7ecd\u9875\u9762](https://zenodo.org/record/1188976#.XsAXemgzaUk)\uff0c\u8fd9\u4e2a\u6570\u636e\u5305\u542b\u4e2d\u6027\u3001\u5e73\u9759\u3001\u5feb\u4e50\u3001\u60b2\u4f24\u3001\u6124\u6012\u3001\u6050\u60e7\u3001\u538c\u6076\u3001\u60ca\u8bb6\u516b\u79cd\u60c5\u611f\uff0c\u672c\u9879\u76ee\u53ea\u4f7f\u7528\u91cc\u9762\u7684`Audio_Speech_Actors_01-24.zip`\uff0c\u6570\u636e\u96c6\uff0c\u8bf4\u8bdd\u7684\u8bed\u53e5\u53ea\u6709`Kids are talking by the door`\u548c`Dogs are sitting by the door`\uff0c\u53ef\u4ee5\u8bf4\u8fd9\u4e2a\u8bad\u7ec3\u96c6\u662f\u975e\u5e38\u7b80\u5355\u7684\u3002\u4e0b\u8f7d\u8fd9\u4e2a\u6570\u636e\u96c6\u5e76\u89e3\u538b\u5230`dataset`\u76ee\u5f55\u4e0b\u3002\r\n\r\n\u7136\u540e\u6267\u884c`create_data.py`\u91cc\u9762\u7684`create_ravdess_list('dataset/Audio_Speech_Actors_01-24', 'dataset')`\u51fd\u6570\u5373\u53ef\u751f\u6210\u6570\u636e\u5217\u8868\uff0c\u540c\u65f6\u4e5f\u751f\u6210\u5f52\u4e00\u5316\u6587\u4ef6\uff0c\u5177\u4f53\u770b\u4ee3\u7801\u3002\r\n\r\n```shell\r\npython 
create_data.py\r\n```\r\n\r\n\u5982\u679c\u81ea\u5b9a\u4e49\u6570\u636e\u96c6\uff0c\u53ef\u4ee5\u6309\u7167\u4e0b\u9762\u683c\u5f0f\uff0c`audio_path`\u4e3a\u97f3\u9891\u6587\u4ef6\u8def\u5f84\uff0c\u7528\u6237\u9700\u8981\u63d0\u524d\u628a\u97f3\u9891\u6570\u636e\u96c6\u5b58\u653e\u5728`dataset/audio`\u76ee\u5f55\u4e0b\uff0c\u6bcf\u4e2a\u6587\u4ef6\u5939\u5b58\u653e\u4e00\u4e2a\u7c7b\u522b\u7684\u97f3\u9891\u6570\u636e\uff0c\u6bcf\u6761\u97f3\u9891\u6570\u636e\u957f\u5ea6\u57283\u79d2\u5de6\u53f3\uff0c\u5982 `dataset/audio/angry/\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7`\u3002`audio`\u662f\u6570\u636e\u5217\u8868\u5b58\u653e\u7684\u4f4d\u7f6e\uff0c\u751f\u6210\u7684\u6570\u636e\u7c7b\u522b\u7684\u683c\u5f0f\u4e3a `\u97f3\u9891\u8def\u5f84\\t\u97f3\u9891\u5bf9\u5e94\u7684\u7c7b\u522b\u6807\u7b7e`\uff0c\u97f3\u9891\u8def\u5f84\u548c\u6807\u7b7e\u7528\u5236\u8868\u7b26 `\\t`\u5206\u5f00\u3002\u8bfb\u8005\u4e5f\u53ef\u4ee5\u6839\u636e\u81ea\u5df1\u5b58\u653e\u6570\u636e\u7684\u65b9\u5f0f\u4fee\u6539\u4ee5\u4e0b\u51fd\u6570\u3002\r\n\r\n\u6267\u884c`create_data.py`\u91cc\u9762\u7684`get_data_list('dataset/audios', 'dataset')`\u51fd\u6570\u5373\u53ef\u751f\u6210\u6570\u636e\u5217\u8868\uff0c\u540c\u65f6\u4e5f\u751f\u6210\u5f52\u4e00\u5316\u6587\u4ef6\uff0c\u5177\u4f53\u770b\u4ee3\u7801\u3002\r\n```shell\r\npython create_data.py\r\n```\r\n\r\n\u751f\u6210\u7684\u5217\u8868\u662f\u957f\u8fd9\u6837\u7684\uff0c\u524d\u9762\u662f\u97f3\u9891\u7684\u8def\u5f84\uff0c\u540e\u9762\u662f\u8be5\u97f3\u9891\u5bf9\u5e94\u7684\u6807\u7b7e\uff0c\u4ece0\u5f00\u59cb\uff0c\u8def\u5f84\u548c\u6807\u7b7e\u4e4b\u95f4\u7528`\\t`\u9694\u5f00\u3002\r\n```shell\r\ndataset/Audio_Speech_Actors_01-24/Actor_13/03-01-01-01-02-01-13.wav\t0\r\ndataset/Audio_Speech_Actors_01-24/Actor_01/03-01-02-01-01-01-01.wav\t1\r\ndataset/Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-01-01-01.wav\t2\r\n```\r\n\r\n**\u6ce8\u610f\uff1a** 
`create_data.py`\u91cc\u9762\u7684`create_standard('configs/bi_lstm.yml')`\u51fd\u6570\u5fc5\u987b\u8981\u6267\u884c\u7684\uff0c\u8fd9\u4e2a\u662f\u751f\u6210\u5f52\u4e00\u5316\u7684\u6587\u4ef6\u3002\r\n\r\n\r\n# \u63d0\u53d6\u7279\u5f81\uff08\u53ef\u9009\uff09\r\n\r\n\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u9996\u5148\u662f\u8981\u8bfb\u53d6\u97f3\u9891\u6570\u636e\uff0c\u7136\u540e\u63d0\u53d6\u7279\u5f81\uff0c\u6700\u540e\u518d\u8fdb\u884c\u8bad\u7ec3\u3002\u5176\u4e2d\u8bfb\u53d6\u97f3\u9891\u6570\u636e\u3001\u63d0\u53d6\u7279\u5f81\u4e5f\u662f\u6bd4\u8f83\u6d88\u8017\u65f6\u95f4\u7684\uff0c\u6240\u4ee5\u6211\u4eec\u53ef\u4ee5\u9009\u62e9\u63d0\u524d\u63d0\u53d6\u597d\u53d6\u7279\u5f81\uff0c\u8bad\u7ec3\u6a21\u578b\u7684\u662f\u5c31\u53ef\u4ee5\u76f4\u63a5\u52a0\u8f7d\u63d0\u53d6\u597d\u7684\u7279\u5f81\uff0c\u8fd9\u6837\u8bad\u7ec3\u901f\u5ea6\u4f1a\u66f4\u5feb\u3002\u8fd9\u4e2a\u63d0\u53d6\u7279\u5f81\u662f\u53ef\u9009\u62e9\uff0c\u5982\u679c\u6ca1\u6709\u63d0\u53d6\u597d\u7684\u7279\u5f81\uff0c\u8bad\u7ec3\u6a21\u578b\u7684\u65f6\u5019\u5c31\u4f1a\u4ece\u8bfb\u53d6\u97f3\u9891\u6570\u636e\uff0c\u7136\u540e\u63d0\u53d6\u7279\u5f81\u5f00\u59cb\u3002\u63d0\u53d6\u7279\u5f81\u6b65\u9aa4\u5982\u4e0b\uff1a\r\n\r\n1. \u6267\u884c`extract_features.py`\uff0c\u63d0\u53d6\u7279\u5f81\uff0c\u7279\u5f81\u4f1a\u4fdd\u5b58\u5728`dataset/features`\u76ee\u5f55\u4e0b\uff0c\u5e76\u751f\u6210\u65b0\u7684\u6570\u636e\u5217\u8868`train_list_features.txt`\u548c`test_list_features.txt`\u3002\r\n\r\n```shell\r\npython extract_features.py --configs=configs/bi_lstm.yml --save_dir=dataset/features\r\n```\r\n\r\n2. 
\u4fee\u6539\u914d\u7f6e\u6587\u4ef6\uff0c\u5c06`dataset_conf.train_list`\u548c`dataset_conf.test_list`\u4fee\u6539\u4e3a`train_list_features.txt`\u548c`test_list_features.txt`\u3002\r\n\r\n\r\n## \u8bad\u7ec3\r\n\r\n\u8bad\u7ec3\u6709\u4e24\u4e2a\u65b9\u6cd5\uff0c\u7b2c\u4e00\u4e2a\u662f\u63d0\u524d\u63d0\u53d6\u7279\u5f81\uff0c\u4fdd\u6301\u5728\u672c\u5730\uff0c\u7136\u540e\u5728\u8fdb\u884c\u8bad\u7ec3\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7684\u597d\u5904\u5c31\u662f\u8bad\u7ec3\u7279\u522b\u5feb\uff0c\u56e0\u4e3a\u672c\u9879\u76ee\u7684\u7279\u5f81\u63d0\u53d6\u65b9\u6cd5\u6bd4\u8f83\u6162\uff0c\u5982\u679c\u5728\u8bad\u7ec3\u4e2d\u8981\u63d0\u53d6\u7279\u5f81\uff0c\u90a3\u4e48\u8bad\u7ec3\u4f1a\u5f88\u6162\uff0c\u7f3a\u70b9\u662f\u6ca1\u529e\u6cd5\u4f7f\u7528\u968f\u673a\u6570\u636e\u589e\u5f3a\u3002\u7b2c\u4e8c\u79cd\u5c31\u662f\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u63d0\u53d6\u7279\u5f81\uff0c\u8fd9\u79cd\u597d\u5904\u662f\u53ef\u4ee5\u4f7f\u7528\u968f\u673a\u6570\u636e\u589e\u5f3a\uff0c\u7f3a\u70b9\u662f\u8bad\u7ec3\u6bd4\u8f83\u6162\u3002\r\n\r\n - \u63d0\u53d6\u7279\u5f81\uff08\u53ef\u9009\uff09\uff0c\u6267\u884c`extract_features.py`\u7a0b\u5e8f\u5373\u53ef\uff0c\u7279\u5f81\u63d0\u53d6\u5b8c\u6210\u9700\u8981\u4fee\u6539`configs/bi_lstm.yml`\u91cc\u9762\u7684`train_list`\u548c`test_list`\uff0c\u5c06\u5b83\u4eec\u4fee\u6539\u4e3a\u65b0\u751f\u6210\u7684\u6570\u636e\u5217\u8868\u8def\u5f84\u3002\r\n\r\n```shell\r\npython extract_features.py --configs=configs/bi_lstm.yml\r\n```\r\n\r\n\u8f93\u51fa\u65e5\u5fd7\uff1a\r\n```\r\n\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\r\n100%\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 1290/1290 [01:39<00:00, 12.99it/s]\r\n[2024-02-03 14:57:00.699338 INFO   ] trainer:get_standard_file:136 - \u5f52\u4e00\u5316\u6587\u4ef6\u4fdd\u5b58\u5728\uff1adataset/standard.m\r\n[2024-02-03 14:57:00.700046 INFO   ] 
featurizer:__init__:23 - \u4f7f\u7528\u7684\u7279\u5f81\u65b9\u6cd5\u4e3a Emotion2Vec\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 1290/1290 [01:36<00:00, 13.40it/s]\r\n[2024-02-03 14:58:36.941253 INFO   ] trainer:extract_features:162 - dataset/train_list.txt\u5217\u8868\u4e2d\u7684\u6570\u636e\u5df2\u63d0\u53d6\u7279\u5f81\u5b8c\u6210\uff0c\u65b0\u5217\u8868\u4e3a\uff1adataset/train_list_features.txt\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 150/150 [00:11<00:00, 13.52it/s]\r\n[2024-02-03 14:58:48.036661 INFO   ] trainer:extract_features:162 - dataset/test_list.txt\u5217\u8868\u4e2d\u7684\u6570\u636e\u5df2\u63d0\u53d6\u7279\u5f81\u5b8c\u6210\uff0c\u65b0\u5217\u8868\u4e3a\uff1adataset/test_list_features.txt\r\n```\r\n\r\n\u4e0d\u7ba1\u662f\u5426\u63d0\u524d\u63d0\u53d6\u7279\u5f81\uff0c\u63a5\u7740\u90fd\u53ef\u4ee5\u5f00\u59cb\u8bad\u7ec3\u6a21\u578b\u4e86\uff0c\u521b\u5efa `train.py`\u3002\u914d\u7f6e\u6587\u4ef6\u91cc\u9762\u7684\u53c2\u6570\u4e00\u822c\u4e0d\u9700\u8981\u4fee\u6539\uff0c\u4f46\u662f\u8fd9\u51e0\u4e2a\u662f\u9700\u8981\u6839\u636e\u81ea\u5df1\u5b9e\u9645\u7684\u6570\u636e\u96c6\u8fdb\u884c\u8c03\u6574\u7684\uff0c\u9996\u5148\u6700\u91cd\u8981\u7684\u5c31\u662f\u5206\u7c7b\u5927\u5c0f`dataset_conf.num_class`\uff0c\u8fd9\u4e2a\u6bcf\u4e2a\u6570\u636e\u96c6\u7684\u5206\u7c7b\u5927\u5c0f\u53ef\u80fd\u4e0d\u4e00\u6837\uff0c\u6839\u636e\u81ea\u5df1\u7684\u5b9e\u9645\u60c5\u51b5\u8bbe\u5b9a\u3002\u7136\u540e\u662f`dataset_conf.batch_size`\uff0c\u5982\u679c\u662f\u663e\u5b58\u4e0d\u591f\u7684\u8bdd\uff0c\u53ef\u4ee5\u51cf\u5c0f\u8fd9\u4e2a\u53c2\u6570\u3002\r\n\r\n```shell\r\n# \u5355\u5361\u8bad\u7ec3\r\nCUDA_VISIBLE_DEVICES=0 python train.py --configs=configs/bi_lstm.yml\r\n# 
\u591a\u5361\u8bad\u7ec3\r\nCUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py --configs=configs/bi_lstm.yml\r\n```\r\n\r\n\r\n\u8bad\u7ec3\u8f93\u51fa\u65e5\u5fd7\uff1a\r\n```\r\n[2024-02-03 15:09:26.166181 INFO   ] utils:print_arguments:14 - ----------- \u989d\u5916\u914d\u7f6e\u53c2\u6570 -----------\r\n[2024-02-03 15:09:26.166281 INFO   ] utils:print_arguments:16 - configs: configs/bi_lstm.yml\r\n[2024-02-03 15:09:26.166358 INFO   ] utils:print_arguments:16 - local_rank: 0\r\n[2024-02-03 15:09:26.166427 INFO   ] utils:print_arguments:16 - pretrained_model: None\r\n[2024-02-03 15:09:26.166494 INFO   ] utils:print_arguments:16 - resume_model: None\r\n[2024-02-03 15:09:26.166550 INFO   ] utils:print_arguments:16 - save_model_path: models/\r\n[2024-02-03 15:09:26.166613 INFO   ] utils:print_arguments:16 - use_gpu: True\r\n[2024-02-03 15:09:26.166676 INFO   ] utils:print_arguments:17 - ------------------------------------------------\r\n[2024-02-03 15:09:26.176508 INFO   ] utils:print_arguments:19 - ----------- \u914d\u7f6e\u6587\u4ef6\u53c2\u6570 -----------\r\n[2024-02-03 15:09:26.176604 INFO   ] utils:print_arguments:22 - dataset_conf:\r\n[2024-02-03 15:09:26.176673 INFO   ] utils:print_arguments:25 -         aug_conf:\r\n[2024-02-03 15:09:26.176736 INFO   ] utils:print_arguments:27 -                 noise_aug_prob: 0.2\r\n[2024-02-03 15:09:26.176792 INFO   ] utils:print_arguments:27 -                 noise_dir: dataset/noise\r\n[2024-02-03 15:09:26.176861 INFO   ] utils:print_arguments:27 -                 speed_perturb: True\r\n[2024-02-03 15:09:26.176914 INFO   ] utils:print_arguments:27 -                 volume_aug_prob: 0.2\r\n[2024-02-03 15:09:26.176966 INFO   ] utils:print_arguments:27 -                 volume_perturb: False\r\n[2024-02-03 15:09:26.177017 INFO   ] utils:print_arguments:25 -         dataLoader:\r\n[2024-02-03 15:09:26.177070 INFO   ] utils:print_arguments:27 -                 batch_size: 32\r\n[2024-02-03 
15:09:26.177151 INFO   ] utils:print_arguments:27 -                 num_workers: 4\r\n[2024-02-03 15:09:26.177224 INFO   ] utils:print_arguments:29 -         do_vad: False\r\n[2024-02-03 15:09:26.177275 INFO   ] utils:print_arguments:25 -         eval_conf:\r\n[2024-02-03 15:09:26.177328 INFO   ] utils:print_arguments:27 -                 batch_size: 1\r\n[2024-02-03 15:09:26.177387 INFO   ] utils:print_arguments:27 -                 max_duration: 3\r\n[2024-02-03 15:09:26.177438 INFO   ] utils:print_arguments:29 -         label_list_path: dataset/label_list.txt\r\n[2024-02-03 15:09:26.177489 INFO   ] utils:print_arguments:29 -         max_duration: 3\r\n[2024-02-03 15:09:26.177542 INFO   ] utils:print_arguments:29 -         min_duration: 0.5\r\n[2024-02-03 15:09:26.177593 INFO   ] utils:print_arguments:29 -         sample_rate: 16000\r\n[2024-02-03 15:09:26.177647 INFO   ] utils:print_arguments:29 -         scaler_path: dataset/standard.m\r\n[2024-02-03 15:09:26.177699 INFO   ] utils:print_arguments:29 -         target_dB: -20\r\n[2024-02-03 15:09:26.177749 INFO   ] utils:print_arguments:29 -         test_list: dataset/test_list.txt\r\n[2024-02-03 15:09:26.177800 INFO   ] utils:print_arguments:29 -         train_list: dataset/train_list.txt\r\n[2024-02-03 15:09:26.177851 INFO   ] utils:print_arguments:29 -         use_dB_normalization: False\r\n[2024-02-03 15:09:26.177905 INFO   ] utils:print_arguments:22 - model_conf:\r\n[2024-02-03 15:09:26.177959 INFO   ] utils:print_arguments:29 -         num_class: None\r\n[2024-02-03 15:09:26.178011 INFO   ] utils:print_arguments:22 - optimizer_conf:\r\n[2024-02-03 15:09:26.178066 INFO   ] utils:print_arguments:29 -         learning_rate: 0.001\r\n[2024-02-03 15:09:26.178118 INFO   ] utils:print_arguments:29 -         optimizer: Adam\r\n[2024-02-03 15:09:26.178173 INFO   ] utils:print_arguments:29 -         scheduler: WarmupCosineSchedulerLR\r\n[2024-02-03 15:09:26.178224 INFO   ] utils:print_arguments:25 -         
scheduler_args:\r\n[2024-02-03 15:09:26.178277 INFO   ] utils:print_arguments:27 -                 max_lr: 0.001\r\n[2024-02-03 15:09:26.178330 INFO   ] utils:print_arguments:27 -                 min_lr: 1e-05\r\n[2024-02-03 15:09:26.178381 INFO   ] utils:print_arguments:27 -                 warmup_epoch: 5\r\n[2024-02-03 15:09:26.178434 INFO   ] utils:print_arguments:29 -         weight_decay: 1e-06\r\n[2024-02-03 15:09:26.178485 INFO   ] utils:print_arguments:22 - preprocess_conf:\r\n[2024-02-03 15:09:26.178537 INFO   ] utils:print_arguments:29 -         feature_method: Emotion2Vec\r\n[2024-02-03 15:09:26.178588 INFO   ] utils:print_arguments:25 -         method_args:\r\n[2024-02-03 15:09:26.178644 INFO   ] utils:print_arguments:27 -                 granularity: utterance\r\n[2024-02-03 15:09:26.178695 INFO   ] utils:print_arguments:22 - train_conf:\r\n[2024-02-03 15:09:26.178748 INFO   ] utils:print_arguments:29 -         enable_amp: False\r\n[2024-02-03 15:09:26.178800 INFO   ] utils:print_arguments:29 -         log_interval: 10\r\n[2024-02-03 15:09:26.178852 INFO   ] utils:print_arguments:29 -         loss_weight: None\r\n[2024-02-03 15:09:26.178906 INFO   ] utils:print_arguments:29 -         max_epoch: 60\r\n[2024-02-03 15:09:26.178957 INFO   ] utils:print_arguments:29 -         use_compile: False\r\n[2024-02-03 15:09:26.179008 INFO   ] utils:print_arguments:31 - use_model: BiLSTM\r\n[2024-02-03 15:09:26.179059 INFO   ] utils:print_arguments:32 - ------------------------------------------------\r\n[2024-02-03 15:09:26.179184 WARNING] trainer:__init__:69 - Emotion2Vec\u7279\u5f81\u63d0\u53d6\u65b9\u6cd5\u4e0d\u652f\u6301\u591a\u7ebf\u7a0b\uff0c\u5df2\u81ea\u52a8\u4f7f\u7528\u5355\u7ebf\u7a0b\u63d0\u53d6\u7279\u5f81\uff01\r\n[2024-02-03 15:09:26.198994 INFO   ] featurizer:__init__:23 - \u4f7f\u7528\u7684\u7279\u5f81\u65b9\u6cd5\u4e3a Emotion2Vec\r\n==========================================================================================\r\nLayer 
(type:depth-idx)                   Output Shape              Param #\r\n==========================================================================================\r\nBiLSTM                                   [1, 8]                    --\r\n\u251c\u2500Linear: 1-1                            [1, 512]                  393,728\r\n\u251c\u2500LSTM: 1-2                              [1, 1, 512]               1,576,960\r\n\u251c\u2500Tanh: 1-3                              [1, 512]                  --\r\n\u251c\u2500Dropout: 1-4                           [1, 512]                  --\r\n\u251c\u2500Linear: 1-5                            [1, 256]                  131,328\r\n\u251c\u2500ReLU: 1-6                              [1, 256]                  --\r\n\u251c\u2500Linear: 1-7                            [1, 8]                    2,056\r\n==========================================================================================\r\nTotal params: 2,104,072\r\nTrainable params: 2,104,072\r\nNon-trainable params: 0\r\nTotal mult-adds (Units.MEGABYTES): 2.10\r\n==========================================================================================\r\nInput size (MB): 0.00\r\nForward/backward pass size (MB): 0.01\r\nParams size (MB): 8.42\r\nEstimated Total Size (MB): 8.43\r\n==========================================================================================\r\n[2024-02-05 15:09:31.551738 INFO   ] trainer:train:378 - \u8bad\u7ec3\u6570\u636e\uff1a4407\r\n[2024-02-05 15:09:32.951738 INFO   ] trainer:__train_epoch:362 - Train epoch: [1/60], batch: [0/41], loss: 2.07688, accuracy: 0.15625, learning rate: 0.00001000, speed: 5.35 data/sec, eta: 4:05:18\r\n[2024-02-05 15:09:56.525906 INFO   ] trainer:__train_epoch:362 - Train epoch: [1/60], batch: [10/41], loss: 2.05963, accuracy: 0.22187, learning rate: 0.00005829, speed: 13.57 data/sec, eta: 
1:36:15\r\n\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\r\n```\r\n\r\n# Evaluation\r\n\r\nRun the following command to evaluate the model.\r\n\r\n```shell\r\npython eval.py --configs=configs/bi_lstm.yml\r\n```\r\n\r\nThe evaluation output looks like this:\r\n```shell\r\n[2024-02-03 15:13:25.469242 INFO   ] trainer:evaluate:461 - \u6210\u529f\u52a0\u8f7d\u6a21\u578b\uff1amodels/BiLSTM_Emotion2Vec/best_model/model.pth\r\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 150/150 [00:00<00:00, 1281.96it/s]\r\n\u8bc4\u4f30\u6d88\u8017\u65f6\u95f4\uff1a1s\uff0closs\uff1a0.61840\uff0caccuracy\uff1a0.87333\r\n```\r\n\r\nEvaluation prints the accuracy and also saves a confusion matrix image under `output/images/`, as shown below.\r\n<br/>\r\n<div align=\"center\">\r\n<img src=\"docs/images/image1.png\" alt=\"Confusion matrix\" width=\"600\">\r\n</div>\r\n\r\n\r\nNote: if the class labels are in Chinese, a font that supports Chinese must be installed for them to render correctly. Windows usually needs no extra installation, while Ubuntu does. If Windows really is missing the font, download a `.ttf` file from the [font files](https://github.com/tracyone/program_font) repository and copy it to `C:\\Windows\\Fonts`. On Ubuntu, proceed as follows.\r\n\r\n1. Install the fonts\r\n```shell\r\ngit clone https://github.com/tracyone/program_font && cd program_font && ./install.sh\r\n```\r\n\r\n2. 
Run the following Python code\r\n```python\r\nimport matplotlib\r\nimport shutil\r\nimport os\r\n\r\n# Locate matplotlib's bundled TTF font directory (it sits next to matplotlibrc)\r\npath = matplotlib.matplotlib_fname()\r\npath = path.replace('matplotlibrc', 'fonts/ttf/')\r\nprint(path)\r\n# Copy the SimHei font into that directory (the source path assumes the install script from step 1)\r\nshutil.copy('/usr/share/fonts/MyFonts/simhei.ttf', path)\r\n# Clear matplotlib's font cache so the new font is picked up\r\nuser_dir = os.path.expanduser('~')\r\nshutil.rmtree(f'{user_dir}/.cache/matplotlib', ignore_errors=True)\r\n```\r\n\r\n\r\n# Prediction\r\n\r\nOnce training has finished, we have a model parameter file, and we use this model to run prediction on audio.\r\n\r\n```shell\r\npython infer.py --audio_path=dataset/test.wav\r\n```\r\n\r\nOutput:\r\n```\r\n\u6210\u529f\u52a0\u8f7d\u6a21\u578b\u53c2\u6570\uff1amodels/BiLSTM_Emotion2Vec/best_model/model.pth\r\n[2024-07-02 19:48:42.864262 INFO   ] emotion2vec_predict:__init__:27 - \u6210\u529f\u52a0\u8f7d\u6a21\u578b\uff1amodels/iic/emotion2vec_base\r\n\u97f3\u9891\uff1adataset/test.wav \u7684\u9884\u6d4b\u7ed3\u679c\u6807\u7b7e\u4e3a\uff1aangry\uff0c\u5f97\u5206\uff1a0.99995\r\n```\r\n\r\n# Prediction with the Emotion2vec model\r\n\r\nThe project already supports Emotion2vec models: it can predict audio with the Emotion2vec models published on ModelScope. Just set the `--use_ms_model` argument; no extra configuration file or model path is needed, and the model files are downloaded automatically on first use. Three models are supported: `iic/emotion2vec_plus_seed`, `iic/emotion2vec_plus_base`, and `iic/emotion2vec_plus_large`.\r\n\r\n```shell\r\npython infer.py --audio_path=dataset/test.wav --use_ms_model=iic/emotion2vec_plus_base\r\n```\r\n\r\nOutput:\r\n```\r\n[2024-07-02 19:45:36.154355 INFO   ] 
emotion2vec_predict:__init__:27 - \u6210\u529f\u52a0\u8f7d\u6a21\u578b\uff1amodels/iic/emotion2vec_plus_base\r\n\u97f3\u9891\uff1adataset/test.wav \u7684\u9884\u6d4b\u7ed3\u679c\u6807\u7b7e\u4e3a\uff1a\u751f\u6c14\uff0c\u5f97\u5206\uff1a1.0\r\n```\r\n\r\n## Support the author\r\n<br/>\r\n<div align=\"center\">\r\n<p>Tip one yuan to support the author</p>\r\n<img src=\"https://yeyupiaoling.cn/reward.png\" alt=\"Support the author\" width=\"400\">\r\n</div>\r\n\r\n# References\r\n\r\n1. https://github.com/yeyupiaoling/AudioClassification-Pytorch\r\n2. https://github.com/alibaba-damo-academy/FunASR\r\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "Speech Emotion Recognition toolkit on Pytorch",
    "version": "0.0.7",
    "project_urls": {
        "Download": "https://github.com/yeyupiaoling/SpeechEmotionRecognition-Pytorch.git",
        "Homepage": "https://github.com/yeyupiaoling/SpeechEmotionRecognition-Pytorch"
    },
    "split_keywords": [
        "audio",
        " pytorch",
        " emotion"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cbd2eb758478ce535bc7926360e11c4c90cef33b40e177117d7dfa02d9d14d8a",
                "md5": "cfc2afc4b46989d280d3fae4e9d8619f",
                "sha256": "6efe4e44f919d10058a24dc08bb0baf7c884e2554dc3d7c08fadaf4cebc582fe"
            },
            "downloads": -1,
            "filename": "mser-0.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cfc2afc4b46989d280d3fae4e9d8619f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 32954,
            "upload_time": "2024-09-02T12:20:18",
            "upload_time_iso_8601": "2024-09-02T12:20:18.113383Z",
            "url": "https://files.pythonhosted.org/packages/cb/d2/eb758478ce535bc7926360e11c4c90cef33b40e177117d7dfa02d9d14d8a/mser-0.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-02 12:20:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yeyupiaoling",
    "github_project": "SpeechEmotionRecognition-Pytorch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "mser"
}
        