# A Speech Synthesis System Implemented with PaddlePaddle
![python version](https://img.shields.io/badge/python-3.8+-orange.svg)
![GitHub forks](https://img.shields.io/github/forks/yeyupiaoling/VITS-PaddlePaddle)
![GitHub Repo stars](https://img.shields.io/github/stars/yeyupiaoling/VITS-PaddlePaddle)
![GitHub](https://img.shields.io/github/license/yeyupiaoling/VITS-PaddlePaddle)
![Supported systems](https://img.shields.io/badge/支持系统-Win/Linux/MAC-9cf)
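# Introduction
This project is a speech synthesis project based on PaddlePaddle. It uses VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech), an end-to-end speech synthesis method. Because the model is end-to-end, it is simple to use: it does not need complicated steps such as text alignment, and training and synthesis each take a single command, which lowers the barrier to entry considerably.
**You are welcome to scan the QR codes below to join the Knowledge Planet (知识星球) group or the QQ group for discussion. The Knowledge Planet group provides this project's model files, the model files of the author's other related projects, and some other resources.**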
<div align="center">
<img src="https://yeyupiaoling.cn/zsxq.png" alt="Knowledge Planet" width="400">
<img src="https://yeyupiaoling.cn/qq.png" alt="QQ group" width="400">
</div>
# Requirements
- Anaconda 3
- Python 3.8
- PaddlePaddle 2.5.1
- Windows 10 or Ubuntu 18.04
# Model Download
| Dataset | Language (dialect) | Number of speakers | Speaker names | Download link |
|:--------------------------------------------------------:|:------:|:-----:|:-------------------:|:----:|
| [BZNSYP](https://aistudio.baidu.com/datasetdetail/36741) | Mandarin | 1 | 标准女声 (standard female voice) | |
| Cantonese dataset | Cantonese | 10 | 男声1 (male voice 1)<br/>女生1 (female voice 1)<br/>··· | |
## Environment Setup
- First, install the GPU version of PaddlePaddle. Skip this step if it is already installed.
```shell
conda install paddlepaddle-gpu==2.4.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
- Install the ppvits library.
Install it with pip using the following command:
```shell
python -m pip install ppvits -U -i https://pypi.tuna.tsinghua.edu.cn/simple
```
**Installing from source is recommended**, since it ensures you are using the latest code.
```shell
git clone https://github.com/yeyupiaoling/VITS-PaddlePaddle.git
cd VITS-PaddlePaddle/
pip install .
```
- Install the `monotonic-align-paddle` library.
```shell
cd tools/monotonic_align_paddle
pip install .
cd -
```
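## Preparing the Data
The project can generate data lists directly for [BZNSYP](https://aistudio.baidu.com/datasetdetail/36741) and [AiShell3](https://aistudio.baidu.com/datasetdetail/207703). Taking BZNSYP as an example, download BZNSYP into the `dataset` directory and extract it. Then run `create_list.py` to generate a data list in the format `<audio path>|<speaker name>|<annotation>`. Note that the annotation must be marked with its language: for Mandarin (`普通话`), for example, wrap the text in `[ZH]`. The other supported marks are 日本語: `[JA]`, English: `[EN]`, and 한국어: `[KO]`. A custom dataset only needs to follow the same format.
The project provides two text-processing methods, `cjke_cleaners2` and `chinese_dialect_cleaners`, and each supports a different set of languages. The choice is set in `dataset_conf.text_cleaner`. `cjke_cleaners2` supports `{"普通话": "[ZH]", "日本語": "[JA]", "English": "[EN]", "한국어": "[KO]"}`, and `chinese_dialect_cleaners` supports `{"普通话": "[ZH]", "日本語": "[JA]", "English": "[EN]", "粤语": "[GD]", "上海话": "[SH]", "苏州话": "[SZ]", "无锡话": "[WX]", "常州话": "[CZ]", "杭州话": "[HZ]", ·····}`. For more languages, see [LANGUAGE_MARKS](./mvits/__init__.py) in the source code.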
```
dataset/BZNSYP/Wave/000001.wav|标准女声|[ZH]卡尔普陪外孙玩滑梯。[ZH]
dataset/BZNSYP/Wave/000002.wav|标准女声|[ZH]假语村言别再拥抱我。[ZH]
dataset/BZNSYP/Wave/000003.wav|标准女声|[ZH]宝马配挂跛骡鞍,貂蝉怨枕董翁榻。[ZH]
```
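For a custom dataset, lines in the format above can be produced with a few lines of Python. This is a minimal sketch and not the project's `create_list.py`: the helper names `format_entry` and `build_data_list` and the sample entries are hypothetical, and it assumes the language mark is simply placed on both sides of the text, as in the example lines above.

```python
# Language marks listed for `cjke_cleaners2` in this README.
LANGUAGE_MARKS = {"普通话": "[ZH]", "日本語": "[JA]", "English": "[EN]", "한국어": "[KO]"}


def format_entry(audio_path: str, speaker: str, text: str, language: str) -> str:
    """Build one `<audio path>|<speaker name>|<annotation>` line."""
    if "|" in audio_path or "|" in speaker:
        raise ValueError("'|' is the field separator and cannot appear in the path or speaker name")
    mark = LANGUAGE_MARKS[language]
    return f"{audio_path}|{speaker}|{mark}{text.strip()}{mark}"


def build_data_list(entries) -> str:
    """Join (audio_path, speaker, text, language) tuples into list-file content."""
    return "\n".join(format_entry(*entry) for entry in entries) + "\n"


if __name__ == "__main__":
    # Hypothetical entries; replace them with your own recordings and transcripts.
    samples = [
        ("dataset/custom/0001.wav", "speaker_a", "你好。", "普通话"),
        ("dataset/custom/0002.wav", "speaker_a", "Hello there.", "English"),
    ]
    print(build_data_list(samples), end="")
```

The returned string can be written to a text file with UTF-8 encoding and then passed to the preprocessing step in place of the generated list.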
Once the data list is ready, generate the phoneme data list by running `preprocess_data.py --train_data_list=dataset/bznsyp.txt`. After this step, the data is fully prepared.
```
dataset/BZNSYP/Wave/000001.wav|0|kʰa↓↑əɹ`↓↑pʰu↓↑ pʰeɪ↑ waɪ↓swən→ wan↑ xwa↑tʰi→.
dataset/BZNSYP/Wave/000002.wav|0|tʃ⁼ja↓↑ɥ↓↑ tsʰwən→jɛn↑p⁼iɛ↑ ts⁼aɪ↓ jʊŋ→p⁼ɑʊ↓ wo↓↑.
dataset/BZNSYP/Wave/000003.wav|0|p⁼ɑʊ↓↑ma↓↑ pʰeɪ↓k⁼wa↓ p⁼wo↓↑ lwo↑an→, t⁼iɑʊ→ts`ʰan↑ ɥæn↓ ts`⁼ən↓↑ t⁼ʊŋ↓↑ʊŋ→ tʰa↓.
```
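To inspect or validate the generated phoneme list, each line can be split back into its three fields. The sketch below is illustrative only: `PhonemeEntry` and `parse_phoneme_line` are hypothetical names, and reading the second field as an integer speaker ID is an assumption based on the `0` that replaces the speaker name in the example lines above.

```python
from typing import NamedTuple


class PhonemeEntry(NamedTuple):
    audio_path: str
    speaker_id: int
    phonemes: str


def parse_phoneme_line(line: str) -> PhonemeEntry:
    """Split one `<audio path>|<speaker id>|<phonemes>` line into its fields."""
    # maxsplit=2 keeps any later '|' inside the phoneme field.
    audio_path, speaker_id, phonemes = line.rstrip("\r\n").split("|", 2)
    return PhonemeEntry(audio_path, int(speaker_id), phonemes)


if __name__ == "__main__":
    sample = "dataset/BZNSYP/Wave/000001.wav|0|kʰa↓↑əɹ`↓↑pʰu↓↑ pʰeɪ↑ waɪ↓swən→ wan↑ xwa↑tʰi→."
    entry = parse_phoneme_line(sample)
    print(entry.audio_path, entry.speaker_id, len(entry.phonemes))
```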
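## Training
You can now start training the model. The parameters in the configuration file usually do not need to be changed; the number of speakers and the speaker names are already updated when `preprocess_data.py` runs. The only parameter you may need to change is `dataset_conf.batch_size`: reduce it if you run out of GPU memory.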
```shell
# Single-GPU training
CUDA_VISIBLE_DEVICES=0 python train.py
# Multi-GPU training
python -m paddle.distributed.launch --gpus '0,1' train.py
```
Training log output:
```
[2023-08-28 21:04:42.274452 INFO ] utils:print_arguments:123 - ----------- 额外配置参数 -----------
[2023-08-28 21:04:42.274540 INFO ] utils:print_arguments:125 - config: configs/config.yml
[2023-08-28 21:04:42.274580 INFO ] utils:print_arguments:125 - epochs: 10000
[2023-08-28 21:04:42.274658 INFO ] utils:print_arguments:125 - model_dir: models
[2023-08-28 21:04:42.274702 INFO ] utils:print_arguments:125 - pretrained_model: None
[2023-08-28 21:04:42.274746 INFO ] utils:print_arguments:125 - resume_model: None
[2023-08-28 21:04:42.274788 INFO ] utils:print_arguments:126 - ------------------------------------------------
[2023-08-28 21:04:42.727728 INFO ] utils:print_arguments:128 - ----------- 配置文件参数 -----------
[2023-08-28 21:04:42.727836 INFO ] utils:print_arguments:131 - dataset_conf:
[2023-08-28 21:04:42.727909 INFO ] utils:print_arguments:138 - add_blank: True
[2023-08-28 21:04:42.727975 INFO ] utils:print_arguments:138 - batch_size: 16
[2023-08-28 21:04:42.728037 INFO ] utils:print_arguments:138 - cleaned_text: True
[2023-08-28 21:04:42.728097 INFO ] utils:print_arguments:138 - eval_sum: 2
[2023-08-28 21:04:42.728157 INFO ] utils:print_arguments:138 - filter_length: 1024
[2023-08-28 21:04:42.728204 INFO ] utils:print_arguments:138 - hop_length: 256
[2023-08-28 21:04:42.728235 INFO ] utils:print_arguments:138 - max_wav_value: 32768.0
[2023-08-28 21:04:42.728266 INFO ] utils:print_arguments:138 - mel_fmax: None
[2023-08-28 21:04:42.728298 INFO ] utils:print_arguments:138 - mel_fmin: 0.0
[2023-08-28 21:04:42.728328 INFO ] utils:print_arguments:138 - n_mel_channels: 80
[2023-08-28 21:04:42.728359 INFO ] utils:print_arguments:138 - num_workers: 4
[2023-08-28 21:04:42.728388 INFO ] utils:print_arguments:138 - sampling_rate: 22050
[2023-08-28 21:04:42.728418 INFO ] utils:print_arguments:138 - speakers_file: dataset/speakers.json
[2023-08-28 21:04:42.728448 INFO ] utils:print_arguments:138 - text_cleaner: cjke_cleaners2
[2023-08-28 21:04:42.728483 INFO ] utils:print_arguments:138 - training_file: dataset/train.txt
[2023-08-28 21:04:42.728539 INFO ] utils:print_arguments:138 - validation_file: dataset/val.txt
[2023-08-28 21:04:42.728585 INFO ] utils:print_arguments:138 - win_length: 1024
[2023-08-28 21:04:42.728615 INFO ] utils:print_arguments:131 - model:
[2023-08-28 21:04:42.728648 INFO ] utils:print_arguments:138 - filter_channels: 768
[2023-08-28 21:04:42.728685 INFO ] utils:print_arguments:138 - gin_channels: 256
[2023-08-28 21:04:42.728717 INFO ] utils:print_arguments:138 - hidden_channels: 192
[2023-08-28 21:04:42.728747 INFO ] utils:print_arguments:138 - inter_channels: 192
[2023-08-28 21:04:42.728777 INFO ] utils:print_arguments:138 - kernel_size: 3
[2023-08-28 21:04:42.728808 INFO ] utils:print_arguments:138 - n_heads: 2
[2023-08-28 21:04:42.728839 INFO ] utils:print_arguments:138 - n_layers: 6
[2023-08-28 21:04:42.728870 INFO ] utils:print_arguments:138 - n_layers_q: 3
[2023-08-28 21:04:42.728902 INFO ] utils:print_arguments:138 - p_dropout: 0.1
[2023-08-28 21:04:42.728933 INFO ] utils:print_arguments:138 - resblock: 1
[2023-08-28 21:04:42.728965 INFO ] utils:print_arguments:138 - resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
[2023-08-28 21:04:42.728997 INFO ] utils:print_arguments:138 - resblock_kernel_sizes: [3, 7, 11]
[2023-08-28 21:04:42.729027 INFO ] utils:print_arguments:138 - upsample_initial_channel: 512
[2023-08-28 21:04:42.729058 INFO ] utils:print_arguments:138 - upsample_kernel_sizes: [16, 16, 4, 4]
[2023-08-28 21:04:42.729089 INFO ] utils:print_arguments:138 - upsample_rates: [8, 8, 2, 2]
[2023-08-28 21:04:42.729119 INFO ] utils:print_arguments:138 - use_spectral_norm: False
[2023-08-28 21:04:42.729150 INFO ] utils:print_arguments:131 - optimizer_conf:
[2023-08-28 21:04:42.729184 INFO ] utils:print_arguments:138 - betas: [0.8, 0.99]
[2023-08-28 21:04:42.729217 INFO ] utils:print_arguments:138 - eps: 1e-09
[2023-08-28 21:04:42.729280 INFO ] utils:print_arguments:138 - optimizer: AdamW
[2023-08-28 21:04:42.729311 INFO ] utils:print_arguments:138 - scheduler: ExponentialLR
[2023-08-28 21:04:42.729341 INFO ] utils:print_arguments:134 - scheduler_args:
[2023-08-28 21:04:42.729373 INFO ] utils:print_arguments:136 - gamma: 0.999875
[2023-08-28 21:04:42.729249 INFO ] utils:print_arguments:138 - learning_rate: 0.0002
[2023-08-28 21:04:42.729404 INFO ] utils:print_arguments:131 - train_conf:
[2023-08-28 21:04:42.729437 INFO ] utils:print_arguments:138 - c_kl: 1.0
[2023-08-28 21:04:42.729467 INFO ] utils:print_arguments:138 - c_mel: 45
[2023-08-28 21:04:42.729498 INFO ] utils:print_arguments:138 - enable_amp: True
[2023-08-28 21:04:42.729530 INFO ] utils:print_arguments:138 - log_interval: 200
[2023-08-28 21:04:42.729561 INFO ] utils:print_arguments:138 - seed: 1234
[2023-08-28 21:04:42.729592 INFO ] utils:print_arguments:138 - segment_size: 8192
[2023-08-28 21:04:42.729622 INFO ] utils:print_arguments:141 - ------------------------------------------------
[2023-08-28 21:04:42.729971 INFO ] trainer:__init__:53 - [cjke_cleaners2]支持语言:['日本語', '普通话', 'English', '한국어', "Mix": ""]
[2023-08-28 21:04:42.795955 INFO ] trainer:__setup_dataloader:119 - 训练数据:9984
epoch [1/10000]: 100%|██████████| 619/619 [05:30<00:00, 1.88it/s]
[2023-08-25 16:44:25.205557 INFO ] trainer:train:168 - ======================================================================
epoch [2/10000]: 100%|██████████| 619/619 [05:20<00:00, 1.93it/s]
[2023-08-25 16:49:54.372718 INFO ] trainer:train:168 - ======================================================================
epoch [3/10000]: 100%|██████████| 619/619 [05:19<00:00, 1.94it/s]
[2023-08-25 16:55:21.277194 INFO ] trainer:train:168 - ======================================================================
epoch [4/10000]: 100%|██████████| 619/619 [05:18<00:00, 1.94it/s]
```
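Several values in the logged configuration are related, and a little arithmetic makes the relationships visible. The sketch below only uses numbers copied from the log above; the function names are hypothetical, and two points are assumptions carried over from the original VITS design rather than confirmed for this project: that `segment_size` is measured in audio samples, and that the learning-rate scheduler steps once per epoch.

```python
import math

# Values copied from the training log above.
SAMPLING_RATE = 22050
HOP_LENGTH = 256
SEGMENT_SIZE = 8192
UPSAMPLE_RATES = [8, 8, 2, 2]
LEARNING_RATE = 0.0002
GAMMA = 0.999875


def total_upsample(rates) -> int:
    """Overall upsampling factor of the decoder: the product of the per-layer rates."""
    return math.prod(rates)


def frames_per_segment(segment_size: int, hop_length: int) -> int:
    """Number of spectrogram frames covered by one training segment (assumes samples)."""
    return segment_size // hop_length


def lr_after(epochs: int, base_lr: float = LEARNING_RATE, gamma: float = GAMMA) -> float:
    """Exponential decay: base_lr * gamma ** epochs (assumes one scheduler step per epoch)."""
    return base_lr * gamma ** epochs


if __name__ == "__main__":
    print("total upsample factor:", total_upsample(UPSAMPLE_RATES))
    print("frames per segment:", frames_per_segment(SEGMENT_SIZE, HOP_LENGTH))
    print("segment duration (s):", round(SEGMENT_SIZE / SAMPLING_RATE, 4))
    print("lr after 1000 epochs:", lr_after(1000))
```

The product of `upsample_rates` equals `hop_length`, so each spectrogram frame expands to exactly one hop of waveform samples. If you change one of these settings, the other likely needs to change with it.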
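Training logs are also saved with VisualDL, which lets you watch the loss curves and the synthesized audio in real time. Run `visualdl --logdir=log/ --host=0.0.0.0` in the project root and open `http://<IP address>:8040` to view the page, which looks like this.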
<div align="center">
<img src="./docs/images/log.jpg" alt="VisualDL" width="600">
</div>
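# Speech Synthesis
Once the model has been trained for a while, you can use it for speech synthesis with the command below. There are three main parameters: `--text` specifies the text to synthesize; `--language` specifies the language of that text, and if it is set to `Mix` (mixed mode), you must wrap the input text in language tags yourself; `--spk` specifies the speaker. Give it a try.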
```shell
python infer.py --text="你好,我是智能语音助手。" --language=普通话 --spk=标准女声
```
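For the `Mix` mode, the input text has to carry its own language tags. A small helper can build such a string from per-language segments. This is a sketch under stated assumptions: `wrap_segment` and `build_mixed_text` are hypothetical names, the marks are the ones listed for `cjke_cleaners2` earlier in this README, and each segment is assumed to be wrapped on both sides in the same way as the training annotations.

```python
# Language marks listed for `cjke_cleaners2` in this README.
LANGUAGE_MARKS = {"普通话": "[ZH]", "日本語": "[JA]", "English": "[EN]", "한국어": "[KO]"}


def wrap_segment(text: str, language: str) -> str:
    """Wrap one piece of text in the mark of its language."""
    mark = LANGUAGE_MARKS[language]
    return f"{mark}{text}{mark}"


def build_mixed_text(segments) -> str:
    """Join (text, language) pairs into one tagged string for `--language=Mix`."""
    return "".join(wrap_segment(text, language) for text, language in segments)


if __name__ == "__main__":
    mixed = build_mixed_text([("你好,", "普通话"), ("nice to meet you.", "English")])
    print(mixed)
```

The resulting string would then be passed as the `--text` value together with `--language=Mix`.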
## Support the Author
<br/>
<div align="center">
<p>Tip the author one yuan to support the project</p>
<img src="https://yeyupiaoling.cn/reward.png" alt="Support the author" width="400">
</div>
# References
1. https://github.com/jaywalnut310/vits
2. https://github.com/PaddlePaddle/PaddleSpeech
3. https://github.com/yeyupiaoling/PPASR
4. https://github.com/Artrajz/vits-simple-api
Raw data
{
"_id": null,
"home_page": "https://github.com/yeyupiaoling/VITS-PaddlePaddle",
"name": "ppvits",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "audio,paddle,tts",
"author": "yeyupiaoling",
"author_email": "",
"download_url": "https://github.com/yeyupiaoling/VITS-PaddlePaddle.git",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "VITS toolkit on PaddlePaddle",
"version": "0.0.1",
"project_urls": {
"Download": "https://github.com/yeyupiaoling/VITS-PaddlePaddle.git",
"Homepage": "https://github.com/yeyupiaoling/VITS-PaddlePaddle"
},
"split_keywords": [
"audio",
"paddle",
"tts"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "282ee8ba1a2417e9f7e8442425a234852a9c1f92cf98a4c724fec638e648b41d",
"md5": "e16f6b0033a411271eb1dce015ca6b57",
"sha256": "b0d25941647cff743d80502050c3e93d7ec60061e99d727dd496cb88ce7ee8a1"
},
"downloads": -1,
"filename": "ppvits-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e16f6b0033a411271eb1dce015ca6b57",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 4195214,
"upload_time": "2023-09-10T14:50:22",
"upload_time_iso_8601": "2023-09-10T14:50:22.629816Z",
"url": "https://files.pythonhosted.org/packages/28/2e/e8ba1a2417e9f7e8442425a234852a9c1f92cf98a4c724fec638e648b41d/ppvits-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-10 14:50:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yeyupiaoling",
"github_project": "VITS-PaddlePaddle",
"github_not_found": true,
"lcname": "ppvits"
}