mvits

Name: mvits
Version: 0.0.4
Home page: https://github.com/yeyupiaoling/VITS-Pytorch
Summary: VITS toolkit on Pytorch
Upload time: 2023-09-01 01:37:55
Author: yeyupiaoling
License: Apache License 2.0
Keywords: audio, pytorch, tts
Simplified Chinese | [English]()

# A Speech Synthesis System Implemented in PyTorch

![python version](https://img.shields.io/badge/python-3.8+-orange.svg)
![GitHub forks](https://img.shields.io/github/forks/yeyupiaoling/VITS-Pytorch)
![GitHub Repo stars](https://img.shields.io/github/stars/yeyupiaoling/VITS-Pytorch)
![GitHub](https://img.shields.io/github/license/yeyupiaoling/VITS-Pytorch)
![Supported systems](https://img.shields.io/badge/支持系统-Win/Linux/MAC-9cf)

# Introduction

This project is a PyTorch-based speech synthesis project built on VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech). This end-to-end model is very easy to use: it requires no complex steps such as text alignment, and training and synthesis each take a single command, which greatly lowers the barrier to entry.

**You are welcome to scan the QR codes below to join the Knowledge Planet (知识星球) or the QQ group for discussion. The Knowledge Planet provides the model files for this project and for the author's other related projects, as well as some other resources.**

<div align="center">
  <img src="https://yeyupiaoling.cn/zsxq.png" alt="知识星球" width="400">
  <img src="https://yeyupiaoling.cn/qq.png" alt="QQ群" width="400">
</div>

# Prerequisites

- Anaconda 3
- Python 3.8
- Pytorch 1.13.1
- Windows 10 or Ubuntu 18.04


# Model Downloads

|                         Dataset                          | Language (Dialect) | Speakers |              Speaker Names              |                              Download                              |
|:--------------------------------------------------------:|:------------------:|:--------:|:---------------------------------------:|:------------------------------------------------------------------:|
| [BZNSYP](https://aistudio.baidu.com/datasetdetail/36741)  |      Mandarin      |    1     |     标准女声 (standard female voice)     | [Download](https://pan.baidu.com/s/1l-Sz6017Ie6hsk5dcKyMCg?pwd=x1pw) |
|                     Cantonese dataset                     |     Cantonese      |    10    | Male voice 1<br/>Female voice 1<br/>··· | [Download](https://pan.baidu.com/s/1l-Sz6017Ie6hsk5dcKyMCg?pwd=x1pw) |

## Setting Up the Environment

- First install the GPU version of PyTorch. If it is already installed, skip this step.

```shell
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
```

- Install the mvits library.

Install it with pip using the following command:

```shell
python -m pip install mvits -U -i https://pypi.tuna.tsinghua.edu.cn/simple
```

**Installing from source is recommended**, since it guarantees you are using the latest code.

```shell
git clone https://github.com/yeyupiaoling/VITS-Pytorch.git
cd VITS-Pytorch/
pip install .
```

## Data Preparation

The project can directly generate data lists for [BZNSYP](https://aistudio.baidu.com/datasetdetail/36741) and [AiShell3](https://aistudio.baidu.com/datasetdetail/207703). Taking BZNSYP as an example, download BZNSYP into the `dataset` directory and extract it, then run `create_list.py` to generate a data list in the following format: `<audio path>|<speaker name>|<transcription>`. Note that each transcription must be tagged with its language; for Mandarin, wrap the text with `[ZH]`. The other supported tags are Japanese: `[JA]`, English: `[EN]`, and Korean: `[KO]`. Custom datasets only need to be generated in the same format.

The project provides two text-processing pipelines, each supporting a different set of languages: `cjke_cleaners2` and `chinese_dialect_cleaners`. This is configured via `dataset_conf.text_cleaner`. `cjke_cleaners2` supports `{"普通话": "[ZH]", "日本語": "[JA]", "English": "[EN]", "한국어": "[KO]"}` (Mandarin, Japanese, English, Korean), while `chinese_dialect_cleaners` supports `{"普通话": "[ZH]", "日本語": "[JA]", "English": "[EN]", "粤语": "[GD]", "上海话": "[SH]", "苏州话": "[SZ]", "无锡话": "[WX]", "常州话": "[CZ]", "杭州话": "[HZ]", ·····}` (adding Cantonese and several Wu dialects such as Shanghainese, Suzhounese, Wuxinese, Changzhounese, and Hangzhounese). The full list of supported languages is in [LANGUAGE_MARKS](./mvits/__init__.py) in the source code.
```
dataset/BZNSYP/Wave/000001.wav|标准女声|[ZH]卡尔普陪外孙玩滑梯。[ZH]
dataset/BZNSYP/Wave/000002.wav|标准女声|[ZH]假语村言别再拥抱我。[ZH]
dataset/BZNSYP/Wave/000003.wav|标准女声|[ZH]宝马配挂跛骡鞍,貂蝉怨枕董翁榻。[ZH]
```
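
Below is a minimal Python sketch (not part of the project) of how a custom data list in the `<audio path>|<speaker name>|<transcription>` format could be written, using the `cjke_cleaners2` language tags described above; the file paths, speaker name, and sentences are hypothetical examples.

```python
# Hedged sketch: write a custom data list in the format
#   <audio path>|<speaker name>|<transcription>
# wrapping each transcription in its language tag (e.g. [ZH] for Mandarin).
# Paths, the speaker name, and the sentences below are hypothetical examples.

LANGUAGE_TAGS = {"普通话": "[ZH]", "日本語": "[JA]", "English": "[EN]", "한국어": "[KO]"}

samples = [
    ("dataset/my_data/001.wav", "my_speaker", "你好,欢迎使用语音合成。", "普通话"),
    ("dataset/my_data/002.wav", "my_speaker", "Hello, welcome to text to speech.", "English"),
]

with open("dataset/my_data.txt", "w", encoding="utf-8") as f:
    for path, speaker, text, language in samples:
        tag = LANGUAGE_TAGS[language]
        # e.g. dataset/my_data/001.wav|my_speaker|[ZH]你好,欢迎使用语音合成。[ZH]
        f.write(f"{path}|{speaker}|{tag}{text}{tag}\n")
```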

Once you have the data list, generate the phoneme data list by running `preprocess_data.py --train_data_list=dataset/bznsyp.txt`. After this step, all of the data is ready.
```
dataset/BZNSYP/Wave/000001.wav|0|kʰa↓↑əɹ`↓↑pʰu↓↑ pʰeɪ↑ waɪ↓swən→ wan↑ xwa↑tʰi→.
dataset/BZNSYP/Wave/000002.wav|0|tʃ⁼ja↓↑ɥ↓↑ tsʰwən→jɛn↑p⁼iɛ↑ ts⁼aɪ↓ jʊŋ→p⁼ɑʊ↓ wo↓↑.
dataset/BZNSYP/Wave/000003.wav|0|p⁼ɑʊ↓↑ma↓↑ pʰeɪ↓k⁼wa↓ p⁼wo↓↑ lwo↑an→, t⁼iɑʊ→ts`ʰan↑ ɥæn↓ ts`⁼ən↓↑ t⁼ʊŋ↓↑ʊŋ→ tʰa↓.
```
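
For a quick sanity check of the generated phoneme list, the hedged sketch below assumes the `<audio path>|<speaker id>|<phonemes>` layout shown above; `dataset/train.txt` matches the `training_file` value in the default configuration, but adjust the path to your own output.

```python
# Hedged sketch: sanity-check a generated phoneme data list.
# Assumes the <audio path>|<speaker id>|<phonemes> layout shown above.
from collections import Counter

speaker_counts = Counter()
with open("dataset/train.txt", "r", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        parts = line.rstrip("\n").split("|")
        if len(parts) != 3:
            print(f"Malformed line {line_no}: {line!r}")
            continue
        audio_path, speaker_id, _phonemes = parts
        # The examples above use .wav files; flag anything else for review.
        if not audio_path.endswith(".wav"):
            print(f"Unexpected audio file on line {line_no}: {audio_path}")
        speaker_counts[speaker_id] += 1

print("Utterances per speaker id:", dict(speaker_counts))
```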


## Training

Now you can start training the model. The parameters in the configuration file generally do not need to be modified; the number of speakers and the speaker names are updated automatically when `preprocess_data.py` is run. The only parameter you may need to change is `dataset_conf.batch_size`: if you run out of GPU memory, reduce it.

```shell
# Single-GPU training
CUDA_VISIBLE_DEVICES=0 python train.py
# Multi-GPU training
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py
```

Training log output:

```
[2023-08-28 17:04:42.274452 INFO   ] utils:print_arguments:123 - ----------- 额外配置参数 -----------
[2023-08-28 17:04:42.274540 INFO   ] utils:print_arguments:125 - config: configs/config.yml
[2023-08-28 17:04:42.274580 INFO   ] utils:print_arguments:125 - epochs: 10000
[2023-08-28 17:04:42.274658 INFO   ] utils:print_arguments:125 - model_dir: models
[2023-08-28 17:04:42.274702 INFO   ] utils:print_arguments:125 - pretrained_model: None
[2023-08-28 17:04:42.274746 INFO   ] utils:print_arguments:125 - resume_model: None
[2023-08-28 17:04:42.274788 INFO   ] utils:print_arguments:126 - ------------------------------------------------
[2023-08-28 17:04:42.727728 INFO   ] utils:print_arguments:128 - ----------- 配置文件参数 -----------
[2023-08-28 17:04:42.727836 INFO   ] utils:print_arguments:131 - dataset_conf:
[2023-08-28 17:04:42.727909 INFO   ] utils:print_arguments:138 -        add_blank: True
[2023-08-28 17:04:42.727975 INFO   ] utils:print_arguments:138 -        batch_size: 16
[2023-08-28 17:04:42.728037 INFO   ] utils:print_arguments:138 -        cleaned_text: True
[2023-08-28 17:04:42.728097 INFO   ] utils:print_arguments:138 -        eval_sum: 2
[2023-08-28 17:04:42.728157 INFO   ] utils:print_arguments:138 -        filter_length: 1024
[2023-08-28 17:04:42.728204 INFO   ] utils:print_arguments:138 -        hop_length: 256
[2023-08-28 17:04:42.728235 INFO   ] utils:print_arguments:138 -        max_wav_value: 32768.0
[2023-08-28 17:04:42.728266 INFO   ] utils:print_arguments:138 -        mel_fmax: None
[2023-08-28 17:04:42.728298 INFO   ] utils:print_arguments:138 -        mel_fmin: 0.0
[2023-08-28 17:04:42.728328 INFO   ] utils:print_arguments:138 -        n_mel_channels: 80
[2023-08-28 17:04:42.728359 INFO   ] utils:print_arguments:138 -        num_workers: 4
[2023-08-28 17:04:42.728388 INFO   ] utils:print_arguments:138 -        sampling_rate: 22050
[2023-08-28 17:04:42.728418 INFO   ] utils:print_arguments:138 -        speakers_file: dataset/speakers.json
[2023-08-28 17:04:42.728448 INFO   ] utils:print_arguments:138 -        text_cleaner: cjke_cleaners2
[2023-08-28 17:04:42.728483 INFO   ] utils:print_arguments:138 -        training_file: dataset/train.txt
[2023-08-28 17:04:42.728539 INFO   ] utils:print_arguments:138 -        validation_file: dataset/val.txt
[2023-08-28 17:04:42.728585 INFO   ] utils:print_arguments:138 -        win_length: 1024
[2023-08-28 17:04:42.728615 INFO   ] utils:print_arguments:131 - model:
[2023-08-28 17:04:42.728648 INFO   ] utils:print_arguments:138 -        filter_channels: 768
[2023-08-28 17:04:42.728685 INFO   ] utils:print_arguments:138 -        gin_channels: 256
[2023-08-28 17:04:42.728717 INFO   ] utils:print_arguments:138 -        hidden_channels: 192
[2023-08-28 17:04:42.728747 INFO   ] utils:print_arguments:138 -        inter_channels: 192
[2023-08-28 17:04:42.728777 INFO   ] utils:print_arguments:138 -        kernel_size: 3
[2023-08-28 17:04:42.728808 INFO   ] utils:print_arguments:138 -        n_heads: 2
[2023-08-28 17:04:42.728839 INFO   ] utils:print_arguments:138 -        n_layers: 6
[2023-08-28 17:04:42.728870 INFO   ] utils:print_arguments:138 -        n_layers_q: 3
[2023-08-28 17:04:42.728902 INFO   ] utils:print_arguments:138 -        p_dropout: 0.1
[2023-08-28 17:04:42.728933 INFO   ] utils:print_arguments:138 -        resblock: 1
[2023-08-28 17:04:42.728965 INFO   ] utils:print_arguments:138 -        resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
[2023-08-28 17:04:42.728997 INFO   ] utils:print_arguments:138 -        resblock_kernel_sizes: [3, 7, 11]
[2023-08-28 17:04:42.729027 INFO   ] utils:print_arguments:138 -        upsample_initial_channel: 512
[2023-08-28 17:04:42.729058 INFO   ] utils:print_arguments:138 -        upsample_kernel_sizes: [16, 16, 4, 4]
[2023-08-28 17:04:42.729089 INFO   ] utils:print_arguments:138 -        upsample_rates: [8, 8, 2, 2]
[2023-08-28 17:04:42.729119 INFO   ] utils:print_arguments:138 -        use_spectral_norm: False
[2023-08-28 17:04:42.729150 INFO   ] utils:print_arguments:131 - optimizer_conf:
[2023-08-28 17:04:42.729184 INFO   ] utils:print_arguments:138 -        betas: [0.8, 0.99]
[2023-08-28 17:04:42.729217 INFO   ] utils:print_arguments:138 -        eps: 1e-09
[2023-08-28 17:04:42.729249 INFO   ] utils:print_arguments:138 -        learning_rate: 0.0002
[2023-08-28 17:04:42.729280 INFO   ] utils:print_arguments:138 -        optimizer: AdamW
[2023-08-28 17:04:42.729311 INFO   ] utils:print_arguments:138 -        scheduler: ExponentialLR
[2023-08-28 17:04:42.729341 INFO   ] utils:print_arguments:134 -        scheduler_args:
[2023-08-28 17:04:42.729373 INFO   ] utils:print_arguments:136 -                gamma: 0.999875
[2023-08-28 17:04:42.729404 INFO   ] utils:print_arguments:131 - train_conf:
[2023-08-28 17:04:42.729437 INFO   ] utils:print_arguments:138 -        c_kl: 1.0
[2023-08-28 17:04:42.729467 INFO   ] utils:print_arguments:138 -        c_mel: 45
[2023-08-28 17:04:42.729498 INFO   ] utils:print_arguments:138 -        enable_amp: True
[2023-08-28 17:04:42.729530 INFO   ] utils:print_arguments:138 -        log_interval: 200
[2023-08-28 17:04:42.729561 INFO   ] utils:print_arguments:138 -        seed: 1234
[2023-08-28 17:04:42.729592 INFO   ] utils:print_arguments:138 -        segment_size: 8192
[2023-08-28 17:04:42.729622 INFO   ] utils:print_arguments:141 - ------------------------------------------------
[2023-08-28 17:04:42.729971 INFO   ] trainer:__init__:53 - [cjke_cleaners2]支持语言:['日本語', '普通话', 'English', '한국어', "Mix": ""]
[2023-08-28 17:04:42.795955 INFO   ] trainer:__setup_dataloader:119 - 训练数据:9984
epoch [1/10000]: 100%|██████████| 619/619 [05:30<00:00,  1.88it/s]]
[2023-08-25 16:44:25.205557 INFO   ] trainer:train:168 - ======================================================================
epoch [2/10000]: 100%|██████████| 619/619 [05:20<00:00,  1.93it/s]s]
[2023-08-25 16:49:54.372718 INFO   ] trainer:train:168 - ======================================================================
epoch [3/10000]: 100%|██████████| 619/619 [05:19<00:00,  1.94it/s]
[2023-08-25 16:55:21.277194 INFO   ] trainer:train:168 - ======================================================================
epoch [4/10000]: 100%|██████████| 619/619 [05:18<00:00,  1.94it/s]
```

The training logs are also saved with VisualDL, which lets you monitor the loss and the synthesis results in real time. Just run `visualdl --logdir=log/ --host=0.0.0.0` from the project root and open `http://<server IP>:8040` in a browser; an example is shown below.

<div align="center">
    <img src="./docs/images/log.jpg" alt="VisualDL" width="600">
</div>

# Speech Synthesis

Once the model has been trained sufficiently, you can start using it for speech synthesis with the command below. There are three main parameters: `--text` specifies the text to synthesize; `--language` specifies the language of the text (if set to `Mix`, mixed mode is enabled and you must wrap the input text with language tags yourself); and `--spk` specifies the speaker. Give it a try.

```shell
python infer.py --text="你好,我是智能语音助手。" --language=简体中文 --spk=标准女声
```
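
For mixed-language input (`--language=Mix`), the caller has to wrap each segment in its language tag. Below is a hedged Python sketch that drives `infer.py` via subprocess, assuming only the CLI arguments shown above; the tagged sentence itself is a hypothetical example.

```python
# Hedged sketch: call infer.py in Mix mode via subprocess.
# Only the --text/--language/--spk arguments shown above are assumed;
# the mixed-language sentence is a hypothetical example.
import subprocess

# In Mix mode, every segment must be wrapped in its language tag.
text = "[ZH]你好,我是[ZH][EN]VITS[EN][ZH]语音助手。[ZH]"

subprocess.run(
    ["python", "infer.py", f"--text={text}", "--language=Mix", "--spk=标准女声"],
    check=True,
)
```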

## Support the Author

<br/>
<div align="center">
    <p>Donate one yuan to support the author</p>
    <img src="https://yeyupiaoling.cn/reward.png" alt="Support the author" width="400">
</div>

# References

1. https://github.com/Plachtaa/VITS-fast-fine-tuning
2. https://github.com/PaddlePaddle/PaddleSpeech
3. https://github.com/yeyupiaoling/MASR
4. https://github.com/Artrajz/vits-simple-api



            
