# s3prl

- **Name**: s3prl
- **Version**: 0.4.15
- **Home page**: https://github.com/s3prl/s3prl
- **Summary**: Self-Supervised Speech Pre-training and Representation Learning Toolkit
- **Upload time**: 2024-03-18 13:38:07
- **Author**: Shu-wen (Leo) Yang, Andy T. Liu (Ting-Wei Liu), and the S3PRL Team
- **Requires Python**: >=3.6, <4
- **License**: Apache Software License
- **Keywords**: deep learning, speech, self-supervised, representation, pretraining
<p align="center">
    <img src="https://raw.githubusercontent.com/s3prl/s3prl/main/file/S3PRL-logo.png" width="900"/>
    <br>
    <br>
    <a href="./LICENSE.txt"><img alt="Apache License 2.0" src="https://raw.githubusercontent.com/s3prl/s3prl/main/file/license.svg" /></a>
    <a href="https://creativecommons.org/licenses/by-nc/4.0/"><img alt="CC_BY_NC License" src="https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg" /></a>
    <a href="https://github.com/s3prl/s3prl/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/s3prl/s3prl/actions/workflows/ci.yml/badge.svg?branch=main&event=push"></a>
    <a href="#development-pattern-for-contributors"><img alt="Codecov" src="https://img.shields.io/badge/contributions-welcome-brightgreen.svg"></a>
    <a href="https://github.com/s3prl/s3prl/issues"><img alt="Bitbucket open issues" src="https://img.shields.io/github/issues/s3prl/s3prl"></a>
</p>

## Contact

We prefer to have discussions directly on the GitHub issue page, so that all information is transparent to every contributor and automatically archived on GitHub.
If you wish to use email, please contact:

- Leo Yang (leo19941227@gmail.com)
- Andy T. Liu (liuandyt@gmail.com)

Please refer to the [legacy citation](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=R1mNI8QAAAAJ&citation_for_view=R1mNI8QAAAAJ:LkGwnXOMwfcC) of S3PRL and the timeline below justifying our ownership of this project. This information is only used to protect us from half-truths. We encourage you to cite the individual papers most related to the functions you use, to give fair credit to their developers. Please find the names in the [Change Log](#change-log).
Finally, we thank our advisor [Prof. Hung-yi Lee](https://speech.ee.ntu.edu.tw/~hylee/index.php) for his advice and support.

If you have any questions (e.g., about who developed which functions or how the project started), feel free to start an open and responsible conversation on the GitHub issue page, and we'll be happy to help!

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=s3prl/s3prl&type=Date)](https://star-history.com/#s3prl/s3prl&Date)

## Contribution (pull request)

Guidelines:
- If you want to add new features (e.g. a new upstream), please discuss with us on the GitHub issue page before implementing the actual pull request, so we can discuss whether we want the feature, how to achieve it, and whether it is maintainable. (As S3PRL is growing too large, we might not consider new tasks or complex new features.) Thanks!
- Bug reports, and PRs that fix them, are always welcome!

Steps:
1. [Create a personal fork](https://help.github.com/articles/fork-a-repo/) of the [main S3PRL repository](https://github.com/s3prl/s3prl) in GitHub.
2. Make your changes in a branch other than `master`, e.g. create a branch named `new-awesome-feature`.
3. Contact us if you have any questions during development.
4. [Generate a pull request](https://help.github.com/articles/creating-a-pull-request/) through the Web interface of GitHub.
5. Please verify that your code is free of basic mistakes. We appreciate your contribution!

## Environment compatibilities [![CI](https://github.com/s3prl/s3prl/actions/workflows/ci.yml/badge.svg?branch=main&event=push)](https://github.com/s3prl/s3prl/actions/workflows/ci.yml)

We support the following environments. The test cases are run with **[tox](./tox.ini)** locally and on **[GitHub Actions](.github/workflows/ci.yml)**:

| Env | versions |
| --- | --- |
| os  | `ubuntu-18.04`, `ubuntu-20.04` |
| python | `3.7`, `3.8`, `3.9`, `3.10` |
| pytorch | `1.8.1`, `1.9.1`, `1.10.2`, `1.11.0`, `1.12.1`, `1.13.1`, `2.0.1`, `2.1.0` |

## Change Log

* *Dec 2023*: Support Multi-resolution HuBERT (MR-HuBERT, see [Multiresolution HuBERT](https://arxiv.org/pdf/2310.02720.pdf))
* *Oct 2023*: Support ESPnet pre-trained upstream models (see [ESPnet HuBERT](https://arxiv.org/abs/2306.06672) and [WavLabLM](https://arxiv.org/abs/2309.15317))
* *Sep 2022*: We upgraded the codebase to support testing and documentation, and released a new [S3PRL PyPI package](https://pypi.org/project/s3prl/) for easy installation and usage of upstream models. See our [online doc](https://s3prl.github.io/s3prl/) for more information.
* *Mar 2022*: Introduce [**SUPERB-SG**](https://arxiv.org/abs/2203.06849), see [Speech Translation](./s3prl/downstream/speech_translation) by [Hsiang-Sheng Tsai](https://github.com/bearhsiang) ***(NTU)***, [Out-of-domain ASR](./s3prl/downstream/ctc/) by [Heng-Jui Chang](https://people.csail.mit.edu/hengjui/) ***(NTU)***, [Voice Conversion](./s3prl/downstream/a2o-vc-vcc2020/) by [Wen-Chin Huang](https://unilight.github.io/) ***(Nagoya)***, [Speech Separation](./s3prl/downstream/separation_stft/) and [Speech Enhancement](./s3prl/downstream/enhancement_stft/) by [Zili Huang](https://scholar.google.com/citations?user=iQ-S0fQAAAAJ&hl=en) ***(JHU)*** for more info.
* *Nov 2021*: Introduce [**S3PRL-VC**](https://arxiv.org/abs/2110.06280) by [Wen-Chin Huang](https://unilight.github.io/) ***(Nagoya)***; see [Any-to-one](https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/a2o-vc-vcc2020) for more info. We highly recommend considering the [newly released official repo of S3PRL-VC](https://github.com/unilight/s3prl-vc), which is developed and actively maintained by [Wen-Chin Huang](https://unilight.github.io/). The standalone repo contains many more recipes for VC experiments; in S3PRL we only include the Any-to-one recipe for reproducing the SUPERB results.
* *Oct 2021*: Support [**DistilHuBERT**](https://arxiv.org/abs/2110.01900) by [Heng-Jui Chang](https://people.csail.mit.edu/hengjui/) ***(NTU)***, see [docs](./s3prl/upstream/distiller/README.md) for more info
* *Sep 2021:* We host a *challenge* in [*AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing*](https://aaai-sas-2022.github.io/)! See [**SUPERB official site**](https://superbbenchmark.org/) for the challenge details and the [**SUPERB documentation**](./s3prl/downstream/docs/superb.md) in this toolkit!
* *Aug 2021:* [Andy T. Liu](https://andi611.github.io/) ***(NTU)*** and [Shu-wen Yang](https://scholar.google.com/citations?user=R1mNI8QAAAAJ&hl=en) ***(NTU)*** introduced the S3PRL toolkit at [MLSS 2021](https://ai.ntu.edu.tw/%e7%b7%9a%e4%b8%8a%e5%ad%b8%e7%bf%92-2/mlss-2021/); you can also **[watch it on YouTube](https://youtu.be/PkMFnS6cjAc)**!
* *July 2021:* We are now working on packaging s3prl and reorganizing the file structure in **v0.3**. Please consider using the stable **v0.2.0** for now. We will test and release **v0.3** before August.
* *June 2021:* Support [**SUPERB:** **S**peech processing **U**niversal **PER**formance **B**enchmark](https://arxiv.org/abs/2105.01051), submitted to Interspeech 2021. Use the tag **superb-interspeech2021** or **v0.2.0**. The individual task contributors include: [Shu-wen Yang](https://scholar.google.com/citations?user=R1mNI8QAAAAJ&hl=en) ***(NTU)***, [Pohan Chi](https://scholar.google.com/citations?user=SiyicoEAAAAJ&hl=zh-TW) ***(NTU)***, [Yist Lin](https://scholar.google.com/citations?user=0lrZq9MAAAAJ&hl=en) ***(NTU)***, [Yung-Sung Chuang](https://scholar.google.com/citations?user=3ar1DOwAAAAJ&hl=zh-TW) ***(NTU)***, [Jiatong Shi](https://scholar.google.com/citations?user=FEDNbgkAAAAJ&hl=en) ***(CMU)***, [Xuankai Chang](https://scholar.google.com/citations?user=cIl2jpMAAAAJ&hl=en) ***(CMU)***, [Wei-Cheng Tseng](https://scholar.google.com.tw/citations?user=-d6aNP0AAAAJ&hl=zh-TW) ***(NTU)*** and Tzu-Hsien Huang ***(NTU)***.
* *June 2021:* Support extracting multiple hidden states for all the SSL pretrained models by [Shu-wen Yang](https://scholar.google.com/citations?user=R1mNI8QAAAAJ&hl=en) ***(NTU)***.
* *Jan 2021:* Readme updated with detailed instructions on how to use our latest version!
* *Dec 2020:* We are migrating to a newer version for more general, flexible, and scalable code. See the introduction below for more information! The legacy version can be accessed via the tag **v0.1.0**.
* *Oct 2020:* [Shu-wen Yang](https://scholar.google.com/citations?user=R1mNI8QAAAAJ&hl=en) ***(NTU)*** and [Andy T. Liu](https://andi611.github.io/) ***(NTU)*** added various classic upstream models, including PASE+, APC, VQ-APC, NPC, wav2vec, vq-wav2vec, etc.
* *Oct 2019:* [Andy T. Liu](https://andi611.github.io/) ***(NTU)***, [Shu-wen Yang](https://scholar.google.com/citations?user=R1mNI8QAAAAJ&hl=en) ***(NTU)*** and [Pohan Chi](https://scholar.google.com/citations?user=SiyicoEAAAAJ&hl=zh-TW) ***(NTU)*** implemented the Mockingjay pre-training and several simple downstream evaluation tasks. This work was the very start of the S3PRL project and established many of its fundamental modules and coding styles. Feel free to check out the oldest commits to explore our legacy codebase!

****

## Introduction and Usages

This is an open-source toolkit called **s3prl**, which stands for **S**elf-**S**upervised **S**peech **P**re-training and **R**epresentation **L**earning.
Self-supervised speech pre-trained models are called **upstream** models in this toolkit, and they are utilized in various **downstream** tasks.

The toolkit has **three major usages**:

### Pretrain

- Pretrain upstream models, including Mockingjay, Audio ALBERT and TERA.
- Document: [**pretrain/README.md**](./s3prl/pretrain/README.md)

### Upstream

- Easily load most of the existing upstream models with pretrained weights through a unified I/O interface.
- Pretrained models are registered through **torch.hub**, which means you can use them in your own project with one-line plug-and-play, without depending on this toolkit's coding style (see the sketch below).
- Document: [**upstream/README.md**](./s3prl/upstream/README.md)
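
For example, loading an upstream and extracting features takes only a few lines. Below is a minimal sketch, assuming the `hubert` entry and the `hidden_states` output key described in the upstream documentation:

```python
import torch
import s3prl.hub as hub

# Load a registered upstream by name; torch.hub.load("s3prl/s3prl", "hubert")
# is an equivalent one-liner that does not require installing this package.
model = getattr(hub, "hubert")()
model.eval()

# The unified I/O interface takes a list of variable-length 1-D waveforms (16 kHz).
wavs = [torch.randn(16000 * 2) for _ in range(4)]
with torch.no_grad():
    outputs = model(wavs)

# Upstreams expose their layer-wise representations under "hidden_states".
hidden_states = outputs["hidden_states"]
```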

### Downstream

- Utilize upstream models in lots of downstream tasks
- Benchmark upstream models with [**SUPERB Benchmark**](./s3prl/downstream/docs/superb.md)
- Document: [**downstream/README.md**](./s3prl/downstream/README.md)
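
Conceptually, most SUPERB recipes keep the upstream frozen and train only a lightweight prediction head on a learnable weighted sum of the upstream's hidden states. Below is a minimal sketch of that idea; the `WeightedSumProbe` class is illustrative only, while the actual recipes live under `s3prl/downstream/`:

```python
import torch
import torch.nn as nn

class WeightedSumProbe(nn.Module):
    """SUPERB-style probe: a learnable softmax-weighted sum over the frozen
    upstream's hidden states, followed by a small task head."""

    def __init__(self, num_layers: int, feature_dim: int, num_classes: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, hidden_states: list) -> torch.Tensor:
        # hidden_states: list of (batch, time, feature_dim) tensors, one per layer
        stacked = torch.stack(hidden_states, dim=0)           # (layers, B, T, D)
        weights = torch.softmax(self.layer_weights, dim=0)    # (layers,)
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(0)  # (B, T, D)
        return self.head(fused.mean(dim=1))                   # mean-pool over time
```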

Below is an **intuitive illustration** of how this toolkit may help you:
\
\
<img src="https://raw.githubusercontent.com/s3prl/s3prl/main/file/S3PRL-interface.png" width="900"/>
\
\
Feel free to use or modify our toolkit in your research. Here is a [list of papers using our toolkit](#used-by). Questions, bug reports, and improvement suggestions are welcome through [opening a new issue](https://github.com/s3prl/s3prl/issues).

If you find this toolkit helpful to your research, please do consider citing [our papers](#citation), thanks!

## Installation

1. **Python** >= 3.6
2. Install **sox** on your OS
3. Install s3prl: [Read the doc](https://s3prl.github.io/s3prl/tutorial/installation.html#) or `pip install -e ".[all]"`
4. (Optional) Some upstream models require special dependencies. If you encounter an error with a specific upstream model, look into the `README.md` under each `upstream` folder, e.g., `upstream/pase/README.md`.
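
After installation, a quick sanity check confirms the package imports and shows which upstream entries are registered (a small sketch; `s3prl.__version__` is assumed to be exposed as in recent releases):

```python
import s3prl
import s3prl.hub as hub

print(s3prl.__version__)  # e.g. "0.4.15"
# Every registered upstream is exposed as a callable attribute on s3prl.hub.
print([name for name in dir(hub) if not name.startswith("_")][:10])
```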

## Reference Repositories

* [Pytorch](https://github.com/pytorch/pytorch), Pytorch.
* [Audio](https://github.com/pytorch/audio), Pytorch.
* [Kaldi](https://github.com/kaldi-asr/kaldi), Kaldi-ASR.
* [Transformers](https://github.com/huggingface/transformers), Hugging Face.
* [PyTorch-Kaldi](https://github.com/mravanelli/pytorch-kaldi), Mirco Ravanelli.
* [fairseq](https://github.com/pytorch/fairseq), Facebook AI Research.
* [CPC](https://github.com/facebookresearch/CPC_audio), Facebook AI Research.
* [APC](https://github.com/iamyuanchung/Autoregressive-Predictive-Coding), Yu-An Chung.
* [VQ-APC](https://github.com/s3prl/VQ-APC), Yu-An Chung.
* [NPC](https://github.com/Alexander-H-Liu/NPC), Alexander-H-Liu.
* [End-to-end-ASR-Pytorch](https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch), Alexander-H-Liu.
* [Mockingjay](https://github.com/andi611/Mockingjay-Speech-Representation), Andy T. Liu.
* [ESPnet](https://github.com/espnet/espnet), Shinji Watanabe.
* [speech-representations](https://github.com/awslabs/speech-representations), AWS Labs.
* [PASE](https://github.com/santi-pdp/pase), Santiago Pascual and Mirco Ravanelli.
* [LibriMix](https://github.com/JorisCos/LibriMix), Joris Cosentino and Manuel Pariente.

## License

The majority of the S3PRL Toolkit is licensed under the Apache License version 2.0. However, all files authored by Facebook, Inc. (which have an explicit copyright statement at the top) are licensed under CC BY-NC.

## Used by
<details><summary>List of papers that used our toolkit (Feel free to add your own paper by making a pull request)</summary><p>

### Self-Supervised Pretraining

+ [Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders (Liu et al., 2020)](https://arxiv.org/abs/1910.12638)
  ```
  @article{mockingjay,
     title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
     ISBN={9781509066315},
     url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
     DOI={10.1109/icassp40776.2020.9054458},
     journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
     publisher={IEEE},
     author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
     year={2020},
     month={May}
  }
  ```
+ [TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech (Liu et al., 2020)](https://arxiv.org/abs/2007.06028)
  ```
  @misc{tera,
      title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
      author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
      year={2020},
      eprint={2007.06028},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
  }
  ```
+ [Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation (Chi et al., 2020)](https://arxiv.org/abs/2005.08575)
  ```
  @inproceedings{audio_albert,
      title={Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation},
      author={Po-Han Chi and Pei-Hung Chung and Tsung-Han Wu and Chun-Cheng Hsieh and Shang-Wen Li and Hung-yi Lee},
      year={2020},
      booktitle={SLT 2020},
  }
  ```

### Explainability

+ [Understanding Self-Attention of Self-Supervised Audio Transformers (Yang et al., 2020)](https://arxiv.org/abs/2006.03265)
  ```
  @inproceedings{understanding_sat,
      author={Shu-wen Yang and Andy T. Liu and Hung-yi Lee},
      title={{Understanding Self-Attention of Self-Supervised Audio Transformers}},
      year=2020,
      booktitle={Proc. Interspeech 2020},
      pages={3785--3789},
      doi={10.21437/Interspeech.2020-2231},
      url={http://dx.doi.org/10.21437/Interspeech.2020-2231}
  }
  ```

### Adversarial Attack

+ [Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning (Wu et al., 2020)](https://arxiv.org/abs/2006.03214), code for computing LNSR: [utility/observe_lnsr.py](https://github.com/s3prl/s3prl/blob/master/utility/observe_lnsr.py)
  ```
  @inproceedings{mockingjay_defense,
      author={Haibin Wu and Andy T. Liu and Hung-yi Lee},
      title={{Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning}},
      year=2020,
      booktitle={Proc. Interspeech 2020},
      pages={3780--3784},
      doi={10.21437/Interspeech.2020-2026},
      url={http://dx.doi.org/10.21437/Interspeech.2020-2026}
  }
  ```

+ [Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models (Wu et al., 2021)](https://arxiv.org/abs/2102.07047)
  ```
  @misc{asv_ssl,
      title={Adversarial defense for automatic speaker verification by cascaded self-supervised learning models},
      author={Haibin Wu and Xu Li and Andy T. Liu and Zhiyong Wu and Helen Meng and Hung-yi Lee},
      year={2021},
      eprint={2102.07047},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
  }
  ```

### Voice Conversion

+ [S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations (Lin et al., 2021)](https://arxiv.org/abs/2104.02901)
  ```
  @misc{s2vc,
        title={S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations},
        author={Jheng-hao Lin and Yist Y. Lin and Chung-Ming Chien and Hung-yi Lee},
        year={2021},
        eprint={2104.02901},
        archivePrefix={arXiv},
        primaryClass={eess.AS}
  }
  ```

### Benchmark and Evaluation

+ [SUPERB: Speech processing Universal PERformance Benchmark (Yang et al., 2021)](https://arxiv.org/abs/2105.01051)
  ```
  @misc{superb,
        title={SUPERB: Speech processing Universal PERformance Benchmark},
        author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
        year={2021},
        eprint={2105.01051},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
  }
  ```

+ [Utilizing Self-supervised Representations for MOS Prediction (Tseng et al., 2021)](https://arxiv.org/abs/2104.03017)
  ```
  @misc{ssr_mos,
      title={Utilizing Self-supervised Representations for MOS Prediction},
      author={Wei-Cheng Tseng and Chien-yu Huang and Wei-Tsung Kao and Yist Y. Lin and Hung-yi Lee},
      year={2021},
      eprint={2104.03017},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
  }
  ```

</p></details>

## Citation

If you find this toolkit useful, please consider citing the following papers.

- If you use our pre-training scripts, or the downstream tasks considered in *TERA* and *Mockingjay*, please consider citing the following:
```
@misc{tera,
  title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
  author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
  year={2020},
  eprint={2007.06028},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```
```
@article{mockingjay,
   title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
   ISBN={9781509066315},
   url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
   DOI={10.1109/icassp40776.2020.9054458},
   journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   publisher={IEEE},
   author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
   year={2020},
   month={May}
}
```

- If you use our organized upstream interface and features, or the *SUPERB* downstream benchmark, please consider citing the following:
```
@inproceedings{yang21c_interspeech,
  author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
  title={{SUPERB: Speech Processing Universal PERformance Benchmark}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1194--1198},
  doi={10.21437/Interspeech.2021-1775}
}
```

            
