malaya-speech
=============

:Name: malaya-speech
:Version: 1.3.0.2
:Home page: https://github.com/huseinzol05/malaya-speech
:Download: https://github.com/huseinzol05/malaya-speech/archive/master.zip
:Summary: Speech-Toolkit for bahasa Malaysia, powered by Tensorflow and PyTorch.
:Upload time: 2022-09-22 04:20:04
:Author: huseinzol05
:Requires Python: >=3.6.*
:License: MIT
:Keywords: nlp, bm

**Malaya-Speech** is a Speech-Toolkit library for bahasa Malaysia, powered by Tensorflow and PyTorch.

Documentation
--------------

Documentation for the stable release is available at https://malaya-speech.readthedocs.io/

Installing from PyPI
----------------------------------

::

    $ pip install malaya-speech

It automatically installs all dependencies except Tensorflow and PyTorch, so you can choose your own Tensorflow CPU / GPU build and PyTorch CPU / GPU build.

Only **Python >= 3.6.0**, **Tensorflow >= 1.15.0**, and **PyTorch >= 1.10** are supported.
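
For example, a minimal sketch of choosing your own builds before installing; the pinned versions below are placeholders, not recommendations, and any releases satisfying the constraints above should work::

    $ pip install tensorflow==2.9.2 torch==1.12.1
    $ pip install malaya-speech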

Development Release
---------------------------------

Install from the `master` branch::

    $ pip install git+https://github.com/huseinzol05/malaya-speech.git


We recommend using **virtualenv** for development.

Latest development documentation is available at https://malaya-speech.readthedocs.io/en/latest/

Features
--------

-  **Age Detection**, detect age in speech using Finetuned Speaker Vector.
-  **Speaker Diarization**, diarize speakers using Pretrained Speaker Vector.
-  **Emotion Detection**, detect emotions in speech using Finetuned Speaker Vector.
-  **Force Alignment**, generate a time-aligned transcription of an audio file using RNNT and CTC.
-  **Gender Detection**, detect genders in speech using Finetuned Speaker Vector.
-  **Language Detection**, detect hyperlocal languages in speech using Finetuned Speaker Vector.
-  **Language Model**, ASR decoder scoring using KenLM, masked language models (BERT and RoBERTa), and GPT2.
-  **Multispeaker Separation**, separate multiple speakers using FastSep on 8 kHz WAV.
-  **Noise Reduction**, reduce multilevel noise using STFT UNET.
-  **Speaker Change**, detect speaker changes using Finetuned Speaker Vector.
-  **Speaker Overlap**, detect overlapping speakers using Finetuned Speaker Vector.
-  **Speaker Vector**, calculate similarity between speakers using Pretrained Speaker Vector; see the sketch after this list.
-  **Speech Enhancement**, enhance voice activities using Waveform UNET.
-  **SpeechSplit Conversion**, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.
-  **Speech-to-Text**, End-to-End Speech to Text for Malay, Mixed (Malay, Singlish and Mandarin) and Singlish using RNNT, Wav2Vec2, HuBERT and BEST-RQ CTC.
-  **Super Resolution**, 4x super resolution for waveforms using ResNet UNET and Neural Vocoder.
-  **Text-to-Speech**, Text to Speech for Malay and Singlish using Tacotron2, FastSpeech2, FastPitch, GlowTTS, LightSpeech and VITS.
-  **Vocoder**, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.
-  **Voice Activity Detection**, detect voice activities using Finetuned Speaker Vector.
-  **Voice Conversion**, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.
-  **Hybrid 8-bit Quantization**, provide hybrid 8-bit quantization for all models to reduce inference time by up to 2x and model size by up to 4x.
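
As referenced in the **Speaker Vector** item above, here is a minimal sketch of comparing two speakers. It assumes `malaya_speech.load` returns `(waveform, sample_rate)` and that a pretrained speaker-vector model is loaded via `malaya_speech.speaker_vector.deep_model` with a `vectorize` method and an optional `quantized` flag; these names follow patterns from the project docs but should be verified against your installed version::

    import numpy as np
    import malaya_speech

    # Load two utterances; malaya_speech.load is assumed to return
    # (waveform, sample_rate) as in the project documentation.
    y1, sr1 = malaya_speech.load('speaker_a.wav')
    y2, sr2 = malaya_speech.load('speaker_b.wav')

    # Assumed loader for a Pretrained Speaker Vector model; the model name
    # and the quantized flag (Hybrid 8-bit Quantization above) are
    # placeholders -- check the documentation for the available models.
    model = malaya_speech.speaker_vector.deep_model('speakernet', quantized=True)

    # Embed both utterances and compare with cosine similarity.
    v1, v2 = model.vectorize([y1, y2])
    similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    print(f'cosine similarity: {similarity:.3f}')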

Pretrained Models
------------------

Malaya-Speech also provides pretrained models; see `malaya-speech/pretrained-model <https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model>`_

-  **Wave UNET**, Multi-Scale Neural Network for End-to-End Audio Source Separation, https://arxiv.org/abs/1806.03185
-  **Wave ResNet UNET**, added ResNet style into Wave UNET, no paper produced.
-  **Wave ResNext UNET**, added ResNext style into Wave UNET, no paper produced.
-  **Deep Speaker**, An End-to-End Neural Speaker Embedding System, https://arxiv.org/pdf/1705.02304.pdf
-  **SpeakerNet**, 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification, https://arxiv.org/abs/2010.12653
-  **VGGVox**, speaker embedding model from VoxCeleb: a large-scale speaker identification dataset, https://arxiv.org/pdf/1706.08612.pdf
-  **GhostVLAD**, Utterance-level Aggregation For Speaker Recognition In The Wild, https://arxiv.org/abs/1902.10107
-  **Conformer**, Convolution-augmented Transformer for Speech Recognition, https://arxiv.org/abs/2005.08100
-  **ALConformer**, a lightweight Conformer, no paper produced.
-  **Jasper**, An End-to-End Convolutional Neural Acoustic Model, https://arxiv.org/abs/1904.03288
-  **Tacotron2**, Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, https://arxiv.org/abs/1712.05884
-  **FastSpeech2**, Fast and High-Quality End-to-End Text to Speech, https://arxiv.org/abs/2006.04558
-  **MelGAN**, Generative Adversarial Networks for Conditional Waveform Synthesis, https://arxiv.org/abs/1910.06711
-  **Multi-band MelGAN**, Faster Waveform Generation for High-Quality Text-to-Speech, https://arxiv.org/abs/2005.05106
-  **SRGAN**, Modified version of SRGAN to do 1D Convolution, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, https://arxiv.org/abs/1609.04802
-  **Speech Enhancement UNET**, https://github.com/haoxiangsnr/Wave-U-Net-for-Speech-Enhancement
-  **Speech Enhancement ResNet UNET**, Added ResNet style into Speech Enhancement UNET, no paper produced.
-  **Speech Enhancement ResNext UNET**, Added ResNext style into Speech Enhancement UNET, no paper produced.
-  **Universal MelGAN**, Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains, https://arxiv.org/abs/2011.09631
-  **FastVC**, Faster and Accurate Voice Conversion using Transformer, no paper produced.
-  **FastSep**, Faster and Accurate Speech Separation using Transformer, no paper produced.
-  **wav2vec 2.0**, A Framework for Self-Supervised Learning of Speech Representations, https://arxiv.org/abs/2006.11477
-  **FastSpeechSplit**, faster and accurate speech-split conversion (Unsupervised Speech Decomposition Via Triple Information Bottleneck) using Transformer, no paper produced.
-  **Sepformer**, Attention is All You Need in Speech Separation, https://arxiv.org/abs/2010.13154
-  **HuBERT**, Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, https://arxiv.org/pdf/2106.07447v1.pdf
-  **FastPitch**, Parallel Text-to-speech with Pitch Prediction, https://arxiv.org/abs/2006.06873
-  **GlowTTS**, A Generative Flow for Text-to-Speech via Monotonic Alignment Search, https://arxiv.org/abs/2005.11129
-  **BEST-RQ**, Self-supervised learning with random-projection quantizer for speech recognition, https://arxiv.org/pdf/2202.01855.pdf
-  **LightSpeech**, Lightweight and Fast Text to Speech with Neural Architecture Search, https://arxiv.org/abs/2102.04040
-  **VITS**, Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech, https://arxiv.org/abs/2106.06103
-  **Squeezeformer**, An Efficient Transformer for Automatic Speech Recognition, https://arxiv.org/abs/2206.00888

References
-----------

If you use our software for research, please cite:

::

  @misc{Malaya-Speech,
    author = {Husein, Zolkepli},
    title = {Malaya-Speech, Speech-Toolkit library for bahasa Malaysia, powered by Tensorflow and PyTorch},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
  }

Acknowledgement
----------------

Thanks to `KeyReply <https://www.keyreply.com/>`_ for a private V100 cloud and `Mesolitica <https://mesolitica.com/>`_ for a private RTX cloud used to train Malaya-Speech models.
