asrp

Name	asrp
Version	0.0.72
home_page	https://github.com/voidful/asrp
upload_time	2023-04-07 18:11:23
author	Voidful
license	Apache
keywords	asr
requirements	Unidecode jiwer transformers editdistance librosa webrtcvad pyctcdecode openai-whisper nlp2

# ASRP: Automatic Speech Recognition Preprocessing Utility

ASRP is a Python package that offers a set of tools to preprocess and evaluate ASR (Automatic Speech Recognition) text.
It also includes tools for live speech-to-text transcription, converting speech to discrete units and units back to
speech, speech enhancement, and speaker embedding extraction. The code is open source and can be installed with pip.

## Key Features

- [Preprocess ASR text with ease](#preprocess)
- [Evaluate ASR output quality](#evaluation)
- [Transcribe speech to HuBERT units](#speech-to-discrete-unit)
- [Convert unit code to speech](#discrete-unit-to-speech)
- [Enhance speech quality with a noise reduction tool](#speech-enhancement)
- [LiveASR tool for real-time speech recognition](#liveasr---huggingfaces-model)
- [Speaker Embedding Extraction (x-vector/d-vector)](#speaker-embedding-extraction---x-vector)

## Install

`pip install asrp`

## Preprocess

ASRP offers an easy-to-use set of functions for preprocessing ASR text data.
The input is a dictionary with the key `'sentence'`, and the output is the preprocessed text.
You can either call a function such as `fun_en` directly or load it dynamically by name:

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
asrp.fun_en(batch_data)
```

Dynamic loading (look up the preprocessor by its name):

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
preprocessor = getattr(asrp, 'fun_en')
preprocessor(batch_data)
```
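The normalization rules inside `fun_en` are not spelled out here, so the following is a hypothetical sketch of what an English ASR text preprocessor of this shape typically does: it takes the same `{'sentence': ...}` dictionary, lowercases the text, replaces punctuation other than apostrophes with spaces, and collapses whitespace. The name `normalize_sentence` and these rules are assumptions for illustration, not ASRP's implementation.

```python
import re
import unicodedata


def normalize_sentence(batch: dict) -> dict:
    """Hypothetical English ASR text normalizer (not asrp.fun_en)."""
    text = unicodedata.normalize("NFKC", batch["sentence"]).lower()
    # replace everything except word characters, whitespace and apostrophes with a space
    text = re.sub(r"[^\w\s']", " ", text)
    # collapse runs of whitespace and trim both ends
    text = re.sub(r"\s+", " ", text).strip()
    # return a new dict so the caller's input is left untouched
    return {**batch, "sentence": text}


print(normalize_sentence({"sentence": "I'm fine, thanks."}))
# {'sentence': "i'm fine thanks"}
```

Returning a copy of the whole dictionary rather than a bare string is a design choice of this sketch: it keeps the function usable with row-wise `map` calls over a dataset.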

## Evaluation

ASRP provides functions to evaluate the output quality of ASR systems using
the Word Error Rate (WER) and Character Error Rate (CER) metrics.
Both functions return a fraction, which the example below multiplies by 100 to print a percentage:

```python
import asrp

targets = ['HuggingFace is great!', 'Love Transformers!', 'Let\'s wav2vec!']
preds = ['HuggingFace is awesome!', 'Transformers is powerful.', 'Let\'s finetune wav2vec!']
print("WER: {:.2f}%".format(100 * asrp.chunked_wer(targets, preds, chunk_size=None)))
print("CER: {:.2f}%".format(100 * asrp.chunked_cer(targets, preds, chunk_size=None)))
```
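To make the metrics concrete, here is a pure-Python sketch of corpus-level WER and CER: the word (or character) edit distance between each target and prediction is summed and divided by the total number of reference tokens. It also mirrors the idea behind a `chunk_size` argument by accumulating raw error counts chunk by chunk, so the result does not depend on the chunk size. This is an illustration, not ASRP's code; ASRP lists `jiwer` as a dependency and may tokenize or normalize text differently, so its numbers can differ.

```python
from typing import Callable, List, Optional, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = cur
    return prev[-1]


def error_rate(targets: List[str], preds: List[str],
               tokenize: Callable[[str], list],
               chunk_size: Optional[int] = None) -> float:
    """Corpus-level error rate: total edits / total reference tokens."""
    if len(targets) != len(preds):
        raise ValueError("targets and preds must have the same length")
    step = chunk_size or max(len(targets), 1)
    errors = total = 0
    for start in range(0, len(targets), step):
        # accumulate raw counts per chunk; averaging per-chunk rates would bias the result
        for t, p in zip(targets[start:start + step], preds[start:start + step]):
            ref = tokenize(t)
            errors += edit_distance(ref, tokenize(p))
            total += len(ref)
    return errors / total


def wer(targets, preds, chunk_size=None):
    return error_rate(targets, preds, str.split, chunk_size)


def cer(targets, preds, chunk_size=None):
    # every character, including spaces, is a token here
    return error_rate(targets, preds, list, chunk_size)
```

On the example above this sketch counts 5 word errors against 7 reference words (punctuation stays attached to words, so `Transformers!` and `Transformers` do not match), giving a WER of about 71.43%.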

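## Speech to Discrete Unit

Encode speech into discrete mHuBERT units. The example downloads a k-means quantizer (its file name indicates layer-11 features and 1000 clusters) and uses it with the `voidful/mhubert-base` model: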

```python
import asrp
import nlp2

# background on mHuBERT units and unit-based speech models:
# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
# https://github.com/facebookresearch/fairseq/tree/main/examples/textless_nlp/gslm/ulm
# download the k-means quantizer (layer 11, 1000 clusters) into the current directory
nlp2.download_file(
    'https://huggingface.co/voidful/mhubert-base/resolve/main/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin', './')
hc = asrp.HubertCode("voidful/mhubert-base", './mhubert_base_vp_en_es_fr_it3_L11_km1000.bin', 11,
                     chunk_sec=30,
                     worker=20)
hc('path/to/audio.wav')  # replace with the path to your audio file
```
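Conceptually, units like these come from running HuBERT over 16 kHz audio (one feature vector per 20 ms frame), taking the hidden states of one layer, and assigning each frame to its nearest k-means centroid. The sketch below shows only that quantization step, the common practice of merging consecutive duplicate units, and splitting long audio into fixed-length chunks in the spirit of `chunk_sec`. Small hand-made arrays stand in for real features and centroids; the helper names are hypothetical, and whether `HubertCode` merges repeats is not stated here.

```python
from typing import List, Sequence

import numpy as np


def chunk_audio(samples: np.ndarray, sample_rate: int, chunk_sec: float) -> List[np.ndarray]:
    """Split a 1-D waveform into consecutive chunks of at most chunk_sec seconds."""
    size = int(sample_rate * chunk_sec)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def quantize(features: np.ndarray, centroids: np.ndarray) -> List[int]:
    """Map each frame (row of `features`) to the index of its nearest centroid."""
    # squared Euclidean distances, shape (num_frames, num_centroids)
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).tolist()


def merge_repeats(units: Sequence[int]) -> List[int]:
    """Collapse runs of identical consecutive units, e.g. [5, 5, 9, 9, 5] -> [5, 9, 5]."""
    return [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]


# toy stand-ins: 3 centroids in a 2-D feature space, 5 frames
centroids = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
features = np.array([[0.1, 0.0], [9.0, 9.5], [9.8, 10.0], [0.0, 9.0], [0.2, 9.9]])
units = quantize(features, centroids)
print(units, merge_repeats(units))
# [0, 1, 1, 2, 2] [0, 1, 2]
```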

## Discrete Unit to Speech

```python
import asrp

code = []  # discrete units: a list of integer unit IDs
# https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/unit2speech
# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
cs = asrp.Code2Speech(tts_checkpoint='./tts_checkpoint_best.pt', waveglow_checkpint='waveglow_256channels_new.pt')
wav = cs(code)  # synthesize a waveform from the units

# play in a notebook
import IPython.display as ipd

ipd.Audio(data=wav, autoplay=False, rate=cs.sample_rate)
```
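Outside a notebook you may want to write the synthesized audio to disk instead. Here is a minimal sketch using only NumPy and the standard-library `wave` module, assuming `cs(code)` returns a mono waveform of float samples in [-1, 1] at `cs.sample_rate`; that return type is an assumption, so check it against your installed version (call `.detach().cpu().numpy()` first if it turns out to be a PyTorch tensor). The `save_wav` helper is hypothetical.

```python
import wave

import numpy as np


def save_wav(path: str, samples, sample_rate: int) -> None:
    """Write a mono float waveform (values in [-1, 1]) as a 16-bit PCM WAV file."""
    audio = np.asarray(samples, dtype=np.float64).reshape(-1)
    # clip to the valid range, then scale to little-endian int16
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(int(sample_rate))
        wf.writeframes(pcm.tobytes())
```

With the example above this would be called as `save_wav('output.wav', wav, cs.sample_rate)`.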

Example: English text to speech via mHuBERT units and a HiFi-GAN vocoder

```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# download the unit-based HiFi-GAN vocoder checkpoint (mHuBERT layer 11, 1000 units)
nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')

# text-to-unit model: generates mHuBERT unit tokens from English text
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
# decode the generated tokens (e.g. "v_tok_123") and convert them to integer unit IDs
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```
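The list comprehension above relies on the decoded string containing only `<s>`, `</s>` and `v_tok_N` tokens. A slightly more defensive sketch pulls the unit IDs out with a regular expression, so any other special tokens (padding, for example) are simply ignored. The `v_tok_` prefix comes from the example; the helper name is hypothetical.

```python
import re
from typing import List

# unit tokens look like "v_tok_123" in the decoded model output
UNIT_TOKEN = re.compile(r"v_tok_(\d+)")


def parse_unit_tokens(decoded: str) -> List[int]:
    """Extract integer unit IDs, ignoring any other tokens or whitespace."""
    return [int(n) for n in UNIT_TOKEN.findall(decoded)]


print(parse_unit_tokens("<pad><s>v_tok_12v_tok_5 v_tok_999</s>"))
# [12, 5, 999]
```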

## Speech Enhancement

ASRP also provides a noise-reduction tool for enhancing speech quality, based on the fairseq denoiser:
https://github.com/facebookresearch/fairseq/tree/main/examples/speech_synthesis/preprocessing/denoiser

```python
from asrp import SpeechEnhancer

ase = SpeechEnhancer()
print(ase('./test/xxx.wav'))
```
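For intuition about what noise reduction does, here is a sketch of classical spectral subtraction: estimate the average noise magnitude spectrum from a noise-only clip, subtract it from every short-time frame of the noisy signal, and resynthesize with overlap-add. This is a deliberately simple, swapped-in technique; the fairseq denoiser referenced above uses a pretrained neural model, and `SpeechEnhancer` should not be expected to work this way. Frame length, hop, and the spectral floor are illustrative assumptions.

```python
import numpy as np


def _sqrt_hann(frame_len: int) -> np.ndarray:
    # square root of a periodic Hann window: used for both analysis and synthesis,
    # the squared windows sum to 1 at 50% overlap, so unmodified frames reconstruct exactly
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))


def _frames(x: np.ndarray, window: np.ndarray, hop: int) -> np.ndarray:
    frame_len = len(window)
    count = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * window for i in range(count)])


def spectral_subtraction(noisy, noise_clip, frame_len=512, floor=0.02):
    """Denoise a 1-D float signal; both inputs need at least frame_len samples.

    Uses a hop of frame_len // 2. The first and last half-frames are faded by the
    window, and a tail shorter than one hop is left at zero.
    """
    hop = frame_len // 2
    noisy = np.asarray(noisy, dtype=np.float64)
    noise_clip = np.asarray(noise_clip, dtype=np.float64)
    window = _sqrt_hann(frame_len)

    # average magnitude spectrum of the noise-only clip
    noise_mag = np.abs(np.fft.rfft(_frames(noise_clip, window, hop), axis=1)).mean(axis=0)

    spec = np.fft.rfft(_frames(noisy, window, hop), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # subtract the noise estimate, keeping a small fraction of the original as a floor
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    # weighted overlap-add with the same window
    out = np.zeros_like(noisy)
    for i, frame in enumerate(clean):
        out[i * hop:i * hop + frame_len] += frame * window
    return out
```

Subtracting an average spectrum leaves the well-known "musical noise" artifacts, which is one reason learned denoisers are preferred in practice.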

## LiveASR - huggingface's model

* Modified from https://github.com/oliverguhr/wav2vec2-live

```python
from asrp.live import LiveSpeech

model_name = "voidful/wav2vec2-xlsr-multilingual-56"  # multilingual wav2vec2 checkpoint
asr = LiveSpeech(model_name, device_name="default")
asr.start()

try:
    while True:
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s"
              + f"\t{inference_time:.3f}s"
              + f"\t{text}")

except KeyboardInterrupt:
    asr.stop()
```
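The loop prints the length of each recognized audio segment and how long inference took for it. Assuming both values are in seconds (as the `s` suffix in the format string suggests), their ratio is the real-time factor (RTF): values below 1 mean the model keeps up with live audio. A small hypothetical helper for tracking it:

```python
from dataclasses import dataclass


@dataclass
class RTFTracker:
    """Accumulates audio and compute time to report the real-time factor."""
    audio_seconds: float = 0.0
    compute_seconds: float = 0.0

    def update(self, sample_length: float, inference_time: float) -> float:
        """Record one segment and return that segment's own real-time factor."""
        self.audio_seconds += sample_length
        self.compute_seconds += inference_time
        return inference_time / sample_length if sample_length > 0 else float("inf")

    @property
    def overall(self) -> float:
        """Real-time factor over everything recorded so far (NaN if nothing yet)."""
        if self.audio_seconds == 0:
            return float("nan")
        return self.compute_seconds / self.audio_seconds
```

In the loop above you would call `tracker.update(sample_length, inference_time)` after each `get_last_text()`.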

## LiveASR - whisper's model

```python
from asrp.live import LiveSpeech

whisper_model = "tiny"
asr = LiveSpeech(whisper_model, vad_mode=2, language='zh')
asr.start()

try:
    while True:
        asr_text, sample_length, inference_time = asr.get_last_text()
        if len(asr_text) > 0:
            print(asr_text, sample_length, inference_time)
except KeyboardInterrupt:
    asr.stop()
```
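Both live examples depend on voice activity detection to decide where an utterance starts and ends (`webrtcvad` is among ASRP's requirements, and `vad_mode` presumably sets its aggressiveness from 0 to 3). The sketch below shows the general idea: cut 16-bit mono PCM into 30 ms frames, classify each frame as speech or not, and close a segment after a run of silent frames. To stay self-contained it uses a simple energy threshold as the classifier; the bound method `webrtcvad.Vad(mode).is_speech(frame_bytes, sample_rate)` has the same call shape and could be passed in instead (it accepts 10, 20 or 30 ms frames at 8, 16, 32 or 48 kHz). Thresholds and function names are assumptions, not ASRP's internals.

```python
from typing import Callable, List, Tuple

import numpy as np


def frame_pcm(pcm: np.ndarray, sample_rate: int, frame_ms: int = 30) -> List[np.ndarray]:
    """Cut int16 PCM into non-overlapping frames; a short tail is dropped."""
    n = sample_rate * frame_ms // 1000
    return [pcm[i:i + n] for i in range(0, len(pcm) - n + 1, n)]


def energy_is_speech(frame_bytes: bytes, sample_rate: int, threshold: float = 500.0) -> bool:
    """Stand-in classifier: a frame counts as speech if its RMS exceeds a fixed threshold."""
    x = np.frombuffer(frame_bytes, dtype="<i2").astype(np.float64)
    return bool(np.sqrt(np.mean(x ** 2)) > threshold)


def segment_speech(pcm: np.ndarray, sample_rate: int,
                   is_speech: Callable[[bytes, int], bool] = energy_is_speech,
                   max_silence_frames: int = 10,
                   frame_ms: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of voiced segments."""
    frames = frame_pcm(pcm, sample_rate, frame_ms)
    n = sample_rate * frame_ms // 1000
    segments: List[Tuple[int, int]] = []
    start, silence = None, 0
    for idx, frame in enumerate(frames):
        if is_speech(frame.astype("<i2").tobytes(), sample_rate):
            if start is None:
                start = idx
            silence = 0
        elif start is not None:
            silence += 1
            if silence > max_silence_frames:
                # the segment ends right after its last voiced frame
                segments.append((start * n, (idx - silence + 1) * n))
                start, silence = None, 0
    if start is not None:
        segments.append((start * n, (len(frames) - silence) * n))
    return segments
```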

## Speaker Embedding Extraction - x vector

Based on the SpeechBrain x-vector model: https://speechbrain.readthedocs.io/en/latest/API/speechbrain.lobes.models.Xvector.html

```python
from asrp.speaker_embedding import extract_x_vector

extract_x_vector('./test/xxx.wav')
```
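A common use of speaker embeddings is verification: score two utterances by the cosine similarity of their embeddings and compare the score with a threshold. A minimal NumPy sketch, assuming `extract_x_vector` returns something `np.asarray` can convert (a CPU tensor or array); embeddings are flattened here because they often carry extra batch dimensions. The 0.7 threshold is a placeholder that has to be tuned on your own data.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embeddings of any shape with the same size."""
    a = np.asarray(a, dtype=np.float64).reshape(-1)
    b = np.asarray(b, dtype=np.float64).reshape(-1)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def same_speaker(a, b, threshold: float = 0.7) -> bool:
    """Verification decision; the default threshold is a placeholder, not a tuned value."""
    return cosine_similarity(a, b) >= threshold
```

For example: `same_speaker(extract_x_vector('a.wav'), extract_x_vector('b.wav'))`, where both file names are placeholders.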

## Speaker Embedding Extraction - d vector

Based on https://github.com/yistLin/dvector

```python
from asrp.speaker_embedding import extract_d_vector

extract_d_vector('./test/xxx.wav')
```
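With d-vectors, a speaker is often enrolled by averaging the length-normalized embeddings of several utterances into one centroid; a new utterance is then assigned to the speaker whose centroid it is most similar to. This sketch assumes `extract_d_vector` returns an array-like embedding that `np.asarray` can convert; `enroll` and `identify` are hypothetical helper names.

```python
from typing import Dict, Iterable, Tuple

import numpy as np


def l2_normalize(v) -> np.ndarray:
    v = np.asarray(v, dtype=np.float64).reshape(-1)
    return v / np.linalg.norm(v)


def enroll(embeddings: Iterable) -> np.ndarray:
    """Average the unit-length embeddings of several utterances into one unit-length centroid."""
    return l2_normalize(np.mean([l2_normalize(e) for e in embeddings], axis=0))


def identify(query, speakers: Dict[str, np.ndarray]) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching speaker name and the cosine score for every speaker."""
    q = l2_normalize(query)
    scores = {name: float(q @ centroid) for name, centroid in speakers.items()}
    return max(scores, key=scores.get), scores
```

This is closed-set identification: it always picks one of the enrolled speakers, so rejecting unknown voices would need an additional score threshold.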

Raw data

{
    "_id": null,
    "home_page": "https://github.com/voidful/asrp",
    "name": "asrp",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "asr",
    "author": "Voidful",
    "author_email": "voidful.stack@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/0b/d6/fd284b64b221caccf0f69612301c1fbae901dc5d41ffa996fc0d2260d0d5/asrp-0.0.72.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "",
    "version": "0.0.72",
    "split_keywords": [
        "asr"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4eea42cc7a17dcdb16905fe16ec4057d3b48724404ec2416229cde851dcb9fe6",
                "md5": "c7e2f70f2adda334b8d3f490de30b677",
                "sha256": "09979172c489cd05e6bf45c6e8cb7a92157943fc68276638d329fa11ce53ac0f"
            },
            "downloads": -1,
            "filename": "asrp-0.0.72-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c7e2f70f2adda334b8d3f490de30b677",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 53161,
            "upload_time": "2023-04-07T18:11:21",
            "upload_time_iso_8601": "2023-04-07T18:11:21.231661Z",
            "url": "https://files.pythonhosted.org/packages/4e/ea/42cc7a17dcdb16905fe16ec4057d3b48724404ec2416229cde851dcb9fe6/asrp-0.0.72-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0bd6fd284b64b221caccf0f69612301c1fbae901dc5d41ffa996fc0d2260d0d5",
                "md5": "b86b0948e2471e1bfaf44a56b0958a25",
                "sha256": "c640b82d37151eefdf0287bee61ab787e6c23d63af28e3c655c0d92ec50ec83e"
            },
            "downloads": -1,
            "filename": "asrp-0.0.72.tar.gz",
            "has_sig": false,
            "md5_digest": "b86b0948e2471e1bfaf44a56b0958a25",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 51589,
            "upload_time": "2023-04-07T18:11:23",
            "upload_time_iso_8601": "2023-04-07T18:11:23.802993Z",
            "url": "https://files.pythonhosted.org/packages/0b/d6/fd284b64b221caccf0f69612301c1fbae901dc5d41ffa996fc0d2260d0d5/asrp-0.0.72.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-07 18:11:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "voidful",
    "github_project": "asrp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "Unidecode",
            "specs": []
        },
        {
            "name": "jiwer",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": [
                [
                    ">=",
                    "4.25.1"
                ]
            ]
        },
        {
            "name": "editdistance",
            "specs": []
        },
        {
            "name": "librosa",
            "specs": []
        },
        {
            "name": "webrtcvad",
            "specs": []
        },
        {
            "name": "pyctcdecode",
            "specs": []
        },
        {
            "name": "openai-whisper",
            "specs": []
        },
        {
            "name": "nlp2",
            "specs": []
        }
    ],
    "lcname": "asrp"
}
