# speechbox

- Name: speechbox
- Version: 0.2.1
- Home page: https://github.com/huggingface/speechbox
- Summary: Speechbox
- Author: The HuggingFace team
- Requires Python: >=3.7.0
- License: Apache
- Keywords: deep learning
- Upload time: 2023-01-27 18:37:20

<p align="center">
    <a href="https://github.com/huggingface/speechbox/releases">
        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/speechbox.svg">
    </a>
    <a href="CODE_OF_CONDUCT.md">
        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg">
    </a>
</p>

🤗 Speechbox offers a set of speech processing tools, such as punctuation restoration.

# Installation

With `pip` (official package):

```bash
pip install speechbox
```
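
To try the latest, unreleased changes, installing straight from the GitHub repository with pip's standard Git support should also work (this is the usual pip-from-git pattern, not an install path documented by the package):

```bash
pip install git+https://github.com/huggingface/speechbox.git
```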

# Contributing

We ❤️ contributions from the open-source community!
If you want to contribute to this library, please check out our [Contribution guide](https://github.com/huggingface/speechbox/blob/main/CONTRIBUTING.md).
You can browse the [issues](https://github.com/huggingface/speechbox/issues) to find ones you'd like to tackle.
- See [Good first issues](https://github.com/huggingface/speechbox/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) for general opportunities to contribute
- See [New Task](https://github.com/huggingface/speechbox/labels/New%20Task) for more advanced contributions. Make sure you have read the [Philosophy guide](https://github.com/huggingface/speechbox/blob/main/CONTRIBUTING.md#philosophy) to successfully add a new task.

Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a> under **ML for Audio and Speech**. We discuss new trends in machine learning methods for speech, help each other with contributions and personal projects, or just hang out ☕.

# Tasks

| Task | Description | Author |
|-|-|-|
| [Punctuation Restoration](#punctuation-restoration) | Punctuation restoration predicts capitalization as well as punctuation by using [Whisper](https://huggingface.co/models?other=whisper). | [Patrick von Platen](https://github.com/patrickvonplaten) |
| [ASR With Speaker Diarization](#asr-with-speaker-diarization) | Transcribe long audio files, such as meeting recordings, with speaker information (who spoke when) and the transcribed text. | [Sanchit Gandhi](https://github.com/sanchit-gandhi) |

## Punctuation Restoration

Punctuation restoration relies on the premise that [Whisper](https://huggingface.co/models?other=whisper) can understand universal speech. The model is forced to predict the passed words,
but is allowed to capitalize letters, remove or add blank spaces, and add punctuation.
Punctuation is simply defined as the official Python [string.punctuation](https://docs.python.org/3/library/string.html#string.punctuation) characters.
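
As a quick illustration of that premise (a plain-Python sketch, not part of the library): the allowed punctuation set is just the stdlib constant, and undoing everything the restorer may change should recover the normalized transcript.

```python
import string

# The punctuation characters the restorer is allowed to insert:
print(string.punctuation)
# => !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

def normalize(text: str) -> str:
    """Undo everything the restorer may change: punctuation, casing, spacing."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.upper().split())

restored = "Mister Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."
assert normalize(restored) == (
    "MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES "
    "AND WE ARE GLAD TO WELCOME HIS GOSPEL"
)
```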

**Note**: For now this package has only been tested with:
- [openai/whisper-tiny.en](https://huggingface.co/openai/whisper-tiny.en)
- [openai/whisper-base.en](https://huggingface.co/openai/whisper-base.en)
- [openai/whisper-small.en](https://huggingface.co/openai/whisper-small.en)
- [openai/whisper-medium.en](https://huggingface.co/openai/whisper-medium.en)

and **only** on some 80 audio samples of [patrickvonplaten/librispeech_asr_dummy](https://huggingface.co/datasets/patrickvonplaten/librispeech_asr_dummy).

See some transcribed results [here](https://huggingface.co/datasets?other=speechbox_punc).

### Web Demo

If you want to try out punctuation restoration, you can use the following 🚀 Space:

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/speechbox/whisper-restore-punctuation)

### Example

In order to use the punctuation restoration task, you need to install [Transformers](https://github.com/huggingface/transformers):

```bash
pip install --upgrade transformers
```

For this example, we will additionally make use of [datasets](https://github.com/huggingface/datasets) to load a sample audio file:

```bash
pip install --upgrade datasets
```

Now we stream a single audio sample, load the punctuation-restoring class with ["openai/whisper-tiny.en"](https://huggingface.co/openai/whisper-tiny.en), and add punctuation to the transcription:

```python
import torch

from speechbox import PunctuationRestorer
from datasets import load_dataset

streamed_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)

# get first sample
sample = next(iter(streamed_dataset))

# print out normalized transcript
print(sample["text"])
# => "HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE"

# load the restoring class
restorer = PunctuationRestorer.from_pretrained("openai/whisper-tiny.en")

# move to GPU if one is available
restorer.to("cuda" if torch.cuda.is_available() else "cpu")

restored_text, log_probs = restorer(sample["audio"]["array"], sample["text"], sampling_rate=sample["audio"]["sampling_rate"], num_beams=1)

print("Restored text:\n", restored_text)
```
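
Since `num_beams` is exposed on the call, one simple follow-up is to decode with a couple of beam sizes and keep the restoration with the highest returned log-probability. This is only a usage sketch built on the call signature shown above, not a documented recipe:

```python
# Continuing from the example above: try a couple of beam sizes and keep
# the candidate with the highest log-probability.
candidates = {}
for num_beams in (1, 4):
    text, log_prob = restorer(
        sample["audio"]["array"],
        sample["text"],
        sampling_rate=sample["audio"]["sampling_rate"],
        num_beams=num_beams,
    )
    candidates[text] = log_prob

best_text = max(candidates, key=candidates.get)
print("Best restoration:\n", best_text)
```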

See [examples/restore](https://github.com/huggingface/speechbox/blob/main/examples/restore.py) for more information.

## ASR With Speaker Diarization

Given an unlabelled audio segment, a speaker diarization model is used to predict "who spoke when". These speaker 
predictions are paired with the output of a speech recognition system (e.g. Whisper) to give speaker-labelled 
transcriptions.
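
The pairing step can be pictured as a simple interval-overlap assignment. The sketch below is not the library's implementation, just a minimal illustration of the idea, assuming hypothetical diarization segments of the form `(speaker, start, end)` and ASR chunks of the form `(text, start, end)`:

```python
def assign_speakers(diarization_segments, asr_chunks):
    """Label each ASR chunk with the speaker whose segment overlaps it most.

    diarization_segments: list of (speaker, start, end) tuples
    asr_chunks: list of (text, start, end) tuples
    """
    labelled = []
    for text, chunk_start, chunk_end in asr_chunks:
        best_speaker, best_overlap = None, 0.0
        for speaker, seg_start, seg_end in diarization_segments:
            # length of the intersection of the two time intervals
            overlap = min(chunk_end, seg_end) - max(chunk_start, seg_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labelled.append((best_speaker, text))
    return labelled

# toy example
segments = [("SPEAKER_00", 0.0, 5.2), ("SPEAKER_01", 5.2, 9.8)]
chunks = [("Hello everyone.", 0.3, 2.1), ("Hi, thanks for joining.", 5.4, 8.0)]
print(assign_speakers(segments, chunks))
# => [('SPEAKER_00', 'Hello everyone.'), ('SPEAKER_01', 'Hi, thanks for joining.')]
```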

The combined ASR + Diarization pipeline can be applied directly to long audio samples, such as meeting recordings, to 
give fully annotated meeting transcriptions. 

### Web Demo

If you want to try out the ASR + Diarization pipeline, you can use the following Space:

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/speechbox/whisper-speaker-diarization)

### Example

In order to use the ASR + Diarization pipeline, you need to install 🤗 [Transformers](https://github.com/huggingface/transformers) 
and [pyannote.audio](https://github.com/pyannote/pyannote-audio):

```bash
pip install --upgrade transformers pyannote.audio
```

For this example, we will additionally make use of 🤗 [Datasets](https://github.com/huggingface/datasets) to load a sample audio file:

```bash
pip install --upgrade datasets
```

Now we stream a single audio sample, pass it to the ASR + Diarization pipeline, and return the speaker-segmented transcription:

```python
import torch
from speechbox import ASRDiarizationPipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# load dataset of concatenated LibriSpeech samples
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# get first sample
sample = next(iter(concatenated_librispeech))

out = pipeline(sample["audio"])
print(out)
```
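
To turn the output into a readable transcript you can post-process the returned segments. The snippet below assumes each segment is a dict with `speaker`, `text`, and `timestamp` keys; treat that schema as an assumption and check it against `print(out)` first:

```python
# Assumed output schema: a list of dicts with "speaker", "text" and
# "timestamp" (start, end) keys -- verify with print(out) before relying on it.
def format_transcript(segments):
    lines = []
    for segment in segments:
        start, end = segment["timestamp"]
        lines.append(f"[{start:.1f}s - {end:.1f}s] {segment['speaker']}: {segment['text'].strip()}")
    return "\n".join(lines)

print(format_transcript(out))
```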
