# whisperplus

- **Name:** whisperplus
- **Version:** 0.3.1
- **Home page:** https://github.com/kadirnar/whisperplus
- **Summary:** WhisperPlus: A Python library for WhisperPlus API.
- **Author:** kadirnar
- **License:** Apache License 2.0
- **Upload time:** 2024-05-05 21:41:50
            <div align="center">
<h2>
    WhisperPlus: Faster, Smarter, and More Capable 🚀
</h2>
<div>
    <img width="500" alt="teaser" src="doc\openai-whisper.jpg">
</div>
<div>
    <a href="https://pypi.org/project/whisperplus" target="_blank">
        <img src="https://img.shields.io/pypi/pyversions/whisperplus.svg?color=%2334D058" alt="Supported Python versions">
    </a>
    <a href="https://badge.fury.io/py/whisperplus"><img src="https://badge.fury.io/py/whisperplus.svg" alt="pypi version"></a>
    <a href="https://huggingface.co/spaces/ArtGAN/Audio-WebUI"><img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="HuggingFace Spaces"></a>
</div>
</div>

## 🛠️ Installation

```bash
pip install whisperplus git+https://github.com/huggingface/transformers
pip install flash-attn --no-build-isolation
```
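Note: `flash-attn` compiles against CUDA, so the second command is only useful on machines with an NVIDIA GPU. To verify the install, a minimal check using only the standard library:

```python
# Prints the installed whisperplus version, e.g. "0.3.1".
from importlib.metadata import version

print(version("whisperplus"))
```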

## 🤗 Model Hub

You can find compatible Whisper models on the [HuggingFace Model Hub](https://huggingface.co/models?search=whisper).
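Any Whisper checkpoint ID from the Hub can be passed as `model_id`. A hedged sketch, assuming the pipeline's other arguments keep their defaults (`openai/whisper-tiny` is just an example checkpoint):

```python
from whisperplus import SpeechToTextPipeline

# Sketch: any Whisper checkpoint ID from the Hub should work as model_id.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-tiny")
```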

## 🎙️ Usage

To use the whisperplus library, follow the examples below for each task:

### 🎵 YouTube URL to Audio

```python
from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3
from transformers import BitsAndBytesConfig, HqqConfig
import torch

url = "https://www.youtube.com/watch?v=di3rHkEZuUw"
audio_path = download_and_convert_to_mp3(url)

hqq_config = HqqConfig(
    nbits=1,
    group_size=64,
    quant_zero=False,
    quant_scale=False,
    axis=0,
    offload_meta=False,
)  # axis=0 is used by default

# Alternative 4-bit setup: pass `quant_config=bnb_config` (with `hqq=False`)
# to quantize with bitsandbytes instead of HQQ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

pipeline = SpeechToTextPipeline(
    model_id="distil-whisper/distil-large-v3",
    quant_config=hqq_config,
    hqq=True,
    flash_attention_2=True,
)

transcript = pipeline(
    audio_path=audio_path,
    chunk_length_s=30,
    stride_length_s=5,
    max_new_tokens=128,
    batch_size=100,
    language="english",
    return_timestamps=False,
)

print(transcript)
```
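The RAG examples below read a transcript from disk, so it can be handy to persist the result first. A minimal sketch, assuming `transcript` is (or converts cleanly to) a plain string:

```python
# Save the transcript for the "Chat with Video" examples below.
with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(str(transcript))
```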

### 📰 Summarization

```python
from whisperplus import TextSummarizationPipeline

summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])
```

### 📰 Long Text Support Summarization

```python
from whisperplus import LongTextSummarizationPipeline

summarizer = LongTextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary_text = summarizer.summarize(transcript)
print(summary_text)
```

### 💬 Speaker Diarization

```python
from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)

audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")

device = "cuda"  # cpu or mps
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)

output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
```
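Note: the `pyannote/speaker-diarization` weights are gated on the Hugging Face Hub, so `use_auth_token=False` only works if the model is already cached locally. A sketch of passing a real token, assuming it is exported as `HF_TOKEN`:

```python
import os

# Same call as above, but authenticating against the Hub for the gated model.
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=os.environ["HF_TOKEN"],  # assumes HF_TOKEN is set
    chunk_length_s=30,
    device=device,
)
```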

### ⭐ RAG - Chat with Video (LanceDB)

```python
from whisperplus.pipelines.chatbot import ChatWithVideo

chat = ChatWithVideo(
    input_file="trascript.txt",
    llm_model_name="TheBloke/Mistral-7B-v0.1-GGUF",
    llm_model_file="mistral-7b-v0.1.Q4_K_M.gguf",
    llm_model_type="mistral",
    embedding_model_name="sentence-transformers/all-MiniLM-L6-v2",
)

query = "what is this video about ?"
response = chat.run_query(query)
print(response)
```
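`llm_model_file` names a GGUF weights file from the `TheBloke/Mistral-7B-v0.1-GGUF` repository. If you prefer to fetch it up front rather than relying on the pipeline's own download logic, a sketch using `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

# Optional: pre-download the quantized weights referenced above.
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-v0.1-GGUF",
    filename="mistral-7b-v0.1.Q4_K_M.gguf",
)
print(model_path)  # local cache path of the .gguf file
```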

### 🌠 RAG - Chat with Video (AutoLLM)

```python
from whisperplus import AutoLLMChatWithVideo

# service_context_params
system_prompt = """
You are an friendly ai assistant that help users find the most relevant and accurate answers
to their questions based on the documents you have access to.
When answering the questions, mostly rely on the info in documents.
"""
query_wrapper_prompt = """
The document information is below.
---------------------
{context_str}
---------------------
Using the document information and mostly relying on it,
answer the query.
Query: {query_str}
Answer:
"""

chat = AutoLLMChatWithVideo(
    input_file="input_dir",  # path of mp3 file
    openai_key="YOUR_OPENAI_KEY",  # optional
    huggingface_key="YOUR_HUGGINGFACE_KEY",  # optional
    llm_model="gpt-3.5-turbo",
    llm_max_tokens="256",
    llm_temperature="0.1",
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    embed_model="huggingface/BAAI/bge-large-zh",  # "text-embedding-ada-002"
)

query = "what is this video about ?"
response = chat.run_query(query)
print(response)
```

### 🎙️ Text to Speech

```python
from whisperplus import TextToSpeechPipeline

tts = TextToSpeechPipeline(model_id="suno/bark")
audio = tts(text="Hello World", voice_preset="v2/en_speaker_6")
```
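The pipeline returns raw audio rather than writing a file. A hedged sketch for saving it, assuming `audio` is a 1-D NumPy array at Bark's 24 kHz sample rate and that the `soundfile` package is available:

```python
import soundfile as sf

# Assumption: `audio` is a float NumPy array sampled at 24 kHz (Bark's default).
sf.write("hello_world.wav", audio, samplerate=24000)
```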

### 🎥 AutoCaption

```python
from whisperplus import WhisperAutoCaptionPipeline

caption = WhisperAutoCaptionPipeline(model_id="openai/whisper-large-v3")
caption(video_path="test.mp4", output_path="output.mp4", language="turkish")
```

## 😍 Contributing

```bash
pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files
```

## 📜 License

This project is licensed under the terms of the Apache License 2.0.

## 🤗 Citation

```bibtex
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
