# Speech Text Pipeline
speech_text_pipeline is a Python package for processing audio files with automatic speech recognition (ASR), speaker diarization, and speaker matching. It handles two cases: plain transcription with diarization, and transcription with diarization plus speaker identification, where one speaker's identity is known and provided via an additional audio sample.
## Installation
### Prerequisites
Install the following dependencies before installing speech_text_pipeline (a pip command for these is sketched after the list):
- datasets
- omegaconf
- pyannote.audio
- hydra-core
- git+https://github.com/openai/whisper.git
- git+https://github.com/NVIDIA/NeMo.git
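One way to install these prerequisites with pip (the last two are installed directly from GitHub):
```bash
pip install datasets omegaconf pyannote.audio hydra-core
pip install git+https://github.com/openai/whisper.git
pip install git+https://github.com/NVIDIA/NeMo.git
```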
### Main Package
Once the prerequisite packages are installed, you can install speech_text_pipeline using pip (Python 3.10 or later is required):
```bash
pip install speech_text_pipeline
```
## Usage
### HF_TOKEN for Speaker Matching
Before using the package, you need access to the gated 🤗 Hugging Face pyannote/embedding model for the speaker matching functionality. Follow these steps to get access to the model:
1. Log in to your 🤗 Hugging Face account and visit [pyannote/embedding model](https://huggingface.co/pyannote/embedding).
2. Request access to the model (if you have not done so already).
3. After access is granted, generate a Hugging Face access token (HF_TOKEN) from the Access Tokens tab in your account settings.
4. Use the generated token in either of two ways (an environment-variable variant is sketched after this list):
- CLI login:
```bash
huggingface-cli login
```
Then, input your `HF_TOKEN` when prompted.
- In Code:
Pass your `HF_TOKEN` directly to the transcribe function as a parameter:
```python
import speech_text_pipeline as stp

result = stp.transcribe(audio="path_to_audio_file.wav",
                        speaker_audio="path_to_known_speaker_audio.wav",
                        HF_TOKEN="Your HF_TOKEN")
```
Note: The Hugging Face token is only required for the speaker matching functionality.
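If you prefer not to hard-code the token, a minimal variant of the in-code option (assuming you have exported `HF_TOKEN` in your shell) reads it from an environment variable:
```python
import os

import speech_text_pipeline as stp

# Assumes `export HF_TOKEN=hf_...` was run beforehand; raises KeyError if the variable is unset.
result = stp.transcribe(audio="path_to_audio_file.wav",
                        speaker_audio="path_to_known_speaker_audio.wav",
                        HF_TOKEN=os.environ["HF_TOKEN"])
```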
### Pipeline (anonymous speakers)

This mode generates a transcript with speaker diarization, assigning anonymous labels to speakers (e.g., "Speaker 1", "Speaker 2").

```python
import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

result = stp.transcribe(audio=audio_url)
```
#### Get diarized transcript with anonymous speakers
```python
print(result)
```
### Pipeline (named speakers)

This mode takes an additional audio sample of a known speaker and labels that speaker in the diarized transcript.

```python
import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

agent_audio_url = "path_to_agent_audio.wav"  # Sample of the known speaker

result_with_speaker = stp.transcribe(audio=audio_url,
                                     speaker_audio=agent_audio_url,
                                     HF_TOKEN="Your HF_TOKEN")  # Pass your generated Hugging Face token
```
#### Get diarized transcript with named speaker
```python
print(result_with_speaker)
```