# Speech Text Pipeline
speech_text_pipeline is a Python package for processing audio files with automatic speech recognition (ASR), speaker diarization, and speaker matching. It handles two cases: plain transcription with diarization, and transcription with diarization plus speaker identification, where one speaker's identity is known and provided via an additional audio sample.
## Installation
### Prerequisites
Install the following dependencies before installing speech_text_pipeline (a pip command for these is sketched after the list):
- datasets
- omegaconf
- pyannote.audio
- hydra-core
- git+https://github.com/openai/whisper.git
- git+https://github.com/NVIDIA/NeMo.git
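One way to install these prerequisites with pip (the last two are installed directly from GitHub):
```bash
pip install datasets omegaconf pyannote.audio hydra-core
pip install git+https://github.com/openai/whisper.git
pip install git+https://github.com/NVIDIA/NeMo.git
```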
### Main Package
Once the prerequisite packages are installed, you can install speech_text_pipeline using pip (Python 3.10 or later is required):
```bash
pip install speech_text_pipeline
```
## Usage
### HF_TOKEN for Speaker Matching
Before using the package, you need access to the gated 🤗 Hugging Face pyannote/embedding model for the speaker matching functionality. Follow these steps to get access to the model:
1. Log in to your 🤗 Hugging Face account and visit [pyannote/embedding model](https://huggingface.co/pyannote/embedding).
2. Request access to the model (if you have not done so already).
3. After access is granted, generate a Hugging Face access token (HF_TOKEN) from the Access Tokens tab in your account settings.
4. Use the generated token in either of two ways (an environment-variable variant is sketched after this list):
- CLI login:
```bash
huggingface-cli login
```
Then, input your `HF_TOKEN` when prompted.
- In Code:
Pass your `HF_TOKEN` directly to the transcribe function as a parameter:
```python
import speech_text_pipeline as stp

result = stp.transcribe(audio="path_to_audio_file.wav",
                        speaker_audio="path_to_known_speaker_audio.wav",
                        HF_TOKEN="Your HF_TOKEN")
```
Note: The Hugging Face token is only required for the speaker matching functionality.
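If you prefer not to hard-code the token, a minimal variant of the in-code option (assuming you have exported `HF_TOKEN` in your shell) reads it from an environment variable:
```python
import os

import speech_text_pipeline as stp

# Assumes `export HF_TOKEN=hf_...` was run beforehand; raises KeyError if the variable is unset.
result = stp.transcribe(audio="path_to_audio_file.wav",
                        speaker_audio="path_to_known_speaker_audio.wav",
                        HF_TOKEN=os.environ["HF_TOKEN"])
```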
### Pipeline (anonymous speakers)

This mode generates a transcript with speaker diarization, assigning anonymous labels to speakers (e.g., "Speaker 1", "Speaker 2").

```python
import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

result = stp.transcribe(audio=audio_url)
```
#### Get diarized transcript with anonymous speakers
```python
print(result)
```
### Pipeline (named speakers)

This mode takes an additional audio sample of a known speaker and labels that speaker in the diarized transcript.

```python
import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

agent_audio_url = "path_to_agent_audio.wav"  # Sample of the known speaker

result_with_speaker = stp.transcribe(audio=audio_url,
                                     speaker_audio=agent_audio_url,
                                     HF_TOKEN="Your HF_TOKEN")  # Pass your generated Hugging Face token
```
#### Get diarized transcript with named speaker
```python
print(result_with_speaker)
```