audio-scribe

- **Name:** audio-scribe
- **Version:** 0.1.3
- **Home page:** https://gitlab.genomicops.cloud/genomicops/audio-scribe
- **Summary:** A command-line tool for audio transcription with Whisper and Pyannote.
- **Upload time:** 2025-01-16 21:39:50
- **Author:** Gurasis Osahan
- **Requires Python:** >=3.8
- **License:** Apache-2.0
- **Keywords:** whisper pyannote transcription audio diarization
# Audio Scribe

**A Command-Line Tool for Audio Transcription and Speaker Diarization Using OpenAI Whisper and Pyannote**

[![PyPI License](https://img.shields.io/pypi/l/audio-scribe)](https://pypi.org/project/audio-scribe/)
![Coverage](https://img.shields.io/badge/coverage-98.1%25-brightgreen)
[![PyPI Downloads](https://img.shields.io/pypi/dm/audio-scribe)](https://pypi.org/project/audio-scribe/)
![Pipeline Status](https://gitlab.genomicops.cloud/innovation-hub/audio-scribe/badges/main/pipeline.svg)
[![PyPI Version](https://badge.fury.io/py/audio-scribe.svg)](https://badge.fury.io/py/audio-scribe)
[![Python Versions](https://img.shields.io/pypi/pyversions/audio-scribe)](https://pypi.org/project/audio-scribe/)
<!-- [![Coverage Report](https://gitlab.genomicops.cloud/innovation-hub/audio-scribe/badges/main/coverage.svg)](https://gitlab.genomicops.cloud/innovation-hub/audio-scribe/-/commits/main) -->

## Overview

**Audio Scribe** is a command-line tool that transcribes audio files with speaker diarization. It uses [OpenAI Whisper](https://github.com/openai/whisper) for transcription and [Pyannote Audio](https://github.com/pyannote/pyannote-audio) for speaker diarization, producing text transcripts segmented by speaker turn. Key features include:

- **Progress Bar & Resource Monitoring**: See real-time CPU, memory, and GPU usage with a live progress bar.  
- **Speaker Diarization**: Automatically separates speaker turns using Pyannote’s state-of-the-art models.  
- **Tab-Completion for File Paths**: Easily navigate your file system when prompted for the audio path.  
- **Secure Token Storage**: Encrypts and stores your Hugging Face token for downloading the gated pyannote models.  
- **Customizable Whisper Models**: Defaults to `base.en`; specify `tiny`, `small`, `medium`, `large`, etc. to use another model.
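
The speaker-turn segmentation described above can be illustrated with a small, dependency-free sketch. It assumes the diarization output has already been reduced to `(start, end, speaker)` turns in seconds, labelled in pyannote's `SPEAKER_00` style. `Turn`, `merge_turns`, `sample_range`, and `format_line` are hypothetical helpers, not part of Audio Scribe's API, and the transcript line format is an assumption. The sketch merges consecutive turns from the same speaker and converts each turn to sample offsets at 16 kHz, the sample rate Whisper models work with.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Turn:
    start: float  # seconds
    end: float    # seconds
    speaker: str  # e.g. "SPEAKER_00", as labelled by pyannote pipelines


def merge_turns(turns: List[Turn], max_gap: float = 0.5) -> List[Turn]:
    """Merge consecutive turns by the same speaker separated by at most max_gap seconds."""
    merged: List[Turn] = []
    for turn in sorted(turns, key=lambda t: t.start):
        prev = merged[-1] if merged else None
        if prev is not None and prev.speaker == turn.speaker and turn.start - prev.end <= max_gap:
            # Extend the previous turn instead of starting a new one.
            merged[-1] = Turn(prev.start, max(prev.end, turn.end), prev.speaker)
        else:
            merged.append(turn)
    return merged


def sample_range(turn: Turn, sample_rate: int = 16000) -> Tuple[int, int]:
    """Convert a turn's time span to [begin, end) sample indices (Whisper uses 16 kHz audio)."""
    return round(turn.start * sample_rate), round(turn.end * sample_rate)


def format_line(turn: Turn, text: str) -> str:
    """Render one transcript line (hypothetical output format)."""
    return f"[{turn.start:.2f}s - {turn.end:.2f}s] {turn.speaker}: {text.strip()}"
```

With pyannote, turns could be collected from `diarization.itertracks(yield_label=True)`, and each `audio[begin:end]` slice passed to Whisper's `model.transcribe`; whether Audio Scribe merges turns in exactly this way is not documented.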

This repository is licensed under the [Apache License 2.0](#license).

---

## Table of Contents

- [Audio Scribe](#audio-scribe)
  - [Overview](#overview)
  - [Table of Contents](#table-of-contents)
  - [Features](#features)
  - [Installation](#installation)
    - [Installing from PyPI](#installing-from-pypi)
    - [Installing from GitHub](#installing-from-github)
  - [Quick Start](#quick-start)
  - [Usage](#usage)
  - [Dependencies](#dependencies)
    - [Sample `requirements.txt`](#sample-requirementstxt)
  - [Contributing](#contributing)
  - [License](#license)

---

## Features

- **Whisper Transcription**  
  Utilizes [OpenAI Whisper](https://github.com/openai/whisper) to convert speech to text in multiple languages.  
- **Pyannote Speaker Diarization**  
  Identifies different speakers and segments your audio output accordingly.  
- **Progress Bar & Resource Usage**  
  Displays a live progress bar with CPU, memory, and GPU stats through [alive-progress](https://github.com/rsalmei/alive-progress), [psutil](https://pypi.org/project/psutil/), and [GPUtil](https://pypi.org/project/GPUtil/).  
- **Tab-Completion**  
  Press **Tab** to autocomplete file paths on Unix-like systems (and on Windows with [pyreadline3](https://pypi.org/project/pyreadline3/)).  
- **Secure Token Storage**  
  Encrypts your Hugging Face token with [cryptography](https://pypi.org/project/cryptography/) and stores it for downloading gated models (e.g., `pyannote/speaker-diarization-3.1`).  
- **Configurable Models**  
  The default is `base.en`, an English-only model; use `--whisper-model` to select any other Whisper model, such as a multilingual one (e.g., `small`) for non-English audio.
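
Because Whisper checkpoints with a `.en` suffix are English-only, a caller that knows the audio language may want to fall back to the multilingual checkpoint of the same size. The sketch below is a hypothetical helper (`choose_model` is not an Audio Scribe function) that relies only on the `.en` naming convention. To check a name against the checkpoints openai-whisper actually ships, `whisper.available_models()` returns the list.

```python
def is_english_only(model_name: str) -> bool:
    # Whisper's English-only checkpoints carry a ".en" suffix (e.g. "base.en").
    return model_name.endswith(".en")


def choose_model(requested: str, language: str) -> str:
    """Return `requested`, or its multilingual sibling when the audio is not English.

    Hypothetical helper: assumes `language` is an ISO 639-1 code such as "en" or "de".
    """
    if language == "en" or not is_english_only(requested):
        return requested
    # Strip the ".en" suffix; str.removesuffix needs Python 3.9, so slice for 3.8.
    return requested[: -len(".en")]
```

For example, `choose_model("base.en", "de")` returns `"base"`, while multilingual names such as `"small"` pass through unchanged.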

---

## Installation

### Installing from PyPI

**Audio Scribe** is available on PyPI. You can install it with:

```bash
pip install audio-scribe
```

After installation, the **`audio-scribe`** command should be available in your terminal, provided pip's script directory is on your `PATH`. If the package provides a `__main__` module, you can also run it with Python's `-m` flag; note that module names use an underscore, not a hyphen:

```bash
python -m audio_scribe --audio path/to/yourfile.wav
```
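
If the command is not found, it helps to check separately whether the distribution is installed and whether its console script is on `PATH`. A minimal sketch using only the standard library (`importlib.metadata` requires Python 3.8, matching the project's minimum); `installed_version` and `command_on_path` are hypothetical helper names.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Optional


def installed_version(dist_name: str = "audio-scribe") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def command_on_path(command: str = "audio-scribe") -> bool:
    """True if `command` resolves to an executable on the current PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    print("installed version:", installed_version() or "not installed")
    print("console script on PATH:", command_on_path())
```

A version with no command on `PATH` usually means pip's script directory (for example `~/.local/bin` for user installs) needs to be added to `PATH`.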

### Installing from GitHub

To install the latest development version from the project's GitLab repository:

```bash
git clone https://gitlab.genomicops.cloud/genomicops/audio-scribe.git
cd audio-scribe
pip install -r requirements.txt
pip install -e .   # editable install; also registers the audio-scribe command
```

This approach is particularly useful if you want the newest changes or plan to contribute.

---

## Quick Start

1. **Obtain a Hugging Face Token**  
   - Create a token at [Hugging Face Settings](https://huggingface.co/settings/tokens).  
   - Accept the user conditions on the Hugging Face pages of the gated models `pyannote/segmentation-3.0` and `pyannote/speaker-diarization-3.1`.

2. **Run the Command-Line Tool**  
   ```bash
   audio-scribe --audio path/to/audio.wav
   ```
   > On the first run, you’ll be prompted for your Hugging Face token if you haven’t stored one yet.

3. **Watch the Progress Bar**  
   - The tool shows a progress bar that tracks the diarized speaker turns, along with real-time CPU, GPU, and memory usage.
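
The token behaviour described here (`--token` overrides a saved token; otherwise you are prompted on the first run) amounts to a simple precedence rule. The sketch below is an assumption about how such a rule can be structured, not Audio Scribe's code: `resolve_token` is hypothetical, and storage and prompting are injected as callables so the logic can be tested without a keyring or terminal. Whether the real tool persists a token passed via `--token` is not documented; this sketch does not.

```python
from typing import Callable, Optional


def resolve_token(
    cli_token: Optional[str],
    load_saved: Callable[[], Optional[str]],
    prompt: Callable[[], str],
    save: Callable[[str], None],
) -> str:
    """Pick a Hugging Face token: CLI flag first, then saved token, then interactive prompt."""
    if cli_token:
        # An explicit --token wins and is not persisted in this sketch.
        return cli_token
    saved = load_saved()
    if saved:
        return saved
    # First run: ask the user and remember the answer for next time.
    token = prompt().strip()
    if not token:
        raise ValueError("A Hugging Face token is required to download the pyannote models.")
    save(token)
    return token
```

In an interactive program, `prompt` could be `getpass.getpass` so the token is not echoed to the terminal.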

---

## Usage

Below is a summary of the main command-line options:

```
usage: audio-scribe [options]

Audio Transcription (Audio Scribe) Pipeline using Whisper + Pyannote, with optional progress bar.

optional arguments:
  --audio PATH           Path to the audio file to transcribe.
  --token TOKEN          HuggingFace API token. Overrides any saved token.
  --output PATH          Path to the output directory for transcripts and temporary files.
  --delete-token         Delete any stored Hugging Face token and exit.
  --show-warnings        Enable user warnings (e.g., from pyannote.audio). Disabled by default.
  --whisper-model MODEL  Specify the Whisper model to use (default: 'base.en').
```
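
To make the option list concrete, here is an `argparse` sketch that mirrors the documented flags and the `base.en` default. It is a reconstruction from the help text above, not Audio Scribe's actual source; `build_parser` is a hypothetical name, and options whose defaults are not documented (such as `--output`) are left unset.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented audio-scribe options."""
    parser = argparse.ArgumentParser(
        prog="audio-scribe",
        description="Audio Transcription (Audio Scribe) Pipeline using Whisper + Pyannote, "
        "with optional progress bar.",
    )
    parser.add_argument("--audio", metavar="PATH", help="Path to the audio file to transcribe.")
    parser.add_argument("--token", metavar="TOKEN", help="HuggingFace API token. Overrides any saved token.")
    parser.add_argument("--output", metavar="PATH",
                        help="Path to the output directory for transcripts and temporary files.")
    # store_true flags default to False when omitted.
    parser.add_argument("--delete-token", action="store_true",
                        help="Delete any stored Hugging Face token and exit.")
    parser.add_argument("--show-warnings", action="store_true",
                        help="Enable user warnings (e.g., from pyannote.audio). Disabled by default.")
    parser.add_argument("--whisper-model", metavar="MODEL", default="base.en",
                        help="Specify the Whisper model to use (default: 'base.en').")
    return parser
```

argparse converts dashes in flag names to underscores, so `--whisper-model small` is read back as `args.whisper_model == "small"`.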

**Examples:**

- **Basic Transcription**  
  ```bash
  audio-scribe --audio meeting.wav
  ```

- **Specify a Different Whisper Model**  
  ```bash
  audio-scribe --audio webinar.mp3 --whisper-model small
  ```

- **Delete a Stored Token**  
  ```bash
  audio-scribe --delete-token
  ```

- **Show Internal Warnings**  
  ```bash
  audio-scribe --audio session.wav --show-warnings
  ```

- **Tab-Completion**  
  ```bash
  audio-scribe
  # When prompted for an audio file path, press Tab to autocomplete
  ```
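
Path completion at an `input()` prompt is typically wired up through the standard `readline` module. The sketch below shows one way to do it; `complete_path` and `prompt_for_audio_path` are hypothetical names, not Audio Scribe's implementation. On Windows, pyreadline3 is expected to provide a compatible `readline` module; where no `readline` is importable, the prompt simply works without completion.

```python
import glob
import os
from typing import Optional


def complete_path(text: str, state: int) -> Optional[str]:
    """readline completer: return the state-th filesystem match for the typed prefix."""
    expanded = os.path.expanduser(text)
    # Escape glob metacharacters in the typed prefix, then match anything after it.
    matches = sorted(glob.glob(glob.escape(expanded) + "*"))
    # Append a separator to directories so repeated Tab presses descend into them.
    matches = [m + os.sep if os.path.isdir(m) else m for m in matches]
    return matches[state] if state < len(matches) else None


def prompt_for_audio_path(prompt: str = "Enter the path to your audio file: ") -> str:
    """Prompt for a path with Tab-completion when a readline module is available."""
    try:
        import readline
    except ImportError:
        readline = None
    if readline is not None:
        # Treat "/" and "~" as part of the word so whole paths are completed.
        readline.set_completer_delims(" \t\n;")
        readline.set_completer(complete_path)
        if readline.__doc__ and "libedit" in readline.__doc__:
            readline.parse_and_bind("bind ^I rl_complete")  # macOS libedit syntax
        else:
            readline.parse_and_bind("tab: complete")
    return input(prompt).strip()
```

readline calls the completer with `state = 0, 1, 2, ...` until it returns `None`, which is why `complete_path` indexes into a sorted list of matches.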

---

## Dependencies

**Core Libraries**  
- **Python 3.8+**  
- [PyTorch](https://pytorch.org/)  
- [openai-whisper](https://github.com/openai/whisper)  
- [pyannote.audio](https://github.com/pyannote/pyannote-audio)  
- [pytorch-lightning](https://pypi.org/project/pytorch-lightning/)  
- [cryptography](https://pypi.org/project/cryptography/)  
- [keyring](https://pypi.org/project/keyring/)  

**Optional for Extended Functionality**  
- [alive-progress](https://pypi.org/project/alive-progress/) – Real-time progress bar  
- [psutil](https://pypi.org/project/psutil/) – CPU/memory usage  
- [GPUtil](https://pypi.org/project/GPUtil/) – GPU usage  
- [pyreadline3](https://pypi.org/project/pyreadline3/) – Tab-completion on Windows
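
Optional packages like these are commonly detected at runtime so that a tool can degrade gracefully when one is missing. A minimal sketch using `importlib.util.find_spec`, which checks importability without importing the module; the feature-to-module mapping below is an assumption based on the list above, and whether Audio Scribe itself degrades this way is not documented.

```python
import importlib.util
from typing import Dict

# Hypothetical mapping from feature to the module name each optional package provides.
FEATURE_MODULES: Dict[str, str] = {
    "progress bar": "alive_progress",  # alive-progress
    "CPU/memory stats": "psutil",
    "GPU stats": "GPUtil",
    "tab-completion": "readline",      # stdlib on Unix; pyreadline3 on Windows
}


def available_features(features: Dict[str, str] = FEATURE_MODULES) -> Dict[str, bool]:
    """Report, per feature, whether its backing module can be imported."""
    return {feature: importlib.util.find_spec(module) is not None
            for feature, module in features.items()}
```

A caller could, for instance, fall back to a plain loop with periodic `print` calls when `available_features()["progress bar"]` is `False`.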

### Sample `requirements.txt`

A typical `requirements.txt` for the project looks like this:

```
torch>=2.0
openai-whisper
pyannote.audio>=3.1
pytorch-lightning
cryptography
keyring
alive-progress
psutil
GPUtil
pyreadline3; sys_platform == "win32"
```

> Note:  
> - `pyreadline3` carries a [PEP 508 environment marker](https://peps.python.org/pep-0508/) (`; sys_platform == "win32"`), so it is installed only on Windows.  
> - For GPU support, install a CUDA-enabled PyTorch build compatible with your GPU driver (see the installation selector on [pytorch.org](https://pytorch.org/)).

---

## Contributing

We welcome contributions to **Audio Scribe**!

1. **Fork** the repository and clone your fork.  
2. **Create a new branch** for your feature or bugfix.  
3. **Implement your changes**, ensuring code is well-documented and follows best practices.  
4. **Open a merge request**, describing the changes you’ve made.

Please read any available guidelines or templates in our repository (such as `CONTRIBUTING.md` or `CODE_OF_CONDUCT.md`) before submitting.

---

## License

This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

```
Copyright 2025 Gurasis Osahan

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

---

**Thank you for using Audio Scribe!**  
For questions or feedback, please open an [issue on GitLab](https://gitlab.genomicops.cloud/genomicops/audio-scribe/-/issues) or contact the maintainers.

            

Raw data

            {
    "_id": null,
    "home_page": "https://gitlab.genomicops.cloud/genomicops/audio-scribe",
    "name": "audio-scribe",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "whisper pyannote transcription audio diarization",
    "author": "Gurasis Osahan",
    "author_email": "contact@genomicops.com",
    "download_url": "https://files.pythonhosted.org/packages/23/cf/3317ee77967b519a9b5d7e9212f49c10c5fac20c37c9870aa6fb584aa805/audio_scribe-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "A command-line tool for audio transcription with Whisper and Pyannote.",
    "version": "0.1.3",
    "project_urls": {
        "Homepage": "https://gitlab.genomicops.cloud/genomicops/audio-scribe",
        "Source": "https://gitlab.genomicops.cloud/genomicops/audio-scribe",
        "Tracker": "https://gitlab.genomicops.cloud/genomicops/audio-scribe/-/issues"
    },
    "split_keywords": [
        "whisper",
        "pyannote",
        "transcription",
        "audio",
        "diarization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "03727aa9be434ee406a4a991ef67e254c6defaf45931ace1849f5b660e915553",
                "md5": "31753ecb53792a97724131909cbfc661",
                "sha256": "3b463e5b58f980da0f14591a95d55a688a5bb0292caf5b7447257fecc54ccda8"
            },
            "downloads": -1,
            "filename": "audio_scribe-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "31753ecb53792a97724131909cbfc661",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 12895,
            "upload_time": "2025-01-16T21:39:47",
            "upload_time_iso_8601": "2025-01-16T21:39:47.902876Z",
            "url": "https://files.pythonhosted.org/packages/03/72/7aa9be434ee406a4a991ef67e254c6defaf45931ace1849f5b660e915553/audio_scribe-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "23cf3317ee77967b519a9b5d7e9212f49c10c5fac20c37c9870aa6fb584aa805",
                "md5": "699f5fce4f195d3653bd72ea6d880595",
                "sha256": "a5cbb309b748a7063af3da50022b7458f4ae945b365bd1efa2e5fc40ef546c3d"
            },
            "downloads": -1,
            "filename": "audio_scribe-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "699f5fce4f195d3653bd72ea6d880595",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 20764,
            "upload_time": "2025-01-16T21:39:50",
            "upload_time_iso_8601": "2025-01-16T21:39:50.695201Z",
            "url": "https://files.pythonhosted.org/packages/23/cf/3317ee77967b519a9b5d7e9212f49c10c5fac20c37c9870aa6fb584aa805/audio_scribe-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-16 21:39:50",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "audio-scribe"
}
        