audio-subtitler


| Field | Value |
|-------|-------|
| Name | audio-subtitler |
| Version | 0.1.2 |
| Summary | Convert audio files to subtitles (VTT, SRT) using Faster-Whisper |
| Upload time | 2025-11-01 09:59:36 |
| Author | Gary Lab |
| Requires Python | `>=3.9` |
| License | MIT License, Copyright (c) 2025 audio-subtitler contributors |
| Keywords | audio, transcription, vtt, srt, subtitles, whisper, speech-to-text, webvtt |
| Requirements | No requirements were recorded. |

# Audio Subtitler

Convert audio files to subtitles (VTT, SRT) using Faster-Whisper.

[![PyPI](https://img.shields.io/pypi/v/audio-subtitler.svg)](https://pypi.org/project/audio-subtitler/)
[![Python Versions](https://img.shields.io/pypi/pyversions/audio-subtitler.svg)](https://pypi.org/project/audio-subtitler/)
[![Run on RunPod](https://img.shields.io/badge/Run%20on-RunPod-6b3cff?logo=runpod&logoColor=white)](https://runpod.io?ref=hh0mhml0)

## Features

- 🚀 **Full Faster-Whisper support** - All features and parameters from faster-whisper
- 📝 **Multiple formats** - VTT (WebVTT) and SRT subtitle output
- 🎯 **Smart auto-detection** - Automatically detects format from file extension
- 🌍 **Multi-language** - Supports 100+ languages with auto-detection
- ⚡ **GPU acceleration** - CUDA support for faster transcription
- 🎙️ **Voice Activity Detection** - Automatically removes silence
- 💻 **Simple APIs** - Easy-to-use CLI and Python API
- 🐳 **Docker GPU support** - Ready for serverless deployment

## Installation

```bash
pip install audio-subtitler
```

Optional dependencies:
```bash
# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "audio-subtitler[runpod]"  # For RunPod serverless
pip install "audio-subtitler[dev]"     # For development
```

## Quick Start

### CLI

```bash
# Auto-detect format from file extension (recommended)
audiosubtitler input.mp3 -o output.vtt
audiosubtitler input.mp3 -o output.srt

# Specify options
audiosubtitler input.mp3 -o output.vtt --model large-v3 --language en --device cuda

# Output to stdout
audiosubtitler input.mp3 --format srt > output.srt

# Use shorter command
audiosub input.mp3 -o output.vtt
```
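
The extension-based auto-detection used by the commands above can be pictured with a small sketch. This is not the package's actual code: the function name `detect_format`, the precedence (explicit `--format` first, then the output extension, then a `vtt` default), and the fallback behaviour are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_FORMATS = ("vtt", "srt")


def detect_format(output_path: Optional[str],
                  explicit: Optional[str] = None,
                  default: str = "vtt") -> str:
    """Hypothetical helper: choose a subtitle format for an output file."""
    # An explicitly requested format wins (assumed precedence).
    if explicit:
        fmt = explicit.lower()
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {explicit!r}")
        return fmt
    # Otherwise look at the output file extension, case-insensitively.
    if output_path:
        ext = Path(output_path).suffix.lower().lstrip(".")
        if ext in SUPPORTED_FORMATS:
            return ext
    # Unknown or missing extension: fall back to the default.
    return default


print(detect_format("output.srt"))
print(detect_format(None, explicit="vtt"))
```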

### Python API

```python
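# Import path as published in the upstream README; adjust it if your
# installed package exposes a different module name.
from src import AudioSubtitler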

# Initialize
converter = AudioSubtitler(
    model_size_or_path="base",
    device="cpu",
    compute_type="int8"
)

# Transcribe
result = converter.transcribe("audio.mp3", format="vtt", language="en")

# Access results
print(result["content"])     # Subtitle content
print(result["format"])      # "vtt" or "srt"
print(result["word_count"])  # Number of words
```
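
To persist a result from Python rather than printing it, a helper along these lines may be enough. It relies only on the `content` and `format` keys shown above; the name `save_subtitles` is hypothetical and not part of the package.

```python
from pathlib import Path


def save_subtitles(result: dict, stem: str) -> Path:
    """Hypothetical helper: write result["content"] to <stem>.<format>."""
    # Use the format reported in the result as the file extension.
    path = Path(f"{stem}.{result['format']}")
    path.write_text(result["content"], encoding="utf-8")
    return path


# Usage, assuming `result` came from converter.transcribe(...):
# save_subtitles(result, "audio")  # writes audio.vtt or audio.srt
```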

## API Reference

### AudioSubtitler

**Constructor**: `AudioSubtitler(**kwargs)`

Accepts all [faster-whisper WhisperModel](https://github.com/SYSTRAN/faster-whisper) parameters:
- `model_size_or_path`: Model name (tiny, base, small, medium, large, large-v3) or path
- `device`: "cpu", "cuda", or "auto"
- `compute_type`: "int8", "int8_float16", "int16", "float16", "float32"
- `cpu_threads`, `num_workers`, `download_root`, `local_files_only`, etc.

**Method**: `transcribe(audio, format="vtt", **kwargs)`

Parameters:
- `audio`: File path (str), file object (BinaryIO), or numpy array
- `format`: "vtt" or "srt" (default: "vtt")
- `**kwargs`: All [faster-whisper transcribe](https://github.com/SYSTRAN/faster-whisper#transcribe) parameters
  - `language`, `beam_size`, `vad_filter`, `vad_parameters`, `word_timestamps`, etc.

Returns:
```python
{
    "content": str,      # Subtitle content
    "format": str,       # "vtt" or "srt"
    "word_count": int    # Word count
}
```
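
Because `transcribe` forwards extra keyword arguments to faster-whisper, options such as `beam_size`, `vad_filter`, and `vad_parameters` can be passed in the same call. The wrapper below is a sketch: `transcribe_with_vad` is a hypothetical name, and the values chosen (beam size 5, 500 ms minimum silence) are illustrative rather than recommended settings.

```python
def transcribe_with_vad(converter, audio_path: str, fmt: str = "srt") -> dict:
    """Hypothetical wrapper: transcribe with VAD filtering enabled.

    `converter` is expected to be an AudioSubtitler instance (or any object
    with a compatible transcribe method).
    """
    return converter.transcribe(
        audio_path,
        format=fmt,
        language="en",
        beam_size=5,
        vad_filter=True,  # faster-whisper option: skip non-speech segments
        vad_parameters={"min_silence_duration_ms": 500},
    )


# Usage, assuming `converter` was created as in the Quick Start:
# result = transcribe_with_vad(converter, "audio.mp3")
# print(result["word_count"])
```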

## Docker (GPU only)

```bash
docker-compose -f docker-compose-gpu.yml up
```

Input/Output for RunPod serverless:
```jsonc
// Input
{
  "input": {
    "audio": "<base64_encoded_audio>",
    "language": "en",
    "format": "vtt"
  }
}

// Output
{
  "content": "WEBVTT\n\n00:00:00.000 --> ...",
  "format": "vtt",
  "word_count": 150
}
```
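
A request body in the input shape shown above can be assembled with the standard library alone. This sketch covers only payload construction; the helper name `build_runpod_input` is hypothetical, and how the request is sent (endpoint URL, authentication) is outside what this README specifies.

```python
import base64
import json


def build_runpod_input(audio_bytes: bytes,
                       language: str = "en",
                       fmt: str = "vtt") -> str:
    """Hypothetical helper: serialize audio into the documented input shape."""
    payload = {
        "input": {
            # The audio travels as base64 text inside the JSON body.
            "audio": base64.b64encode(audio_bytes).decode("ascii"),
            "language": language,
            "format": fmt,
        }
    }
    return json.dumps(payload)


# Usage:
# with open("audio.mp3", "rb") as f:
#     body = build_runpod_input(f.read(), language="en", fmt="srt")
```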

## Output Examples

**VTT:**
```
WEBVTT

00:00:00.000 --> 00:00:03.500
Hello, this is a test transcription.

00:00:03.500 --> 00:00:07.200
The audio is converted to text with timestamps.
```

**SRT:**
```
1
00:00:00,000 --> 00:00:03,500
Hello, this is a test transcription.

2
00:00:03,500 --> 00:00:07,200
The audio is converted to text with timestamps.
```
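
The two formats differ mainly in the millisecond separator (`.` in VTT, `,` in SRT), the `WEBVTT` header, and the numeric cue index in SRT. The sketch below reproduces the layouts shown above from `(start, end, text)` tuples; it illustrates the formats and is not the package's implementation, and the names `format_timestamp` and `render` are hypothetical.

```python
def format_timestamp(seconds: float, fmt: str = "vtt") -> str:
    """Format seconds as HH:MM:SS.mmm (VTT) or HH:MM:SS,mmm (SRT)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def render(cues, fmt: str = "vtt") -> str:
    """Render (start, end, text) cues as a VTT or SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
        # SRT numbers each cue; VTT cues need no identifier.
        head = f"{index}\n" if fmt == "srt" else ""
        blocks.append(f"{head}{timing}\n{text}")
    body = "\n\n".join(blocks) + "\n"
    return "WEBVTT\n\n" + body if fmt == "vtt" else body


cues = [
    (0.0, 3.5, "Hello, this is a test transcription."),
    (3.5, 7.2, "The audio is converted to text with timestamps."),
]
print(render(cues, "vtt"))
print(render(cues, "srt"))
```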

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_MODEL` | `base` | Model size |
| `WHISPER_DEVICE` | `cpu` | cpu, cuda, auto |
| `WHISPER_COMPUTE_TYPE` | `int8` | Compute type |
| `WHISPER_BEAM_SIZE` | `5` | Beam size |


## License

MIT License - see [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "audio-subtitler",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "audio, transcription, vtt, srt, subtitles, whisper, speech-to-text, webvtt",
    "author": "Gary Lab",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/c9/64/c67de80d088de48118871eab4ff312c13297d379b0e0ba1b410c121825d1/audio_subtitler-0.1.2.tar.gz",
    "platform": null,
    "description": "# Audio Subtitler\n\nConvert audio files to subtitles (VTT, SRT) using Faster-Whisper.\n\n[![PyPI](https://img.shields.io/pypi/v/audio-subtitler.svg)](https://pypi.org/project/audio-subtitler/)\n[![Python Versions](https://img.shields.io/pypi/pyversions/audio-subtitler.svg)](https://pypi.org/project/audio-subtitler/)\n[![Run on RunPod](https://img.shields.io/badge/Run%20on-RunPod-6b3cff?logo=runpod&logoColor=white)](https://runpod.io?ref=hh0mhml0)\n\n## Features\n\n- \ud83d\ude80 **Full Faster-Whisper support** - All features and parameters from faster-whisper\n- \ud83d\udcdd **Multiple formats** - VTT (WebVTT) and SRT subtitle output\n- \ud83c\udfaf **Smart auto-detection** - Automatically detects format from file extension\n- \ud83c\udf0d **Multi-language** - Supports 100+ languages with auto-detection\n- \u26a1 **GPU acceleration** - CUDA support for faster transcription\n- \ud83c\udf99\ufe0f **Voice Activity Detection** - Automatically removes silence\n- \ud83d\udcbb **Simple APIs** - Easy-to-use CLI and Python API\n- \ud83d\udc33 **Docker GPU support** - Ready for serverless deployment\n\n## Installation\n\n```bash\npip install audio-subtitler\n```\n\nOptional dependencies:\n```bash\npip install audio-subtitler[runpod]  # For RunPod serverless\npip install audio-subtitler[dev]     # For development\n```\n\n## Quick Start\n\n### CLI\n\n```bash\n# Auto-detect format from file extension (recommended)\naudiosubtitler input.mp3 -o output.vtt\naudiosubtitler input.mp3 -o output.srt\n\n# Specify options\naudiosubtitler input.mp3 -o output.vtt --model large-v3 --language en --device cuda\n\n# Output to stdout\naudiosubtitler input.mp3 --format srt > output.srt\n\n# Use shorter command\naudiosub input.mp3 -o output.vtt\n```\n\n### Python API\n\n```python\nfrom src import AudioSubtitler\n\n# Initialize\nconverter = AudioSubtitler(\n    model_size_or_path=\"base\",\n    device=\"cpu\",\n    compute_type=\"int8\"\n)\n\n# Transcribe\nresult = 
converter.transcribe(\"audio.mp3\", format=\"vtt\", language=\"en\")\n\n# Access results\nprint(result[\"content\"])     # Subtitle content\nprint(result[\"format\"])      # \"vtt\" or \"srt\"\nprint(result[\"word_count\"])  # Number of words\n```\n\n## API Reference\n\n### AudioSubtitler\n\n**Constructor**: `AudioSubtitler(**kwargs)`\n\nAccepts all [faster-whisper WhisperModel](https://github.com/SYSTRAN/faster-whisper) parameters:\n- `model_size_or_path`: Model name (tiny, base, small, medium, large, large-v3) or path\n- `device`: \"cpu\", \"cuda\", or \"auto\"\n- `compute_type`: \"int8\", \"int8_float16\", \"int16\", \"float16\", \"float32\"\n- `cpu_threads`, `num_workers`, `download_root`, `local_files_only`, etc.\n\n**Method**: `transcribe(audio, format=\"vtt\", **kwargs)`\n\nParameters:\n- `audio`: File path (str), file object (BinaryIO), or numpy array\n- `format`: \"vtt\" or \"srt\" (default: \"vtt\")\n- `**kwargs`: All [faster-whisper transcribe](https://github.com/SYSTRAN/faster-whisper#transcribe) parameters\n  - `language`, `beam_size`, `vad_filter`, `vad_parameters`, `word_timestamps`, etc.\n\nReturns:\n```python\n{\n    \"content\": str,      # Subtitle content\n    \"format\": str,       # \"vtt\" or \"srt\"\n    \"word_count\": int    # Word count\n}\n```\n\n## Docker (GPU only)\n\n```bash\ndocker-compose -f docker-compose-gpu.yml up\n```\n\nInput/Output for RunPod serverless:\n```json\n// Input\n{\n  \"input\": {\n    \"audio\": \"<base64_encoded_audio>\",\n    \"language\": \"en\",\n    \"format\": \"vtt\"\n  }\n}\n\n// Output\n{\n  \"content\": \"WEBVTT\\n\\n00:00:00.000 --> ...\",\n  \"format\": \"vtt\",\n  \"word_count\": 150\n}\n```\n\n## Output Examples\n\n**VTT:**\n```\nWEBVTT\n\n00:00:00.000 --> 00:00:03.500\nHello, this is a test transcription.\n\n00:00:03.500 --> 00:00:07.200\nThe audio is converted to text with timestamps.\n```\n\n**SRT:**\n```\n1\n00:00:00,000 --> 00:00:03,500\nHello, this is a test transcription.\n\n2\n00:00:03,500 --> 
00:00:07,200\nThe audio is converted to text with timestamps.\n```\n\n## Environment Variables\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `WHISPER_MODEL` | `base` | Model size |\n| `WHISPER_DEVICE` | `cpu` | cpu, cuda, auto |\n| `WHISPER_COMPUTE_TYPE` | `int8` | Compute type |\n| `WHISPER_BEAM_SIZE` | `5` | Beam size |\n\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2025 audio-subtitler contributors  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.  ",
    "summary": "Convert audio files to subtitles (VTT, SRT) using Faster-Whisper",
    "version": "0.1.2",
    "project_urls": {
        "Author Blog": "https://garymeng.com",
        "Homepage": "https://github.com/garylab/audio-subtitler"
    },
    "split_keywords": [
        "audio",
        " transcription",
        " vtt",
        " srt",
        " subtitles",
        " whisper",
        " speech-to-text",
        " webvtt"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0a618f891accd7d43e748dacb3422fafaa46eb10230a971464bdfb107325649b",
                "md5": "d8c853a06dadf8d8d52d09d5e067fb82",
                "sha256": "50194cc8ec1a2c209e0a69bd237ebf7500e5b4c71284d65dee787c4bfb00c41d"
            },
            "downloads": -1,
            "filename": "audio_subtitler-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d8c853a06dadf8d8d52d09d5e067fb82",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 9652,
            "upload_time": "2025-11-01T09:59:35",
            "upload_time_iso_8601": "2025-11-01T09:59:35.516751Z",
            "url": "https://files.pythonhosted.org/packages/0a/61/8f891accd7d43e748dacb3422fafaa46eb10230a971464bdfb107325649b/audio_subtitler-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c964c67de80d088de48118871eab4ff312c13297d379b0e0ba1b410c121825d1",
                "md5": "3bc055b1e82f41e1df994200548b5d60",
                "sha256": "c337cf30b0f47958c360a5e9685246d03fa88cf0abc677cb62fc41a125d3ca33"
            },
            "downloads": -1,
            "filename": "audio_subtitler-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "3bc055b1e82f41e1df994200548b5d60",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 392486,
            "upload_time": "2025-11-01T09:59:36",
            "upload_time_iso_8601": "2025-11-01T09:59:36.767187Z",
            "url": "https://files.pythonhosted.org/packages/c9/64/c67de80d088de48118871eab4ff312c13297d379b0e0ba1b410c121825d1/audio_subtitler-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-01 09:59:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "garylab",
    "github_project": "audio-subtitler",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "audio-subtitler"
}
        