- **Name**: forcealign
- **Version**: 1.1.9
- **Home page**: https://github.com/lukerbs/forcealign
- **Summary**: A Python library for forced alignment of English text to English audio.
- **Upload time**: 2024-12-04 06:54:05
- **Maintainer**: None
- **Docs URL**: None
- **Author**: Luke Kerbs
- **Requires Python**: None
- **License**: None
- **Keywords**: force align, forced alignment, audio segmentation, audio forced alignment, python forced alignment, phoneme, generate subtitles
- **Requirements**: click, Distance, filelock, fsspec, g2p-en, inflect, Jinja2, joblib, MarkupSafe, more-itertools, mpmath, networkx, nltk, numpy, pydub, regex, sympy, torch, torchaudio, tqdm, typeguard, typing_extensions

# ForceAlign
ForceAlign is a Python library for forced alignment of English text to English audio. It can generate **word** or [**phoneme**](https://en.wikipedia.org/wiki/Phoneme)-level alignments, identifying the specific time a word or phoneme was spoken within an audio recording. ForceAlign supports `.mp3` and `.wav` audio file formats.

For phoneme-level alignments, ForceAlign currently supports the [ARPABET](https://en.wikipedia.org/wiki/ARPABET) phonetic transcription encoding.

ForceAlign uses PyTorch's pretrained **Wav2Vec2** model (via `torchaudio`) for acoustic feature extraction and runs on both CPU and CUDA GPU devices. It also includes **automatic speech-to-text transcription** for cases where no transcript is available.

---

## Features
- Fast and accurate word- and phoneme-level forced alignment of text to audio.
- Includes **automatic speech transcription** if a transcript is not provided.
- Optimized for both CPU and GPU.
- OS-independent: compatible with macOS, Windows, and Linux.
- Supports `.mp3` and `.wav` audio file formats.

---

## Installation and Dependencies
1. Install ForceAlign:
   ```bash
   pip3 install forcealign
   ```
2. Install `ffmpeg` (required for audio processing):
   - **macOS**: `brew install ffmpeg`
   - **Linux** (Debian/Ubuntu): `sudo apt install ffmpeg`
   - **Windows**: Install from [ffmpeg.org](https://ffmpeg.org/download.html)
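
Before running ForceAlign, it can help to confirm that `ffmpeg` is discoverable. The snippet below is a minimal sketch, assuming that `ffmpeg` only needs to be reachable on your `PATH`; the helper name `ffmpeg_available` is hypothetical and is not part of ForceAlign.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if `executable` can be found on the PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on PATH")
    else:
        print("ffmpeg not found on PATH; install it before loading audio files")
```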

---

## Usage Examples

### Example 1: Getting Word-Level Text Alignments with a Provided Transcript
```python
from forcealign import ForceAlign

# Provide path to audio file and corresponding transcript
transcript = "The quick brown fox jumps over the lazy dog."
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Run prediction and return alignment results
words = align.inference()

# Show predicted word-level alignments
for word in words:
    print(f"Word: {word.word}, Start: {word.time_start}s, End: {word.time_end}s")
```

---

### Example 2: Getting Word-Level Text Alignments with Automatic Speech Transcription
If a transcript is not provided, ForceAlign can automatically generate one using Wav2Vec2.

```python
from forcealign import ForceAlign

# Provide path to audio file; omit transcript
align = ForceAlign(audio_file='./speech.mp3')

# Automatically generate transcript and align words
words = align.inference()

# Show the generated transcript
print("Generated Transcript:")
print(align.raw_text)

# Show predicted word-level alignments
for word in words:
    print(f"Word: {word.word}, Start: {word.time_start}s, End: {word.time_end}s")
```

---

### Example 3: Getting Phoneme-Level Text Alignments
```python
from forcealign import ForceAlign

# Provide path to audio file and transcript
transcript = "The quick brown fox jumps over the lazy dog."
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Run prediction and return alignment results
words = align.inference()

# Access predicted phoneme-level alignments
for word in words:
    print(f"Word: {word.word}")
    for phoneme in word.phonemes:
        print(f"Phoneme: {phoneme.phoneme}, Start: {phoneme.time_start}s, End: {phoneme.time_end}s")
```
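
For downstream analysis it is often convenient to flatten the nested word/phoneme structure into plain rows. The sketch below is illustrative only: the `Word` and `Phoneme` dataclasses are hypothetical stand-ins that mirror just the attributes used in the example above (`word`, `phonemes`, `phoneme`, `time_start`, `time_end`), so the snippet runs without audio. With real results, you would pass the list returned by `align.inference()` instead.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins mirroring only the attributes used in the example above.
@dataclass
class Phoneme:
    phoneme: str
    time_start: float
    time_end: float


@dataclass
class Word:
    word: str
    time_start: float
    time_end: float
    phonemes: list = field(default_factory=list)


def flatten_phonemes(words):
    """Return one dict per phoneme, tagged with its parent word and duration."""
    rows = []
    for w in words:
        for p in w.phonemes:
            rows.append({
                "word": w.word,
                "phoneme": p.phoneme,
                "start": p.time_start,
                "end": p.time_end,
                # Duration in seconds, rounded to the millisecond.
                "duration": round(p.time_end - p.time_start, 3),
            })
    return rows


if __name__ == "__main__":
    demo = [Word("THE", 0.0, 0.2, [Phoneme("DH", 0.0, 0.1), Phoneme("AH0", 0.1, 0.2)])]
    for row in flatten_phonemes(demo):
        print(row)
```

Rows in this shape can be written to CSV or JSON with the standard library.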

---

### Example 4: Reviewing Word-Level Alignments in Real Time
```python
from forcealign import ForceAlign

# Provide path to audio file and transcript
transcript = "The quick brown fox jumps over the lazy dog."
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Play the audio while printing word alignments in real time
align.review_alignment()
```

---

## Where ForceAlign Works Well
ForceAlign excels in the following scenarios:
1. **Clear Audio Recordings**:
   - Audio with minimal background noise, clear enunciation, and consistent speaking patterns.
2. **Short and Medium-Length Recordings**:
   - Audio files up to ~30 minutes, where transcription and alignment can be processed efficiently.
3. **Standard English Pronunciation**:
   - Recordings with native or near-native English pronunciation.

---

## Where ForceAlign May Struggle
1. **Noisy Audio**:
   - Recordings with heavy background noise or overlapping speech may result in reduced transcription and alignment accuracy.
2. **Non-Standard English Accents**:
   - Strong regional accents or dialects not represented in the Wav2Vec2 training data may lead to transcription errors.
3. **Long Audio Files**:
   - For recordings exceeding ~1 hour, memory and processing time may become significant issues.
4. **Non-English Speech**:
   - ForceAlign currently supports English only.
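
One possible workaround for long recordings, which ForceAlign itself does not describe, is to process the audio in shorter windows and shift each window's timestamps back onto the full timeline. The sketch below covers only the bookkeeping. The function names and the 10-minute default are assumptions, and fixed cut points can split a word in half, so in practice you would want to cut at silences and split the transcript to match.

```python
def chunk_windows(duration_s: float, chunk_s: float = 600.0):
    """Return consecutive (start, end) windows in seconds covering the recording."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    duration_s = float(duration_s)
    windows = []
    start = 0.0
    while start < duration_s:
        # The final window is shortened so it never runs past the recording.
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


def to_global_time(local_time_s: float, window_start_s: float) -> float:
    """Convert a timestamp measured inside a window to the full-recording timeline."""
    return local_time_s + window_start_s


if __name__ == "__main__":
    # A 25-minute recording split into 10-minute windows.
    for window in chunk_windows(1500, chunk_s=600):
        print(window)
```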

---

## Use Cases
- **Subtitle Generation**:
  - Generate timestamps for subtitles or closed captions for videos.
- **Phoneme Analysis**:
  - Analyze phoneme-level details for language research, speech therapy, or pronunciation training.
- **Animated Lip Syncing**:
  - Use phoneme alignments to synchronize animated character lip movements with audio.
- **Accessibility Tools**:
  - Enhance accessibility by creating aligned captions or transcripts for audio recordings.
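
As an illustration of the subtitle use case, word-level alignments can be grouped into SubRip (`.srt`) cues. This is a minimal sketch rather than a ForceAlign feature: the `Word` dataclass is a hypothetical stand-in mirroring the `word`, `time_start`, and `time_end` attributes used in the usage examples, and grouping by a fixed word count (`max_words`) is an arbitrary choice made for brevity.

```python
from dataclasses import dataclass


# Hypothetical stand-in mirroring the attributes used in the usage examples.
@dataclass
class Word:
    word: str
    time_start: float
    time_end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words: int = 7) -> str:
    """Group aligned words into numbered SRT cues of at most `max_words` words."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = srt_timestamp(group[0].time_start)
        end = srt_timestamp(group[-1].time_end)
        text = " ".join(w.word for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [Word("THE", 0.0, 0.2), Word("QUICK", 0.25, 0.5), Word("FOX", 0.6, 1.0)]
    print(words_to_srt(demo, max_words=2))
```

With real output, pass the list returned by `align.inference()` to `words_to_srt` and write the resulting string to a `.srt` file.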

---

## FAQ

**1. Does ForceAlign have speech-to-text capabilities?**  
Yes! If you do not provide a transcript, ForceAlign will automatically generate one using Wav2Vec2. You can also provide your own transcript for better accuracy.

**2. Can ForceAlign be used with both CPU and GPU?**  
Yes. ForceAlign is optimized for both CPU and CUDA-enabled GPU devices. Using a GPU significantly speeds up processing for longer recordings.

**3. Can ForceAlign handle non-English audio?**  
No. Currently, ForceAlign supports English only. Support for additional languages may be added in future updates.

---

## Acknowledgements
This project is heavily based upon a demo from PyTorch by Moto Hira: [FORCED ALIGNMENT WITH WAV2VEC2](https://pytorch.org/audio/stable/tutorials/forced_alignment_tutorial.html).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/lukerbs/forcealign",
    "name": "forcealign",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "force align, forced alignment, audio segmentation, audio forced alignment, python forced alignment, phoneme, generate subtitles",
    "author": "Luke Kerbs",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/da/e1/a226b66c525c51e73a3a626a57f98941e92bcdd4103ccf94cdbe20829021/forcealign-1.1.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python library for forced alignment of English text to English audio.",
    "version": "1.1.9",
    "project_urls": {
        "Homepage": "https://github.com/lukerbs/forcealign"
    },
    "split_keywords": [
        "force align",
        " forced alignment",
        " audio segmentation",
        " audio forced alignment",
        " python forced alignment",
        " phoneme",
        " generate subtitles"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "187f7ff5b2e4bc8a01d22482952c16b5ff4931284d94cb364dbbb6e4c594a038",
                "md5": "47d4c275e3c4daf99317247d18273ccb",
                "sha256": "1281c11e8c8c5e96fe890037bd425b0eb427435c3eee0cfa429ecc0aa4d94460"
            },
            "downloads": -1,
            "filename": "forcealign-1.1.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "47d4c275e3c4daf99317247d18273ccb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8633,
            "upload_time": "2024-12-04T06:54:03",
            "upload_time_iso_8601": "2024-12-04T06:54:03.892738Z",
            "url": "https://files.pythonhosted.org/packages/18/7f/7ff5b2e4bc8a01d22482952c16b5ff4931284d94cb364dbbb6e4c594a038/forcealign-1.1.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dae1a226b66c525c51e73a3a626a57f98941e92bcdd4103ccf94cdbe20829021",
                "md5": "60576eeb37be27b187f95f35b0a85a2c",
                "sha256": "a07418d13b33fe1a5375a933f78fb91ddbb97da5b757e7acdb30c5aa59f54a09"
            },
            "downloads": -1,
            "filename": "forcealign-1.1.9.tar.gz",
            "has_sig": false,
            "md5_digest": "60576eeb37be27b187f95f35b0a85a2c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 8037,
            "upload_time": "2024-12-04T06:54:05",
            "upload_time_iso_8601": "2024-12-04T06:54:05.293486Z",
            "url": "https://files.pythonhosted.org/packages/da/e1/a226b66c525c51e73a3a626a57f98941e92bcdd4103ccf94cdbe20829021/forcealign-1.1.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-04 06:54:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lukerbs",
    "github_project": "forcealign",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "Distance",
            "specs": [
                [
                    "==",
                    "0.1.3"
                ]
            ]
        },
        {
            "name": "filelock",
            "specs": [
                [
                    "==",
                    "3.16.1"
                ]
            ]
        },
        {
            "name": "fsspec",
            "specs": [
                [
                    "==",
                    "2024.10.0"
                ]
            ]
        },
        {
            "name": "g2p-en",
            "specs": [
                [
                    "==",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "inflect",
            "specs": [
                [
                    "==",
                    "7.4.0"
                ]
            ]
        },
        {
            "name": "Jinja2",
            "specs": [
                [
                    "==",
                    "3.1.4"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    "==",
                    "1.4.2"
                ]
            ]
        },
        {
            "name": "MarkupSafe",
            "specs": [
                [
                    "==",
                    "3.0.2"
                ]
            ]
        },
        {
            "name": "more-itertools",
            "specs": [
                [
                    "==",
                    "10.5.0"
                ]
            ]
        },
        {
            "name": "mpmath",
            "specs": [
                [
                    "==",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    "==",
                    "3.4.2"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    "==",
                    "3.9.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.1.3"
                ]
            ]
        },
        {
            "name": "pydub",
            "specs": [
                [
                    "==",
                    "0.25.1"
                ]
            ]
        },
        {
            "name": "regex",
            "specs": [
                [
                    "==",
                    "2024.11.6"
                ]
            ]
        },
        {
            "name": "sympy",
            "specs": [
                [
                    "==",
                    "1.13.1"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.5.1"
                ]
            ]
        },
        {
            "name": "torchaudio",
            "specs": [
                [
                    "==",
                    "2.5.1"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.67.1"
                ]
            ]
        },
        {
            "name": "typeguard",
            "specs": [
                [
                    "==",
                    "4.4.1"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    "==",
                    "4.12.2"
                ]
            ]
        }
    ],
    "lcname": "forcealign"
}
        