# ForceAlign
ForceAlign is a Python library for forced alignment of English text to English audio. You can use this library to get word or [phoneme](https://en.wikipedia.org/wiki/Phoneme)-level alignments of text to English audio. In short, forced alignment is the process of identifying the specific time at which each word (or phoneme) of a known transcript is spoken within an audio recording. ForceAlign supports the .mp3 and .wav audio file formats.
For phoneme-level text alignments, ForceAlign currently supports only the [ARPABET](https://en.wikipedia.org/wiki/ARPABET) phonetic transcription encoding.
ForceAlign uses PyTorch's WAV2VEC2 pretrained model for acoustic feature extraction and can be run on both CPU and CUDA GPU devices.
## Features
- Fast and accurate word- and phoneme-level forced alignment of text to audio.
- Optimized for both CPU and GPU.
- OS independent! Use ForceAlign on Mac, Windows, and Linux.
## Installation and Dependencies
1. Install ForceAlign with pip
    - `pip3 install forcealign`
2. Install ffmpeg
    - Mac: `brew install ffmpeg`
    - Linux: `sudo apt install ffmpeg`
    - Windows: Install from [ffmpeg.org](https://ffmpeg.org/download.html)
## Usage Examples
To use ForceAlign, create a ForceAlign instance with the path to your audio file and the corresponding text transcript.
**Example 1: Getting Word-Level Text Alignments**
```
from forcealign import ForceAlign

# Replace with the exact text spoken in speech.mp3
transcript = "The text spoken in the audio file."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Show predicted word-level alignments
for word in words:
    print(word.word)        # The word spoken in audio at associated time
    print(word.time_start)  # Time (seconds) the word starts in speech.mp3
    print(word.time_end)    # Time (seconds) the word ends in speech.mp3
```
**Example 2: Getting Phoneme-Level Text Alignments**
```
from forcealign import ForceAlign

# Replace with the exact text spoken in speech.mp3
transcript = "The text spoken in the audio file."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Access predicted phoneme-level alignments
for word in words:
    print(word.word)
    for phoneme in word.phonemes:
        print(phoneme.phoneme)     # ARPABET phoneme spoken in audio at associated time
        print(phoneme.time_start)  # Time (seconds) the phoneme starts in speech.mp3
        print(phoneme.time_end)    # Time (seconds) the phoneme ends in speech.mp3
```
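The nested loop above can be flattened into a single timeline when a downstream tool wants one row per phoneme. The following is a minimal sketch, not part of ForceAlign: the `Word` and `Phoneme` dataclasses are hypothetical stand-ins that only mirror the attributes used in the example (`word`, `phonemes`, `phoneme`, `time_start`, `time_end`), `phoneme_timeline` is an illustrative helper, and the timings are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Phoneme:
    """Hypothetical stand-in for a ForceAlign phoneme result."""
    phoneme: str
    time_start: float
    time_end: float


@dataclass
class Word:
    """Hypothetical stand-in for a ForceAlign word result."""
    word: str
    time_start: float
    time_end: float
    phonemes: List[Phoneme] = field(default_factory=list)


def phoneme_timeline(words: List[Word]) -> List[Tuple[str, str, float, float]]:
    """Flatten words into (word, phoneme, start, duration) rows."""
    rows = []
    for w in words:
        for p in w.phonemes:
            duration = round(p.time_end - p.time_start, 3)
            rows.append((w.word, p.phoneme, p.time_start, duration))
    return rows


# Made-up alignment data for illustration only
words = [
    Word("hi", 0.10, 0.40, [Phoneme("HH", 0.10, 0.20), Phoneme("AY", 0.20, 0.40)]),
]

for row in phoneme_timeline(words):
    print(row)
```

With real results, the list returned by `align.inference()` could be passed in place of the made-up `words`, assuming its objects expose the same attributes as in Example 2.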
**Example 3: Reviewing Word-Level Alignments**
You can use the `review_alignment()` method to check the quality of your alignment in real time. The method plays the audio file and prints each word at its predicted time. This is useful for heuristically checking the accuracy of the word-level alignment predictions.
```
from forcealign import ForceAlign

# Replace with the exact text spoken in speech.mp3
transcript = "The text spoken in the audio file."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Plays audio and prints each word in real time at its predicted alignment time
align.review_alignment()
```
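To illustrate the idea of printing words at their predicted times, here is a rough sketch of a timed replay loop. It is not how `review_alignment()` is implemented and it plays no audio: `replay_words` and the `Word` dataclass are hypothetical, and the timings are made up. The `sleep` and `emit` parameters exist only so the loop can be exercised without real delays.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Word:
    """Hypothetical stand-in for a ForceAlign word result."""
    word: str
    time_start: float
    time_end: float


def replay_words(
    words: Iterable[Word],
    sleep: Callable[[float], None] = time.sleep,
    emit: Callable[[str], None] = print,
) -> None:
    """Emit each word after waiting until its predicted start time."""
    elapsed = 0.0  # scheduled time already waited, in seconds
    for w in words:
        wait = max(0.0, w.time_start - elapsed)
        sleep(wait)
        elapsed += wait
        emit(w.word)


# Made-up alignment data for illustration only
demo = [Word("hello", 0.1, 0.2), Word("world", 0.3, 0.5)]
replay_words(demo)
```

The loop tracks scheduled time rather than wall-clock time, so it drifts slightly on slow machines. That is acceptable for a quick visual check but not for precise synchronization.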
## Use Cases
Forced alignment is useful for generating video subtitles and, with phoneme-level alignments, for automating the lip-syncing of animated characters.
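As a sketch of the subtitle use case, word-level alignments can be grouped into SubRip (.srt) cues. The helpers below (`srt_timestamp`, `words_to_srt`) and the `Word` dataclass are hypothetical and not part of ForceAlign. They assume only that each word exposes `word`, `time_start`, and `time_end` in seconds, as in the usage examples. The fixed `words_per_cue` grouping is a simplification, since real subtitles usually break on punctuation or pauses.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Word:
    """Hypothetical stand-in for a ForceAlign word result."""
    word: str
    time_start: float
    time_end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: Iterable[Word], words_per_cue: int = 5) -> str:
    """Group consecutive words into numbered SRT cues."""
    words = list(words)
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start = srt_timestamp(chunk[0].time_start)
        end = srt_timestamp(chunk[-1].time_end)
        text = " ".join(w.word for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Made-up alignment data for illustration only
demo = [Word("hello", 0.0, 0.5), Word("world", 0.5, 1.0)]
print(words_to_srt(demo))
```

The resulting string can be written to a `.srt` file and loaded by most video players.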
## FAQ
**1. Does ForceAlign have speech-to-text capabilities?**
No. This is a feature that I plan on adding soon when I have time.
**2. Can ForceAlign be used with both CPU and GPU?**
Yes. Running with CPU is surprisingly fast, and it will be even faster with GPU.
## Acknowledgements
This project is heavily based on a PyTorch demo by Moto Hira: [FORCED ALIGNMENT WITH WAV2VEC2](https://pytorch.org/audio/stable/tutorials/forced_alignment_tutorial.html)
Raw data
{
"_id": null,
"home_page": "https://github.com/lukerbs/forcealign",
"name": "forcealign",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "force align, forced alignment, audio segmentation, audio forced alignment, python forced alignment, phoneme, generate subtitles",
"author": "Luke Kerbs",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/d7/a2/1b144a4958c0050868d7bed0e208159ff5be00487ef598b4a88992e79871/forcealign-1.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python library for forced alignment of English text to English audio.",
"version": "1.1.7",
"project_urls": {
"Homepage": "https://github.com/lukerbs/forcealign"
},
"split_keywords": [
"force align",
" forced alignment",
" audio segmentation",
" audio forced alignment",
" python forced alignment",
" phoneme",
" generate subtitles"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4c6363e55204ac1c88fbb3dfd5c3c2edef3d853f78eed84ea3b3d5042b4c667f",
"md5": "5de0d85815dd7cdab46db4aa0a18c23e",
"sha256": "599d6c7484bb9ad6395e6c258b43605adbd9a82c5bde112ff8a62ea69daf2c38"
},
"downloads": -1,
"filename": "forcealign-1.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5de0d85815dd7cdab46db4aa0a18c23e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6481,
"upload_time": "2024-10-15T02:23:44",
"upload_time_iso_8601": "2024-10-15T02:23:44.432611Z",
"url": "https://files.pythonhosted.org/packages/4c/63/63e55204ac1c88fbb3dfd5c3c2edef3d853f78eed84ea3b3d5042b4c667f/forcealign-1.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d7a21b144a4958c0050868d7bed0e208159ff5be00487ef598b4a88992e79871",
"md5": "7c4200f08e5594467ba2841485f102c9",
"sha256": "331fb562045dd0b074562863c36c03a65704ebdfa7a82286cf8b426c25c44a49"
},
"downloads": -1,
"filename": "forcealign-1.1.7.tar.gz",
"has_sig": false,
"md5_digest": "7c4200f08e5594467ba2841485f102c9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6252,
"upload_time": "2024-10-15T02:23:46",
"upload_time_iso_8601": "2024-10-15T02:23:46.002958Z",
"url": "https://files.pythonhosted.org/packages/d7/a2/1b144a4958c0050868d7bed0e208159ff5be00487ef598b4a88992e79871/forcealign-1.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-15 02:23:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "lukerbs",
"github_project": "forcealign",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "annotated-types",
"specs": [
[
"==",
"0.6.0"
]
]
},
{
"name": "anyio",
"specs": [
[
"==",
"4.3.0"
]
]
},
{
"name": "appnope",
"specs": [
[
"==",
"0.1.4"
]
]
},
{
"name": "argon2-cffi",
"specs": [
[
"==",
"23.1.0"
]
]
},
{
"name": "argon2-cffi-bindings",
"specs": [
[
"==",
"21.2.0"
]
]
},
{
"name": "arrow",
"specs": [
[
"==",
"1.3.0"
]
]
},
{
"name": "asttokens",
"specs": [
[
"==",
"2.4.1"
]
]
},
{
"name": "async-lru",
"specs": [
[
"==",
"2.0.4"
]
]
},
{
"name": "attrs",
"specs": [
[
"==",
"23.2.0"
]
]
},
{
"name": "Babel",
"specs": [
[
"==",
"2.14.0"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
"==",
"4.12.3"
]
]
},
{
"name": "bibtexparser",
"specs": [
[
"==",
"2.0.0b7"
]
]
},
{
"name": "bleach",
"specs": [
[
"==",
"6.1.0"
]
]
},
{
"name": "certifi",
"specs": [
[
"==",
"2024.2.2"
]
]
},
{
"name": "cffi",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.3.2"
]
]
},
{
"name": "click",
"specs": [
[
"==",
"8.1.7"
]
]
},
{
"name": "clldutils",
"specs": [
[
"==",
"3.22.2"
]
]
},
{
"name": "colorama",
"specs": [
[
"==",
"0.4.6"
]
]
},
{
"name": "colorlog",
"specs": [
[
"==",
"6.8.2"
]
]
},
{
"name": "comm",
"specs": [
[
"==",
"0.2.2"
]
]
},
{
"name": "contourpy",
"specs": [
[
"==",
"1.2.0"
]
]
},
{
"name": "csvw",
"specs": [
[
"==",
"3.3.0"
]
]
},
{
"name": "cycler",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "debugpy",
"specs": [
[
"==",
"1.8.1"
]
]
},
{
"name": "decorator",
"specs": [
[
"==",
"5.1.1"
]
]
},
{
"name": "defusedxml",
"specs": [
[
"==",
"0.7.1"
]
]
},
{
"name": "Distance",
"specs": [
[
"==",
"0.1.3"
]
]
},
{
"name": "dlinfo",
"specs": [
[
"==",
"1.2.1"
]
]
},
{
"name": "editdistance",
"specs": [
[
"==",
"0.8.1"
]
]
},
{
"name": "executing",
"specs": [
[
"==",
"2.0.1"
]
]
},
{
"name": "fastjsonschema",
"specs": [
[
"==",
"2.19.1"
]
]
},
{
"name": "filelock",
"specs": [
[
"==",
"3.13.1"
]
]
},
{
"name": "fonttools",
"specs": [
[
"==",
"4.50.0"
]
]
},
{
"name": "fqdn",
"specs": [
[
"==",
"1.5.1"
]
]
},
{
"name": "fsspec",
"specs": [
[
"==",
"2024.3.0"
]
]
},
{
"name": "g2p-en",
"specs": [
[
"==",
"2.1.0"
]
]
},
{
"name": "h11",
"specs": [
[
"==",
"0.14.0"
]
]
},
{
"name": "httpcore",
"specs": [
[
"==",
"1.0.4"
]
]
},
{
"name": "httpx",
"specs": [
[
"==",
"0.27.0"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.6"
]
]
},
{
"name": "inflect",
"specs": [
[
"==",
"7.0.0"
]
]
},
{
"name": "ipykernel",
"specs": [
[
"==",
"6.29.3"
]
]
},
{
"name": "isodate",
"specs": [
[
"==",
"0.6.1"
]
]
},
{
"name": "isoduration",
"specs": [
[
"==",
"20.11.0"
]
]
},
{
"name": "jamo",
"specs": [
[
"==",
"0.4.1"
]
]
},
{
"name": "jedi",
"specs": [
[
"==",
"0.19.1"
]
]
},
{
"name": "Jinja2",
"specs": [
[
"==",
"3.1.3"
]
]
},
{
"name": "joblib",
"specs": [
[
"==",
"1.3.2"
]
]
},
{
"name": "JPype1",
"specs": [
[
"==",
"1.5.0"
]
]
},
{
"name": "json5",
"specs": [
[
"==",
"0.9.24"
]
]
},
{
"name": "jsonpointer",
"specs": [
[
"==",
"2.4"
]
]
},
{
"name": "jsonschema",
"specs": [
[
"==",
"4.21.1"
]
]
},
{
"name": "jsonschema-specifications",
"specs": [
[
"==",
"2023.12.1"
]
]
},
{
"name": "jupyter-events",
"specs": [
[
"==",
"0.9.1"
]
]
},
{
"name": "jupyter-lsp",
"specs": [
[
"==",
"2.2.4"
]
]
},
{
"name": "jupyter_client",
"specs": [
[
"==",
"8.6.1"
]
]
},
{
"name": "jupyter_core",
"specs": [
[
"==",
"5.7.2"
]
]
},
{
"name": "jupyter_server",
"specs": [
[
"==",
"2.13.0"
]
]
},
{
"name": "jupyter_server_terminals",
"specs": [
[
"==",
"0.5.3"
]
]
},
{
"name": "jupyterlab_pygments",
"specs": [
[
"==",
"0.3.0"
]
]
},
{
"name": "jupyterlab_server",
"specs": [
[
"==",
"2.25.4"
]
]
},
{
"name": "kiwisolver",
"specs": [
[
"==",
"1.4.5"
]
]
},
{
"name": "konlpy",
"specs": [
[
"==",
"0.6.0"
]
]
},
{
"name": "language-tags",
"specs": [
[
"==",
"1.2.0"
]
]
},
{
"name": "lxml",
"specs": [
[
"==",
"5.1.0"
]
]
},
{
"name": "marisa-trie",
"specs": [
[
"==",
"1.1.0"
]
]
},
{
"name": "Markdown",
"specs": [
[
"==",
"3.6"
]
]
},
{
"name": "MarkupSafe",
"specs": [
[
"==",
"2.1.5"
]
]
},
{
"name": "matplotlib",
"specs": [
[
"==",
"3.8.3"
]
]
},
{
"name": "matplotlib-inline",
"specs": [
[
"==",
"0.1.6"
]
]
},
{
"name": "mistune",
"specs": [
[
"==",
"3.0.2"
]
]
},
{
"name": "mpmath",
"specs": [
[
"==",
"1.3.0"
]
]
},
{
"name": "munkres",
"specs": [
[
"==",
"1.1.4"
]
]
},
{
"name": "nbclient",
"specs": [
[
"==",
"0.10.0"
]
]
},
{
"name": "nbconvert",
"specs": [
[
"==",
"7.16.2"
]
]
},
{
"name": "nbformat",
"specs": [
[
"==",
"5.10.3"
]
]
},
{
"name": "nest-asyncio",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "networkx",
"specs": [
[
"==",
"3.2.1"
]
]
},
{
"name": "nltk",
"specs": [
[
"==",
"3.8.1"
]
]
},
{
"name": "notebook_shim",
"specs": [
[
"==",
"0.2.4"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "overrides",
"specs": [
[
"==",
"7.7.0"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"24.0"
]
]
},
{
"name": "pandocfilters",
"specs": [
[
"==",
"1.5.1"
]
]
},
{
"name": "panphon",
"specs": [
[
"==",
"0.20.0"
]
]
},
{
"name": "parso",
"specs": [
[
"==",
"0.8.3"
]
]
},
{
"name": "pexpect",
"specs": [
[
"==",
"4.9.0"
]
]
},
{
"name": "pillow",
"specs": [
[
"==",
"10.2.0"
]
]
},
{
"name": "platformdirs",
"specs": [
[
"==",
"4.2.0"
]
]
},
{
"name": "prometheus_client",
"specs": [
[
"==",
"0.20.0"
]
]
},
{
"name": "prompt-toolkit",
"specs": [
[
"==",
"3.0.43"
]
]
},
{
"name": "psutil",
"specs": [
[
"==",
"5.9.8"
]
]
},
{
"name": "ptyprocess",
"specs": [
[
"==",
"0.7.0"
]
]
},
{
"name": "pure-eval",
"specs": [
[
"==",
"0.2.2"
]
]
},
{
"name": "pycparser",
"specs": [
[
"==",
"2.21"
]
]
},
{
"name": "pydantic",
"specs": [
[
"==",
"2.6.4"
]
]
},
{
"name": "pydantic_core",
"specs": [
[
"==",
"2.16.3"
]
]
},
{
"name": "pydub",
"specs": [
[
"==",
"0.25.1"
]
]
},
{
"name": "Pygments",
"specs": [
[
"==",
"2.17.2"
]
]
},
{
"name": "pylatexenc",
"specs": [
[
"==",
"2.10"
]
]
},
{
"name": "pyparsing",
"specs": [
[
"==",
"3.1.2"
]
]
},
{
"name": "pyphen",
"specs": [
[
"==",
"0.14.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "python-json-logger",
"specs": [
[
"==",
"2.0.7"
]
]
},
{
"name": "python-mecab-ko",
"specs": [
[
"==",
"1.3.3"
]
]
},
{
"name": "python-mecab-ko-dic",
"specs": [
[
"==",
"2.1.1.post2"
]
]
},
{
"name": "PyYAML",
"specs": [
[
"==",
"6.0.1"
]
]
},
{
"name": "pyzmq",
"specs": [
[
"==",
"25.1.2"
]
]
},
{
"name": "rdflib",
"specs": [
[
"==",
"7.0.0"
]
]
},
{
"name": "referencing",
"specs": [
[
"==",
"0.34.0"
]
]
},
{
"name": "regex",
"specs": [
[
"==",
"2023.12.25"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.31.0"
]
]
},
{
"name": "rfc3339-validator",
"specs": [
[
"==",
"0.1.4"
]
]
},
{
"name": "rfc3986",
"specs": [
[
"==",
"1.5.0"
]
]
},
{
"name": "rfc3986-validator",
"specs": [
[
"==",
"0.1.1"
]
]
},
{
"name": "rpds-py",
"specs": [
[
"==",
"0.18.0"
]
]
},
{
"name": "segments",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "Send2Trash",
"specs": [
[
"==",
"1.8.2"
]
]
},
{
"name": "setuptools",
"specs": [
[
"==",
"69.2.0"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "sniffio",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "soupsieve",
"specs": [
[
"==",
"2.5"
]
]
},
{
"name": "stack-data",
"specs": [
[
"==",
"0.6.3"
]
]
},
{
"name": "sympy",
"specs": [
[
"==",
"1.12"
]
]
},
{
"name": "tabulate",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "terminado",
"specs": [
[
"==",
"0.18.1"
]
]
},
{
"name": "tinycss2",
"specs": [
[
"==",
"1.2.1"
]
]
},
{
"name": "torch",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "torchaudio",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "torchvision",
"specs": [
[
"==",
"0.17.1"
]
]
},
{
"name": "tornado",
"specs": [
[
"==",
"6.4"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.2"
]
]
},
{
"name": "traitlets",
"specs": [
[
"==",
"5.14.2"
]
]
},
{
"name": "types-python-dateutil",
"specs": [
[
"==",
"2.9.0.20240316"
]
]
},
{
"name": "typing_extensions",
"specs": [
[
"==",
"4.10.0"
]
]
},
{
"name": "unicodecsv",
"specs": [
[
"==",
"0.14.1"
]
]
},
{
"name": "uri-template",
"specs": [
[
"==",
"1.3.0"
]
]
},
{
"name": "uritemplate",
"specs": [
[
"==",
"4.1.1"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "wcwidth",
"specs": [
[
"==",
"0.2.13"
]
]
},
{
"name": "webcolors",
"specs": [
[
"==",
"1.13"
]
]
},
{
"name": "webencodings",
"specs": [
[
"==",
"0.5.1"
]
]
},
{
"name": "websocket-client",
"specs": [
[
"==",
"1.7.0"
]
]
}
],
"lcname": "forcealign"
}