<h1 align="center">Python forced alignment</h1>
<div align="center">
[![PyPI](https://img.shields.io/pypi/v/pyfoal.svg)](https://pypi.python.org/pypi/pyfoal)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/pyfoal)](https://pepy.tech/project/pyfoal)
</div>
Forced alignment suite. Includes English grapheme-to-phoneme (G2P) conversion
and phoneme alignment via the following forced alignment tools.
- RAD-TTS [1]
- Montreal Forced Aligner (MFA) [2]
- Penn Phonetic Forced Aligner (P2FA) [3]
RAD-TTS is used by default. Alignments can be saved to disk or accessed via the
`pypar.Alignment` phoneme alignment representation. See
[`pypar`](https://github.com/maxrmorrison/pypar) for more details.
`pyfoal` also includes the following utilities (a usage sketch follows this list).
- Converting alignments to and from a categorical representation
  suitable for training machine learning models (`pyfoal.convert`)
- Natural interpolation of forced alignments for time-stretching speech
  (`pyfoal.interpolate`)
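
Below is a minimal sketch of both utilities. The function names
`pyfoal.convert.alignment_to_indices` and `pyfoal.interpolate.interpolate`,
and their signatures, are illustrative assumptions rather than the confirmed
API; consult the module source for the exact names.

```python
import pyfoal
import pypar

# Load a previously saved alignment (pypar supports .json and .TextGrid)
alignment = pypar.Alignment('alignment.json')

# Hypothetical: convert to a framewise categorical representation
# (one phoneme index per frame) suitable for model training
indices = pyfoal.convert.alignment_to_indices(alignment, hopsize=0.01)

# Hypothetical: time-stretch the alignment to twice its original duration
stretched = pyfoal.interpolate.interpolate(alignment, 2.)
```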
## Table of contents
- [Installation](#installation)
- [Inference](#inference)
* [Application programming interface](#application-programming-interface)
* [`pyfoal.from_text_and_audio`](#pyfoalfrom_text_and_audio)
* [`pyfoal.from_file`](#pyfoalfrom_file)
* [`pyfoal.from_file_to_file`](#pyfoalfrom_file_to_file)
* [`pyfoal.from_files_to_files`](#pyfoalfrom_files_to_files)
* [Command-line interface](#command-line-interface)
- [Training](#training)
* [Download](#download)
* [Preprocess](#preprocess)
* [Partition](#partition)
* [Train](#train)
* [Monitor](#monitor)
* [Evaluate](#evaluate)
- [References](#references)
## Installation
`pip install pyfoal`
MFA and P2FA both require additional installation steps found below.
### Montreal Forced Aligner (MFA)
`conda install -c conda-forge montreal-forced-aligner`
### Penn Phonetic Forced Aligner (P2FA)
P2FA depends on the
[Hidden Markov Model Toolkit (HTK)](http://htk.eng.cam.ac.uk/), which has been
tested on macOS and Linux using HTK version 3.4.0. Version 3.4.1 has known
issues on Linux. HTK is released under a license that prohibits redistribution,
so you must install HTK yourself and verify that the commands `HCopy` and
`HVite` are available as system-wide binaries. After downloading HTK, I use
the following for installation on Linux.
```
# Run from the root of the extracted HTK source directory
sudo apt-get install -y gcc-multilib libx11-dev
sudo chmod +x configure
./configure --disable-hslab
make all
sudo make install
```
For more help with HTK installation, see notes by
[Jaekoo Kang](https://github.com/jaekookang/p2fa_py3#install-htk) and
[Steve Rubin](https://github.com/ucbvislab/p2fa-vislab#install-htk-34-note-341-will-not-work-get-htk-here).
## Inference
### Force-align text and audio
```python
import pyfoal
# Load text
text = pyfoal.load.text(text_file)
# Load and resample audio
audio = pyfoal.load.audio(audio_file)
# Select an aligner. One of ['mfa', 'p2fa', 'radtts' (default)].
aligner = 'radtts'
# For RAD-TTS, select a model checkpoint
checkpoint = pyfoal.DEFAULT_CHECKPOINT
# Select a GPU to run inference on
gpu = 0
alignment = pyfoal.from_text_and_audio(
    text,
    audio,
    pyfoal.SAMPLE_RATE,
    aligner=aligner,
    checkpoint=checkpoint,
    gpu=gpu)
```
### Application programming interface
#### `pyfoal.from_text_and_audio`
```
"""Phoneme-level forced-alignment
Arguments
text : string
The speech transcript
audio : torch.tensor(shape=(1, samples))
The speech signal to process
sample_rate : int
The audio sampling rate
Returns
alignment : pypar.Alignment
The forced alignment
"""
```
#### `pyfoal.from_file`
```
"""Phoneme alignment from audio and text files
Arguments
text_file : Path
The corresponding transcript file
audio_file : Path
The audio file to process
aligner : str
The alignment method to use
checkpoint : Path
The checkpoint to use for neural methods
gpu : int
The index of the gpu to perform alignment on for neural methods
Returns
alignment : Alignment
The forced alignment
"""
```
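
A usage sketch (file names are placeholders; the result is a
`pypar.Alignment`):

```python
import pyfoal

# Align a single utterance from a transcript file and an audio file
alignment = pyfoal.from_file('speech.txt', 'speech.wav', aligner='radtts')

# Inspect the result via the pypar API
print(alignment.duration())
```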
#### `pyfoal.from_file_to_file`
```
"""Perform phoneme alignment from files and save to disk
Arguments
text_file : Path
The corresponding transcript file
audio_file : Path
The audio file to process
output_file : Path
The file to save the alignment
aligner : str
The alignment method to use
checkpoint : Path
The checkpoint to use for neural methods
gpu : int
The index of the gpu to perform alignment on for neural methods
"""
```
#### `pyfoal.from_files_to_files`
```
"""Perform parallel phoneme alignment from many files and save to disk
Arguments
text_files : list
The transcript files
audio_files : list
The corresponding speech audio files
output_files : list
The files to save the alignments
aligner : str
The alignment method to use
num_workers : int
Number of CPU cores to utilize. Defaults to all cores.
checkpoint : Path
The checkpoint to use for neural methods
gpu : int
The index of the gpu to perform alignment on for neural methods
"""
```
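
As a usage sketch (file names are placeholders), batch alignment with the
file-based API might look like the following.

```python
import pyfoal

# Placeholder file lists; any number of utterances can be aligned in parallel
text_files = ['speech-1.txt', 'speech-2.txt']
audio_files = ['speech-1.wav', 'speech-2.wav']
output_files = ['speech-1.json', 'speech-2.json']

# Align all files with the default RAD-TTS aligner, using every CPU core
pyfoal.from_files_to_files(
    text_files,
    audio_files,
    output_files,
    aligner='radtts')
```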
### Command-line interface
```
python -m pyfoal
    [-h]
    --text_files TEXT_FILES [TEXT_FILES ...]
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    [--aligner ALIGNER]
    [--num_workers NUM_WORKERS]
    [--checkpoint CHECKPOINT]
    [--gpu GPU]

Arguments:
  -h, --help
      show this help message and exit
  --text_files TEXT_FILES [TEXT_FILES ...]
      The speech transcript files
  --audio_files AUDIO_FILES [AUDIO_FILES ...]
      The speech audio files
  --output_files OUTPUT_FILES [OUTPUT_FILES ...]
      The files to save the alignments
  --aligner ALIGNER
      The alignment method to use
  --num_workers NUM_WORKERS
      Number of CPU cores to utilize. Defaults to all cores.
  --checkpoint CHECKPOINT
      The checkpoint to use for neural methods
  --gpu GPU
      The index of the GPU to use for inference. Defaults to CPU.
```
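
For example, aligning a single utterance from the shell (paths are
placeholders):

```
python -m pyfoal \
    --text_files speech.txt \
    --audio_files speech.wav \
    --output_files speech.json \
    --aligner radtts
```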
## Training
### Download
`python -m pyfoal.data.download`
Downloads and uncompresses the `arctic` and `libritts` datasets used for training.
### Preprocess
`python -m pyfoal.data.preprocess`
Converts each dataset to a common format on disk ready for training.
### Partition
`python -m pyfoal.partition`
Generates `train`, `valid`, and `test` partitions for `arctic` and `libritts`.
Partitioning is deterministic given the same random seed. You do not need to
run this step, as the original partitions are saved in
`pyfoal/assets/partitions`.
### Train
`python -m pyfoal.train --config <config> --gpus <gpus>`
Trains a model according to a given configuration on the `libritts`
dataset. Uses a list of GPU indices as an argument, and uses distributed
data parallelism (DDP) if more than one index is given. For example,
`--gpus 0 3` will train using DDP on GPUs `0` and `3`.
### Monitor
Run `tensorboard --logdir runs/`. If you are running training remotely, you
must create an SSH connection with port forwarding to view TensorBoard.
This can be done with `ssh -L 6006:localhost:6006 <user>@<server-ip-address>`.
Then, open `localhost:6006` in your browser.
### Evaluate
```
python -m pyfoal.evaluate \
    --config <config> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
```
Evaluate a model. `<checkpoint>` is the checkpoint file to evaluate and `<gpu>`
is the GPU index.
## References
[1] R. Badlani, A. Łańcucki, K. J. Shih, R. Valle, W. Ping, and B.
Catanzaro, "One TTS Alignment to Rule Them All," International
Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[2] M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, and M. Sonderegger,
"Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi,"
Interspeech, pp. 498-502, 2017.

[3] J. Yuan and M. Liberman, "Speaker identification on the SCOTUS
corpus," Journal of the Acoustical Society of America, vol. 123, p.
3878, 2008.