<h1 align="center">High-Fidelity Neural Phonetic Posteriorgrams</h1>
<div align="center">

[![PyPI](https://img.shields.io/pypi/v/ppgs.svg)](https://pypi.python.org/pypi/ppgs)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/ppgs)](https://pepy.tech/project/ppgs)

Training, evaluation, and inference of neural phonetic posteriorgrams (PPGs) in PyTorch

[[Paper]](https://www.maxrmorrison.com/pdfs/churchwell2024high.pdf) [[Website]](https://www.maxrmorrison.com/sites/ppgs/)
</div>
## Table of contents
- [Installation](#installation)
- [Inference](#inference)
  * [Application programming interface (API)](#application-programming-interface-api)
  * [`ppgs.from_audio`](#ppgsfrom_audio)
  * [`ppgs.from_file`](#ppgsfrom_file)
  * [`ppgs.from_file_to_file`](#ppgsfrom_file_to_file)
  * [`ppgs.from_files_to_files`](#ppgsfrom_files_to_files)
  * [`ppgs.from_paths_to_paths`](#ppgsfrom_paths_to_paths)
  * [Command-line interface (CLI)](#command-line-interface-cli)
- [Distance](#distance)
- [Interpolate](#interpolate)
- [Edit](#edit)
  * [`ppgs.edit.grid.constant`](#ppgseditgridconstant)
  * [`ppgs.edit.grid.from_alignments`](#ppgseditgridfrom_alignments)
  * [`ppgs.edit.grid.of_length`](#ppgseditgridof_length)
  * [`ppgs.edit.grid.sample`](#ppgseditgridsample)
  * [`ppgs.edit.reallocate`](#ppgseditreallocate)
  * [`ppgs.edit.regex`](#ppgseditregex)
  * [`ppgs.edit.shift`](#ppgseditshift)
  * [`ppgs.edit.swap`](#ppgseditswap)
- [Sparsify](#sparsify)
- [Training](#training)
  * [Download](#download)
  * [Preprocess](#preprocess)
  * [Partition](#partition)
  * [Train](#train)
  * [Monitor](#monitor)
  * [Evaluate](#evaluate)
- [Citation](#citation)
## Installation
An inference-only installation with our best model is pip-installable:

`pip install ppgs`

To perform training, install the training dependencies and FFmpeg.
```bash
pip install ppgs[train]
conda install -c conda-forge ffmpeg
```
If you wish to use the Charsiu representation, download the code,
install both inference and training dependencies, and install
Charsiu as a Git submodule.
```bash
# Clone
git clone git@github.com:interactiveaudiolab/ppgs.git
cd ppgs/
# Install dependencies
pip install -e .[train]
conda install -c conda-forge ffmpeg
# Download Charsiu
git submodule init
git submodule update
```
## Inference
```python
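import ppgs

# Path to a speech recording (placeholder)
audio_file = 'speech.wav'

# Load speech audio at the correct sample rate
audio = ppgs.load.audio(audio_file)

# Choose a GPU index to use for inference. Set to None to use the CPU.
gpu = 0

# Infer PPGs (named `ppg` so the `ppgs` module is not shadowed)
ppg = ppgs.from_audio(audio, ppgs.SAMPLE_RATE, gpu=gpu)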
```
### Application programming interface (API)
#### `ppgs.from_audio`
```python
def from_audio(
    audio: torch.Tensor,
    sample_rate: Union[int, float],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> torch.Tensor:
    """Infer ppgs from audio

    Arguments
        audio
            Batched audio to process
            shape=(batch, 1, samples)
        sample_rate
            Audio sampling rate
        checkpoint
            The checkpoint file
        gpu
            The index of the GPU to use for inference

    Returns
        ppgs
            Phonetic posteriorgrams
            shape=(batch, len(ppgs.PHONEMES), frames)
    """
```
#### `ppgs.from_file`
```python
def from_file(
    file: Union[str, bytes, os.PathLike],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> torch.Tensor:
    """Infer ppgs from an audio file

    Arguments
        file
            The audio file
        checkpoint
            The checkpoint file
        gpu
            The index of the GPU to use for inference

    Returns
        ppgs
            Phonetic posteriorgram
            shape=(len(ppgs.PHONEMES), frames)
    """
```
#### `ppgs.from_file_to_file`
```python
def from_file_to_file(
    audio_file: Union[str, bytes, os.PathLike],
    output_file: Union[str, bytes, os.PathLike],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> None:
    """Infer ppg from an audio file and save to a torch tensor file

    Arguments
        audio_file
            The audio file
        output_file
            The .pt file to save PPGs
        checkpoint
            The checkpoint file
        gpu
            The index of the GPU to use for inference
    """
```
#### `ppgs.from_files_to_files`
```python
def from_files_to_files(
    audio_files: List[Union[str, bytes, os.PathLike]],
    output_files: List[Union[str, bytes, os.PathLike]],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    num_workers: int = ppgs.NUM_WORKERS,
    gpu: Optional[int] = None,
    max_frames: int = ppgs.MAX_INFERENCE_FRAMES
) -> None:
    """Infer ppgs from audio files and save to torch tensor files

    Arguments
        audio_files
            The audio files
        output_files
            The .pt files to save PPGs
        checkpoint
            The checkpoint file
        num_workers
            Number of CPU threads for multiprocessing
        gpu
            The index of the GPU to use for inference
        max_frames
            The maximum number of frames on the GPU at once
    """
```
#### `ppgs.from_paths_to_paths`
```python
def from_paths_to_paths(
    input_paths: List[Union[str, bytes, os.PathLike]],
    output_paths: Optional[List[Union[str, bytes, os.PathLike]]] = None,
    extensions: Optional[List[str]] = None,
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    num_workers: int = ppgs.NUM_WORKERS,
    gpu: Optional[int] = None,
    max_frames: int = ppgs.MAX_INFERENCE_FRAMES
) -> None:
    """Infer ppgs from audio files and save to torch tensor files

    Arguments
        input_paths
            Paths to audio files and/or directories
        output_paths
            The one-to-one corresponding outputs
        extensions
            Extensions to glob for in directories
        checkpoint
            The checkpoint file
        num_workers
            Number of CPU threads for multiprocessing
        gpu
            The index of the GPU to use for inference
        max_frames
            The maximum number of frames on the GPU at once
    """
```
### Command-line interface (CLI)
```
usage: python -m ppgs
    [-h]
    [--input_paths INPUT_PATHS [INPUT_PATHS ...]]
    [--output_paths OUTPUT_PATHS [OUTPUT_PATHS ...]]
    [--extensions EXTENSIONS [EXTENSIONS ...]]
    [--checkpoint CHECKPOINT]
    [--num-workers NUM_WORKERS]
    [--gpu GPU]
    [--max-frames MAX_FRAMES]

arguments:
    --input_paths INPUT_PATHS [INPUT_PATHS ...]
        Paths to audio files and/or directories

optional arguments:
    -h, --help
        Show this help message and exit
    --output_paths OUTPUT_PATHS [OUTPUT_PATHS ...]
        The one-to-one corresponding output paths
    --extensions EXTENSIONS [EXTENSIONS ...]
        Extensions to glob for in directories
    --checkpoint CHECKPOINT
        The checkpoint file
    --num-workers NUM_WORKERS
        Number of CPU threads for multiprocessing
    --gpu GPU
        The index of the GPU to use for inference. Defaults to CPU.
    --max-frames MAX_FRAMES
        The maximum number of frames on the GPU at once
```
## Distance
To compute the proposed pronunciation distance (a normalized Jensen-Shannon
divergence) between two PPGs, use `ppgs.distance()`.
```python
def distance(
    ppgX: torch.Tensor,
    ppgY: torch.Tensor,
    reduction: str = 'mean',
    normalize: bool = True
) -> torch.Tensor:
    """Compute the pronunciation distance between two aligned PPGs

    Arguments
        ppgX
            Input PPG X
            shape=(len(ppgs.PHONEMES), frames)
        ppgY
            Input PPG Y to compare with PPG X
            shape=(len(ppgs.PHONEMES), frames)
        reduction
            Reduction to apply to the output. One of ['mean', 'none', 'sum'].
        normalize
            Apply similarity-based normalization

    Returns
        Normalized Jensen-Shannon divergence between PPGs
    """
```
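For intuition, here is a dependency-free sketch of an unnormalized, frame-wise Jensen-Shannon divergence that supports the same `reduction` options. It does not reproduce the similarity-based normalization applied when `normalize=True`, since that step is not described here, and the helper names `js_divergence` and `framewise_jsd` are hypothetical. PPGs are plain nested lists with shape `(phonemes, frames)`.

```python
import math
from typing import List, Union


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a_i == 0 contribute nothing; if a_i > 0 then m_i > 0
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def framewise_jsd(
    ppg_x: List[List[float]],
    ppg_y: List[List[float]],
    reduction: str = 'mean'
) -> Union[float, List[float]]:
    """Per-frame JSD between two aligned (phonemes, frames) PPGs (sketch only)."""
    frames = len(ppg_x[0])
    values = [
        js_divergence([row[t] for row in ppg_x], [row[t] for row in ppg_y])
        for t in range(frames)]
    if reduction == 'mean':
        return sum(values) / frames
    if reduction == 'sum':
        return sum(values)
    if reduction == 'none':
        return values
    raise ValueError(f'Unknown reduction: {reduction}')
```

Identical frames score 0 and frames with disjoint support score ln 2, the maximum of the natural-log JSD.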
## Interpolate
```python
def interpolate(
    ppgX: torch.Tensor,
    ppgY: torch.Tensor,
    interp: Union[float, torch.Tensor]
) -> torch.Tensor:
    """Spherical linear interpolation

    Arguments
        ppgX
            Input PPG X
            shape=(len(ppgs.PHONEMES), frames)
        ppgY
            Input PPG Y
            shape=(len(ppgs.PHONEMES), frames)
        interp
            Interpolation values
            scalar float OR shape=(frames,)

    Returns
        Interpolated PPGs
        shape=(len(ppgs.PHONEMES), frames)
    """
```
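The docstring does not say in which space the spherical interpolation happens. One common way to slerp probability vectors, assumed here rather than taken from the library, is to interpolate their element-wise square roots (which lie on the unit sphere) and square the result. The sketch below does this per frame, accepting either a scalar or one weight per frame; `slerp_distributions` and `interpolate_ppgs` are hypothetical names.

```python
import math
from typing import List, Sequence, Union


def slerp_distributions(p: Sequence[float], q: Sequence[float], t: float) -> List[float]:
    """Slerp between two probability vectors via their square roots (assumption)."""
    # Square roots of a probability vector form a unit-norm vector
    a = [math.sqrt(x) for x in p]
    b = [math.sqrt(x) for x in q]
    cos_omega = min(1.0, max(-1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(cos_omega)
    if omega < 1e-7:
        # Nearly identical inputs: slerp degenerates to linear interpolation
        mixed = [(1 - t) * x + t * y for x, y in zip(a, b)]
    else:
        wa = math.sin((1 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        mixed = [wa * x + wb * y for x, y in zip(a, b)]
    # Square to map back to the simplex; renormalize to absorb rounding error
    out = [x * x for x in mixed]
    total = sum(out)
    return [x / total for x in out]


def interpolate_ppgs(
    ppg_x: List[List[float]],
    ppg_y: List[List[float]],
    interp: Union[float, Sequence[float]]
) -> List[List[float]]:
    """Frame-wise slerp of two (phonemes, frames) PPGs (sketch only)."""
    phonemes, frames = len(ppg_x), len(ppg_x[0])
    weights = [interp] * frames if isinstance(interp, (int, float)) else list(interp)
    result = [[0.0] * frames for _ in range(phonemes)]
    for t in range(frames):
        frame = slerp_distributions(
            [row[t] for row in ppg_x], [row[t] for row in ppg_y], weights[t])
        for k in range(phonemes):
            result[k][t] = frame[k]
    return result
```

With `interp=0` the sketch returns PPG X, with `interp=1` it returns PPG Y, and every intermediate frame remains a valid distribution.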
## Edit
```python
import ppgs

# Path to a speech recording (placeholder)
audio_file = 'speech.wav'

# Choose a GPU index to use for inference. Set to None to use the CPU.
gpu = 0

# Get PPGs to edit
ppg = ppgs.from_file(audio_file, gpu=gpu)

# Constant-ratio time-stretching (slowing down)
grid = ppgs.edit.grid.constant(ppg, ratio=0.8)
slow = ppgs.edit.grid.sample(ppg, grid)

# Stretch to a desired length (e.g., 100 frames)
grid = ppgs.edit.grid.of_length(ppg, 100)
fixed = ppgs.edit.grid.sample(ppg, grid)
```
### `ppgs.edit.grid.constant`
```python
def constant(ppg: torch.Tensor, ratio: float) -> torch.Tensor:
    """Create a grid for constant-ratio time-stretching

    Arguments
        ppg
            Input PPG
        ratio
            Time-stretching ratio; lower is slower

    Returns
        Constant-ratio grid for time-stretching ppg
    """
```
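Conceptually, a time-stretching grid lists, for each output frame, a fractional index into the input PPG. The following is a minimal sketch of a constant-ratio grid, assuming the grid is a flat list of indices and that the output length is `round(frames / ratio)`; the library's exact length and edge handling may differ, and `constant_grid` is a hypothetical name.

```python
from typing import List


def constant_grid(frames: int, ratio: float) -> List[float]:
    """Hypothetical constant-ratio grid of fractional input-frame indices.

    Each output frame advances `ratio` input frames, so ratio < 1 produces
    more output frames (slower) and ratio > 1 produces fewer (faster).
    """
    if ratio <= 0:
        raise ValueError('ratio must be positive')
    length = max(1, round(frames / ratio))
    # Clamp so every index stays inside [0, frames - 1]
    return [min(i * ratio, float(frames - 1)) for i in range(length)]
```

For example, `constant_grid(4, 0.5)` doubles the length of a 4-frame PPG to 8 frames.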
### `ppgs.edit.grid.from_alignments`
```python
def from_alignments(
    source: pypar.Alignment,
    target: pypar.Alignment,
    sample_rate: int = ppgs.SAMPLE_RATE,
    hopsize: int = ppgs.HOPSIZE
) -> torch.Tensor:
    """Create time-stretch grid to convert source alignment to target

    Arguments
        source
            Forced alignment of PPG to stretch
        target
            Forced alignment of target PPG
        sample_rate
            Audio sampling rate
        hopsize
            Hopsize in samples

    Returns
        Grid for time-stretching source PPG
    """
```
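One plausible way to build such a grid, assumed here purely for illustration: represent each alignment as an ordered list of `(start, end)` phoneme times in seconds (plain tuples standing in for `pypar.Alignment`), then map every target frame linearly into the matching source phoneme. The frame rate is `sample_rate / hopsize`. The function name `alignment_grid` is hypothetical.

```python
from typing import List, Tuple


def alignment_grid(
    source: List[Tuple[float, float]],
    target: List[Tuple[float, float]],
    sample_rate: int,
    hopsize: int
) -> List[float]:
    """Piecewise-linear grid mapping target frames to source frame indices.

    source, target: (start, end) times in seconds for the same phonemes, in order.
    """
    if len(source) != len(target):
        raise ValueError('Alignments must contain the same phonemes')
    frame_rate = sample_rate / hopsize
    target_frames = round(target[-1][1] * frame_rate)
    grid = []
    index = 0
    for frame in range(target_frames):
        time = frame / frame_rate
        # Advance to the target phoneme that contains this time
        while index < len(target) - 1 and time >= target[index][1]:
            index += 1
        t_start, t_end = target[index]
        s_start, s_end = source[index]
        # Relative position inside the target phoneme, mapped into the source phoneme
        fraction = (time - t_start) / (t_end - t_start) if t_end > t_start else 0.0
        grid.append((s_start + fraction * (s_end - s_start)) * frame_rate)
    return grid
```

Here the first phoneme is stretched to twice its source duration while the second keeps its duration, so the grid advances half a source frame per output frame, then one.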
### `ppgs.edit.grid.of_length`
```python
def of_length(ppg: torch.Tensor, length: int) -> torch.Tensor:
    """Create time-stretch grid to resample PPG to a specified length

    Arguments
        ppg
            Input PPG
        length
            Target length

    Returns
        Grid of specified length for time-stretching ppg
    """
```
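Resampling to a fixed length amounts to evenly spaced fractional indices, similar to `torch.linspace(0, frames - 1, length)`. Below is a dependency-free sketch, assuming the grid spans the first through the last input frame inclusive; `length_grid` is a hypothetical name.

```python
from typing import List


def length_grid(frames: int, length: int) -> List[float]:
    """Evenly spaced fractional indices from 0 to frames - 1 (inclusive)."""
    if length < 1:
        raise ValueError('length must be at least 1')
    if length == 1:
        return [0.0]
    step = (frames - 1) / (length - 1)
    return [i * step for i in range(length)]
```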
### `ppgs.edit.grid.sample`
```python
def sample(ppg: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    """Grid-based PPG interpolation

    Arguments
        ppg
            Input PPG
        grid
            Grid of desired length; each item is a float-valued index into ppg

    Returns
        Interpolated PPG
    """
```
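Linear interpolation between neighboring frames is one natural reading of "grid-based PPG interpolation"; that the library interpolates linearly is an assumption here. Since each output frame is a convex combination of two valid distributions, it still sums to one. The function name `sample_grid` is hypothetical, and PPGs are nested lists of shape `(phonemes, frames)`.

```python
import math
from typing import List, Sequence


def sample_grid(ppg: List[List[float]], grid: Sequence[float]) -> List[List[float]]:
    """Linearly interpolate each phoneme row at fractional frame indices."""
    frames = len(ppg[0])
    output = []
    for row in ppg:
        values = []
        for index in grid:
            # Keep the index inside the valid frame range
            index = min(max(float(index), 0.0), frames - 1.0)
            lower = int(math.floor(index))
            upper = min(lower + 1, frames - 1)
            weight = index - lower
            values.append((1.0 - weight) * row[lower] + weight * row[upper])
        output.append(values)
    return output
```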
### `ppgs.edit.reallocate`
```python
def reallocate(
    ppg: torch.Tensor,
    source: str,
    target: str,
    value: Optional[float] = None
) -> torch.Tensor:
    """Reallocate probability from source phoneme to target phoneme

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        source
            Source phoneme
        target
            Target phoneme
        value
            Max amount to reallocate. If None, reallocates all probability.

    Returns
        Edited PPG
    """
```
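A minimal sketch of per-frame reallocation follows, assuming `value` caps the amount moved in each frame. A hypothetical `phonemes` list stands in for `ppgs.PHONEMES` to look up rows, and `reallocate_sketch` is not the library function. Moving mass between two rows leaves every frame's total unchanged.

```python
from typing import List, Optional, Sequence


def reallocate_sketch(
    ppg: List[List[float]],
    phonemes: Sequence[str],
    source: str,
    target: str,
    value: Optional[float] = None
) -> List[List[float]]:
    """Move probability from `source` to `target` in each frame (sketch only)."""
    s, t = phonemes.index(source), phonemes.index(target)
    edited = [list(row) for row in ppg]
    for frame in range(len(ppg[0])):
        # Move everything, or at most `value`, from the source row
        available = edited[s][frame]
        amount = available if value is None else min(value, available)
        edited[s][frame] -= amount
        edited[t][frame] += amount
    return edited
```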
### `ppgs.edit.regex`
```python
def regex(
    ppg: torch.Tensor,
    source_phonemes: List[str],
    target_phonemes: List[str]
) -> torch.Tensor:
    """Regex match and replace (via swap) for phoneme sequences

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        source_phonemes
            Source phoneme sequence
        target_phonemes
            Target phoneme sequence

    Returns
        Edited PPG
    """
```
### `ppgs.edit.shift`
```python
def shift(ppg: torch.Tensor, phoneme: str, value: float) -> torch.Tensor:
    """Shift probability of a phoneme and reallocate proportionally

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        phoneme
            Input phoneme
        value
            Maximal shift amount

    Returns
        Edited PPG
    """
```
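The docstring leaves the exact rule open. The sketch below assumes the phoneme's probability is moved by `value` (clamped to [0, 1]) and the remaining phonemes are rescaled proportionally so each frame still sums to one. The helper `shift_sketch` and its `phonemes` lookup list are hypothetical.

```python
from typing import List, Sequence


def shift_sketch(
    ppg: List[List[float]],
    phonemes: Sequence[str],
    phoneme: str,
    value: float
) -> List[List[float]]:
    """Shift one phoneme's probability and rescale the rest (sketch only)."""
    index = phonemes.index(phoneme)
    edited = [list(row) for row in ppg]
    for frame in range(len(ppg[0])):
        old = ppg[index][frame]
        # Move the phoneme's probability by `value`, kept within [0, 1]
        new = min(1.0, max(0.0, old + value))
        rest_old = sum(ppg[k][frame] for k in range(len(ppg)) if k != index)
        rest_new = 1.0 - new
        for k in range(len(ppg)):
            if k == index:
                edited[k][frame] = new
            elif rest_old > 0:
                # Scale the other phonemes so the frame still sums to one
                edited[k][frame] = ppg[k][frame] * rest_new / rest_old
            else:
                # The phoneme held all the mass: spread the freed mass evenly
                edited[k][frame] = rest_new / (len(ppg) - 1)
    return edited
```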
### `ppgs.edit.swap`
```python
def swap(ppg: torch.Tensor, phonemeA: str, phonemeB: str) -> torch.Tensor:
    """Swap the probabilities of two phonemes

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        phonemeA
            Input phoneme A
        phonemeB
            Input phoneme B

    Returns
        Edited PPG
    """
```
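Swapping two phonemes amounts to exchanging their rows in the `(len(ppgs.PHONEMES), frames)` matrix. Here is a dependency-free sketch, using a hypothetical `phonemes` list for row lookup; `swap_sketch` is not the library function. It copies the rows so the input is left untouched.

```python
from typing import List, Sequence


def swap_sketch(
    ppg: List[List[float]],
    phonemes: Sequence[str],
    phoneme_a: str,
    phoneme_b: str
) -> List[List[float]]:
    """Exchange the probability rows of two phonemes (sketch only)."""
    a, b = phonemes.index(phoneme_a), phonemes.index(phoneme_b)
    edited = [list(row) for row in ppg]
    edited[a], edited[b] = edited[b], edited[a]
    return edited
```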
## Sparsify
```python
def sparsify(
    ppg: torch.Tensor,
    method: str = 'percentile',
    threshold: Union[float, int] = 0.85
) -> torch.Tensor:
    """Make phonetic posteriorgrams sparse

    Arguments
        ppg
            Input PPG
            shape=(*, len(ppgs.PHONEMES), frames)
        method
            Sparsification method. One of ['constant', 'percentile', 'topk'].
        threshold
            In [0, 1] for 'constant' and 'percentile'; integer > 0 for 'topk'.

    Returns
        Sparse phonetic posteriorgram
        shape=(*, len(ppgs.PHONEMES), frames)
    """
```
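To illustrate two of the methods, here is a sketch of `'constant'` (zero out probabilities below the threshold) and `'topk'` (keep the `threshold` most probable phonemes) applied frame by frame. The `'percentile'` method is not covered. Renormalizing the surviving mass, and falling back to the argmax when nothing survives, are assumptions of this sketch rather than documented library behavior; the function names are hypothetical.

```python
from typing import List, Union


def sparsify_frame(frame: List[float], method: str, threshold: Union[float, int]) -> List[float]:
    """Sparsify one frame of phoneme probabilities (sketch only)."""
    if method == 'constant':
        # Zero out probabilities below a fixed threshold
        kept = [p if p >= threshold else 0.0 for p in frame]
    elif method == 'topk':
        # Keep only the `threshold` most probable phonemes
        order = sorted(range(len(frame)), key=lambda k: frame[k], reverse=True)
        keep = set(order[:int(threshold)])
        kept = [p if k in keep else 0.0 for k, p in enumerate(frame)]
    else:
        raise ValueError(f'Method not covered by this sketch: {method}')
    total = sum(kept)
    if total == 0.0:
        # Assumption: fall back to a one-hot argmax if everything was zeroed
        best = max(range(len(frame)), key=lambda k: frame[k])
        return [1.0 if k == best else 0.0 for k in range(len(frame))]
    # Assumption: renormalize so the frame is a distribution again
    return [p / total for p in kept]


def sparsify_sketch(
    ppg: List[List[float]],
    method: str = 'topk',
    threshold: Union[float, int] = 1
) -> List[List[float]]:
    """Apply sparsify_frame to every frame of a (phonemes, frames) PPG."""
    frames = len(ppg[0])
    columns = [
        sparsify_frame([row[t] for row in ppg], method, threshold)
        for t in range(frames)]
    return [[columns[t][k] for t in range(frames)] for k in range(len(ppg))]
```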
## Training
### Download
Downloads, unzips, and formats datasets. Stores datasets in `data/datasets/`
and formatted datasets in `data/cache/`.

**N.B.** Common Voice and TIMIT cannot be downloaded automatically. You must
manually download the tarballs and place them in `data/sources/commonvoice`
or `data/sources/timit`, respectively, before running the following command.
```bash
python -m ppgs.data.download --datasets <datasets>
```
### Preprocess
Prepares representations for training. Representations are stored
in `data/cache/`.
```
python -m ppgs.data.preprocess \
    --datasets <datasets> \
    --representations <representations> \
    --gpu <gpu> \
    --num-workers <workers>
```
### Partition
Partitions a dataset. You should not need to run this, as the partitions
used in our work are provided for each dataset in
`ppgs/assets/partitions/`.
```
python -m ppgs.partition --datasets <datasets>
```
### Train
Trains a model. Checkpoints and logs are stored in `runs/`.
```
python -m ppgs.train --config <config> --dataset <dataset> --gpu <gpu>
```
If a config has been run before, its most recent checkpoint is loaded
automatically and training resumes from that checkpoint.
### Monitor
You can monitor training via `tensorboard`.
```
tensorboard --logdir runs/ --port <port> --load_fast true
```
To use the `torchutil` notification system to receive notifications for long
jobs (download, preprocess, train, and evaluate), set the
`PYTORCH_NOTIFICATION_URL` environment variable to a supported webhook as
explained in [the Apprise documentation](https://pypi.org/project/apprise/).
### Evaluate
Performs objective evaluation of phoneme accuracy. Results are stored
in `eval/`.
```
python -m ppgs.evaluate --config <name> --datasets <datasets> --gpu <gpu>
```
## Citation
### IEEE
C. Churchwell, M. Morrison, and B. Pardo, "High-Fidelity Neural Phonetic Posteriorgrams,"
ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio, April 2024.
### BibTeX
```
@inproceedings{churchwell2024high,
    title={High-Fidelity Neural Phonetic Posteriorgrams},
    author={Churchwell, Cameron and Morrison, Max and Pardo, Bryan},
    booktitle={ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio},
    month={April},
    year={2024}
}
```