<h1 align="center">High-Fidelity Neural Phonetic Posteriorgrams</h1>
<div align="center">
[![PyPI](https://img.shields.io/pypi/v/ppgs.svg)](https://pypi.python.org/pypi/ppgs)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/ppgs)](https://pepy.tech/project/ppgs)
Training, evaluation, and inference of neural phonetic posteriorgrams (PPGs) in PyTorch
[[Paper]](https://www.maxrmorrison.com/pdfs/churchwell2024high.pdf) [[Website]](https://www.maxrmorrison.com/sites/ppgs/)
</div>
## Table of contents
- [Installation](#installation)
- [Inference](#inference)
* [Application programming interface (API)](#application-programming-interface-api)
* [`ppgs.from_audio`](#ppgsfrom_audio)
* [`ppgs.from_file`](#ppgsfrom_file)
* [`ppgs.from_file_to_file`](#ppgsfrom_file_to_file)
* [`ppgs.from_files_to_files`](#ppgsfrom_files_to_files)
* [Command-line interface (CLI)](#command-line-interface-cli)
- [Distance](#distance)
- [Interpolate](#interpolate)
- [Edit](#edit)
* [`ppgs.edit.grid.constant`](#ppgseditgridconstant)
* [`ppgs.edit.grid.from_alignments`](#ppgseditgridfrom_alignments)
* [`ppgs.edit.grid.of_length`](#ppgseditgridof_length)
* [`ppgs.edit.grid.sample`](#ppgseditgridsample)
* [`ppgs.edit.reallocate`](#ppgseditreallocate)
* [`ppgs.edit.regex`](#ppgseditregex)
* [`ppgs.edit.shift`](#ppgseditshift)
* [`ppgs.edit.swap`](#ppgseditswap)
- [Sparsify](#sparsify)
- [Training](#training)
* [Download](#download)
* [Preprocess](#preprocess)
* [Partition](#partition)
* [Train](#train)
* [Monitor](#monitor)
* [Evaluate](#evaluate)
- [Citation](#citation)
## Installation
An inference-only installation with our best model is pip-installable
`pip install ppgs`
To perform training, install the training dependencies and FFmpeg.
```bash
pip install ppgs[train]
conda install -c conda-forge ffmpeg
```
If you wish to use the Charsiu representation, clone the repository,
install both inference and training dependencies, and initialize the
Charsiu Git submodule.
```bash
# Clone
git clone git@github.com:interactiveaudiolab/ppgs
cd ppgs/
# Install dependencies
pip install -e .[train]
conda install -c conda-forge ffmpeg
# Download Charsiu
git submodule init
git submodule update
```
## Inference
```python
import ppgs
# Load speech audio at correct sample rate
audio = ppgs.load.audio(audio_file)
# Choose a GPU index to use for inference. Set to None to use the CPU.
gpu = 0
# Infer PPGs (bind the result to a new name so the ppgs module is not shadowed)
posteriorgrams = ppgs.from_audio(audio, ppgs.SAMPLE_RATE, gpu=gpu)
```
### Application programming interface (API)
#### `ppgs.from_audio`
```python
def from_audio(
audio: torch.Tensor,
sample_rate: Union[int, float],
representation: str = ppgs.REPRESENTATION,
checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
gpu: Optional[int] = None
) -> torch.Tensor:
"""Infer ppgs from audio
Arguments
audio
Batched audio to process
shape=(batch, 1, samples)
sample_rate
Audio sampling rate
representation
The representation to use; 'mel' and 'w2v2fb' are currently supported
checkpoint
The checkpoint file
gpu
The index of the GPU to use for inference
Returns
ppgs
Phonetic posteriorgrams
shape=(batch, len(ppgs.PHONEMES), frames)
"""
```
#### `ppgs.from_file`
```python
def from_file(
file: Union[str, bytes, os.PathLike],
representation: str = ppgs.REPRESENTATION,
checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
gpu: Optional[int] = None
) -> torch.Tensor:
"""Infer ppgs from an audio file
Arguments
file
The audio file
representation
The representation to use; 'mel' and 'w2v2fb' are currently supported
checkpoint
The checkpoint file
gpu
The index of the GPU to use for inference
Returns
ppgs
Phonetic posteriorgram
shape=(len(ppgs.PHONEMES), frames)
"""
```
#### `ppgs.from_file_to_file`
```python
def from_file_to_file(
audio_file: Union[str, bytes, os.PathLike],
output_file: Union[str, bytes, os.PathLike],
representation: str = ppgs.REPRESENTATION,
checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
gpu: Optional[int] = None
) -> None:
"""Infer ppg from an audio file and save to a torch tensor file
Arguments
audio_file
The audio file
output_file
The .pt file to save PPGs
representation
The representation to use; 'mel' and 'w2v2fb' are currently supported
checkpoint
The checkpoint file
gpu
The index of the GPU to use for inference
"""
```
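For example, a minimal sketch (file names are hypothetical) that writes PPGs to disk and reloads them with `torch.load`:

```python
import torch
import ppgs

# Hypothetical input and output paths
audio_file = 'speech.wav'
output_file = 'speech-ppg.pt'

# Infer PPGs and save them as a torch tensor file
ppgs.from_file_to_file(audio_file, output_file, gpu=0)

# Reload the saved PPGs later
ppg = torch.load(output_file)
```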
#### `ppgs.from_files_to_files`
```python
def from_files_to_files(
audio_files: List[Union[str, bytes, os.PathLike]],
output_files: List[Union[str, bytes, os.PathLike]],
representation: str = ppgs.REPRESENTATION,
checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
num_workers: int = 0,
gpu: Optional[int] = None,
max_frames: int = ppgs.MAX_INFERENCE_FRAMES
) -> None:
"""Infer ppgs from audio files and save to torch tensor files
Arguments
audio_files
The audio files
output_files
The .pt files to save PPGs
representation
The representation to use; 'mel' and 'w2v2fb' are currently supported
checkpoint
The checkpoint file
num_workers
Number of CPU threads for multiprocessing
gpu
The index of the GPU to use for inference
max_frames
The maximum number of frames on the GPU at once
"""
```
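A sketch of batch processing with hypothetical file lists; each output file corresponds one-to-one to an input file:

```python
import ppgs

# Hypothetical input and output paths
audio_files = ['one.wav', 'two.wav']
output_files = ['one-ppg.pt', 'two-ppg.pt']

# Infer and save PPGs for all files, using two worker threads for I/O
ppgs.from_files_to_files(audio_files, output_files, num_workers=2, gpu=0)
```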
### Command-line interface (CLI)
```
usage: python -m ppgs
[-h]
[--audio_files AUDIO_FILES [AUDIO_FILES ...]]
[--output_files OUTPUT_FILES [OUTPUT_FILES ...]]
[--representation REPRESENTATION]
[--checkpoint CHECKPOINT]
[--num-workers NUM_WORKERS]
[--gpu GPU]
[--max-frames MAX_FRAMES]
arguments:
--audio_files AUDIO_FILES [AUDIO_FILES ...]
Paths to input audio files
--output_files OUTPUT_FILES [OUTPUT_FILES ...]
The one-to-one corresponding output files
optional arguments:
-h, --help
Show this help message and exit
--representation REPRESENTATION
Representation to use for inference
--checkpoint CHECKPOINT
The checkpoint file
--num-workers NUM_WORKERS
Number of CPU threads for multiprocessing
--gpu GPU
The index of the GPU to use for inference. Defaults to CPU.
--max-frames MAX_FRAMES
Maximum number of frames in a batch
```
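For example, the following invocation (with hypothetical file names) mirrors the batch API above and runs inference on the first GPU:

```bash
python -m ppgs \
    --audio_files one.wav two.wav \
    --output_files one-ppg.pt two-ppg.pt \
    --gpu 0
```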
## Distance
To compute the proposed normalized Jensen-Shannon divergence pronunciation
distance between two PPGs, use `ppgs.distance()`.
```python
def distance(
ppgX: torch.Tensor,
ppgY: torch.Tensor,
reduction: str = 'mean',
normalize: bool = True,
exponent: float = ppgs.SIMILARITY_EXPONENT
) -> torch.Tensor:
"""Compute the pronunciation distance between two aligned PPGs
Arguments
ppgX
Input PPG X
shape=(len(ppgs.PHONEMES), frames)
ppgY
Input PPG Y to compare with PPG X
shape=(len(ppgs.PHONEMES), frames)
reduction
Reduction to apply to the output. One of ['mean', 'none', 'sum'].
normalize
Apply similarity-based normalization
exponent
Similarity exponent
Returns
Normalized Jensen-Shannon divergence between PPGs
"""
```
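A minimal sketch, assuming two recordings of the same utterance that have already been time-aligned so the PPGs have equal numbers of frames (file names are hypothetical):

```python
import ppgs

# Infer PPGs for two time-aligned recordings of the same utterance
ppgX = ppgs.from_file('speaker_a.wav', gpu=0)
ppgY = ppgs.from_file('speaker_b.wav', gpu=0)

# Average pronunciation distance over all frames
average = ppgs.distance(ppgX, ppgY)

# Per-frame distances
per_frame = ppgs.distance(ppgX, ppgY, reduction='none')
```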
## Interpolate
```python
def interpolate(
ppgX: torch.Tensor,
ppgY: torch.Tensor,
interp: Union[float, torch.Tensor]
) -> torch.Tensor:
"""Linear interpolation
Arguments
ppgX
Input PPG X
shape=(len(ppgs.PHONEMES), frames)
ppgY
Input PPG Y
shape=(len(ppgs.PHONEMES), frames)
interp
Interpolation values
scalar float OR shape=(frames,)
Returns
Interpolated PPGs
shape=(len(ppgs.PHONEMES), frames)
"""
```
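A sketch of both calling conventions, assuming two aligned PPGs with the same number of frames (file names are hypothetical):

```python
import torch
import ppgs

ppgX = ppgs.from_file('source.wav', gpu=0)
ppgY = ppgs.from_file('target.wav', gpu=0)

# A 50/50 blend of the two PPGs
halfway = ppgs.interpolate(ppgX, ppgY, 0.5)

# Frame-wise interpolation that fades from X to Y over time
weights = torch.linspace(0., 1., ppgX.shape[-1])
fade = ppgs.interpolate(ppgX, ppgY, weights)
```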
## Edit
```python
import ppgs
# Get PPGs to edit
ppg = ppgs.from_file(audio_file, gpu=gpu)
# Constant-ratio time-stretching (slowing down)
grid = ppgs.edit.grid.constant(ppg, ratio=0.8)
slow = ppgs.edit.grid.sample(ppg, grid)
# Stretch to a desired length (e.g., 100 frames)
grid = ppgs.edit.grid.of_length(ppg, 100)
fixed = ppgs.edit.grid.sample(ppg, grid)
```
### `ppgs.edit.grid.constant`
```python
def constant(ppg: torch.Tensor, ratio: float) -> torch.Tensor:
"""Create a grid for constant-ratio time-stretching
Arguments
ppg
Input PPG
ratio
Time-stretching ratio; lower is slower
Returns
Constant-ratio grid for time-stretching ppg
"""
```
### `ppgs.edit.grid.from_alignments`
```python
def from_alignments(
source: pypar.Alignment,
target: pypar.Alignment,
sample_rate: int = ppgs.SAMPLE_RATE,
hopsize: int = ppgs.HOPSIZE
) -> torch.Tensor:
"""Create time-stretch grid to convert source alignment to target
Arguments
source
Forced alignment of PPG to stretch
target
Forced alignment of target PPG
sample_rate
Audio sampling rate
hopsize
Hopsize in samples
Returns
Grid for time-stretching source PPG
"""
```
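A sketch, assuming forced alignment files loadable with `pypar` (file names are hypothetical; see the `pypar` documentation for supported formats):

```python
import pypar
import ppgs

# Load source and target forced alignments (hypothetical files)
source = pypar.Alignment('source.TextGrid')
target = pypar.Alignment('target.TextGrid')

# Stretch the source PPG to match the target timing
ppg = ppgs.from_file('source.wav', gpu=0)
grid = ppgs.edit.grid.from_alignments(source, target)
stretched = ppgs.edit.grid.sample(ppg, grid)
```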
### `ppgs.edit.grid.of_length`
```python
def of_length(ppg: torch.Tensor, length: int) -> torch.Tensor:
"""Create time-stretch grid to resample PPG to a specified length
Arguments
ppg
Input PPG
length
Target length
Returns
Grid of specified length for time-stretching ppg
"""
```
### `ppgs.edit.grid.sample`
```python
def sample(ppg: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
"""Grid-based PPG interpolation
Arguments
ppg
Input PPG
grid
Grid of desired length; each item is a float-valued index into ppg
Returns
Interpolated PPG
"""
```
### `ppgs.edit.reallocate`
```python
def reallocate(
ppg: torch.Tensor,
source: str,
target: str,
value: Optional[float] = None
) -> torch.Tensor:
"""Reallocate probability from source phoneme to target phoneme
Arguments
ppg
Input PPG
shape=(len(ppgs.PHONEMES), frames)
source
Source phoneme
target
Target phoneme
value
Max amount to reallocate. If None, reallocates all probability.
Returns
Edited PPG
"""
```
### `ppgs.edit.regex`
```python
def regex(
ppg: torch.Tensor,
source_phonemes: List[str],
target_phonemes: List[str]
) -> torch.Tensor:
"""Regex match and replace (via swap) for phoneme sequences
Arguments
ppg
Input PPG
shape=(len(ppgs.PHONEMES), frames)
source_phonemes
Source phoneme sequence
target_phonemes
Target phoneme sequence
Returns
Edited PPG
"""
```
### `ppgs.edit.shift`
```python
def shift(ppg: torch.Tensor, phoneme: str, value: float) -> torch.Tensor:
"""Shift probability of a phoneme and reallocate proportionally
Arguments
ppg
Input PPG
shape=(len(ppgs.PHONEMES), frames)
phoneme
Input phoneme
value
Maximal shift amount
Returns
Edited PPG
"""
```
### `ppgs.edit.swap`
```python
def swap(ppg: torch.Tensor, phonemeA: str, phonemeB: str) -> torch.Tensor:
"""Swap the probabilities of two phonemes
Arguments
ppg
Input PPG
shape=(len(ppgs.PHONEMES), frames)
phonemeA
Input phoneme A
phonemeB
Input phoneme B
Returns
Edited PPG
"""
```
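A sketch of the four phoneme-level edits; the phoneme labels shown ('r', 'l', 'aa', 'ae') are assumed to be entries of `ppgs.PHONEMES`, and the input file is hypothetical:

```python
import ppgs

# Get a PPG to edit
ppg = ppgs.from_file('speech.wav', gpu=0)

# Move all probability mass from 'r' to 'l'
reallocated = ppgs.edit.reallocate(ppg, 'r', 'l')

# Match the phoneme sequence ['r'] and replace it (via swap) with ['l']
replaced = ppgs.edit.regex(ppg, ['r'], ['l'])

# Shift the probability of 'aa' by up to 0.1, reallocating proportionally
shifted = ppgs.edit.shift(ppg, 'aa', 0.1)

# Exchange the probabilities of 'aa' and 'ae'
swapped = ppgs.edit.swap(ppg, 'aa', 'ae')
```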
## Sparsify
```python
def sparsify(
ppg: torch.Tensor,
method: str = 'percentile',
threshold: torch.Tensor = torch.Tensor([0.85])
) -> torch.Tensor:
"""Make phonetic posteriorgrams sparse
Arguments
ppg
Input PPG
shape=(batch, len(ppgs.PHONEMES), frames)
method
Sparsification method. One of ['constant', 'percentile', 'topk'].
threshold
In [0, 1] for 'constant' and 'percentile'; integer > 0 for 'topk'.
Returns
Sparse phonetic posteriorgram
shape=(batch, len(ppgs.PHONEMES), frames)
"""
```
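A sketch of the three sparsification methods; threshold semantics follow the docstring above, and the tensor type passed for 'topk' is an assumption:

```python
import torch
import ppgs

# Hypothetical input, with a batch dimension added
ppg = ppgs.from_file('speech.wav', gpu=0)[None]

# Zero out phonemes below a constant probability threshold
constant = ppgs.sparsify(ppg, method='constant', threshold=torch.Tensor([0.01]))

# Keep the most probable phonemes covering 85% of each frame's mass
percentile = ppgs.sparsify(ppg, method='percentile', threshold=torch.Tensor([0.85]))

# Keep only the top four phonemes per frame
topk = ppgs.sparsify(ppg, method='topk', threshold=torch.tensor(4))
```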
## Training
### Download
Downloads, unzips, and formats datasets. Stores datasets in `data/datasets/`.
Stores formatted datasets in `data/cache/`.
**N.B.** Common Voice and TIMIT cannot be automatically downloaded. You must
manually download the tarballs and place them in `data/sources/commonvoice`
or `data/sources/timit`, respectively, prior to running the following.
```bash
python -m ppgs.data.download --datasets <datasets>
```
### Preprocess
Prepares representations for training. Representations are stored
in `data/cache/`.
```
python -m ppgs.preprocess \
--datasets <datasets> \
--representations <representations> \
--gpu <gpu> \
--num-workers <workers>
```
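For instance, a hypothetical concrete invocation using the two representations named in the API docs above (the dataset name is assumed to match the source directory names):

```bash
python -m ppgs.preprocess \
    --datasets timit \
    --representations mel w2v2fb \
    --gpu 0 \
    --num-workers 4
```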
### Partition
Partitions a dataset. You should not need to run this, as the partitions
used in our work are provided for each dataset in
`ppgs/assets/partitions/`.
```
python -m ppgs.partition --datasets <datasets>
```
### Train
Trains a model. Checkpoints and logs are stored in `runs/`.
```
python -m ppgs.train --config <config> --dataset <dataset> --gpu <gpu>
```
If the config file has been previously run, the most recent checkpoint will
automatically be loaded and training will resume from that checkpoint.
### Monitor
You can monitor training via `tensorboard`.
```
tensorboard --logdir runs/ --port <port> --load_fast true
```
To use the `torchutil` notification system to receive notifications for long
jobs (download, preprocess, train, and evaluate), set the
`PYTORCH_NOTIFICATION_URL` environment variable to a supported webhook as
explained in [the Apprise documentation](https://pypi.org/project/apprise/).
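For example, a hypothetical Apprise webhook for Discord might be configured as:

```bash
export PYTORCH_NOTIFICATION_URL='discord://<webhook_id>/<webhook_token>'
```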
### Evaluate
Performs objective evaluation of phoneme accuracy. Results are stored
in `eval/`.
```
python -m ppgs.evaluate \
--config <name> \
--datasets <datasets> \
--checkpoint <checkpoint> \
--gpu <gpu>
```
## Citation
### IEEE
C. Churchwell, M. Morrison, and B. Pardo, "High-Fidelity Neural Phonetic Posteriorgrams,"
ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio, April 2024.
### BibTeX
```
@inproceedings{churchwell2024high,
title={High-Fidelity Neural Phonetic Posteriorgrams},
author={Churchwell, Cameron and Morrison, Max and Pardo, Bryan},
booktitle={ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio},
month={April},
year={2024}
}
```