ppgs


- Name: ppgs
- Version: 0.0.3
- Home page: https://github.com/interactiveaudiolab/ppgs
- Summary: Phonetic posteriorgrams
- Upload time: 2024-03-04 22:15:00
- Author: Interactive Audio Lab
- License: MIT
- Keywords: phonemes, ppg, pronunciation, speech

<h1 align="center">High-Fidelity Neural Phonetic Posteriorgrams</h1>
<div align="center">

[![PyPI](https://img.shields.io/pypi/v/ppgs.svg)](https://pypi.python.org/pypi/ppgs)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/ppgs)](https://pepy.tech/project/ppgs)

Training, evaluation, and inference of neural phonetic posteriorgrams (PPGs) in PyTorch

[[Paper]](https://www.maxrmorrison.com/pdfs/churchwell2024high.pdf) [[Website]](https://www.maxrmorrison.com/sites/ppgs/)
</div>


## Table of contents

- [Installation](#installation)
- [Inference](#inference)
    * [Application programming interface (API)](#application-programming-interface-api)
        * [`ppgs.from_audio`](#ppgsfrom_audio)
        * [`ppgs.from_file`](#ppgsfrom_file)
        * [`ppgs.from_file_to_file`](#ppgsfrom_file_to_file)
        * [`ppgs.from_files_to_files`](#ppgsfrom_files_to_files)
        * [`ppgs.from_paths_to_paths`](#ppgsfrom_paths_to_paths)
    * [Command-line interface (CLI)](#command-line-interface-cli)
- [Distance](#distance)
- [Interpolate](#interpolate)
- [Edit](#edit)
    * [`ppgs.edit.grid.constant`](#ppgseditgridconstant)
    * [`ppgs.edit.grid.from_alignments`](#ppgseditgridfrom_alignments)
    * [`ppgs.edit.grid.of_length`](#ppgseditgridof_length)
    * [`ppgs.edit.grid.sample`](#ppgseditgridsample)
    * [`ppgs.edit.reallocate`](#ppgseditreallocate)
    * [`ppgs.edit.regex`](#ppgseditregex)
    * [`ppgs.edit.shift`](#ppgseditshift)
    * [`ppgs.edit.swap`](#ppgseditswap)
- [Sparsify](#sparsify)
- [Training](#training)
    * [Download](#download)
    * [Preprocess](#preprocess)
    * [Partition](#partition)
    * [Train](#train)
    * [Monitor](#monitor)
    * [Evaluate](#evaluate)
- [Citation](#citation)


## Installation

An inference-only installation with our best model is pip-installable:

`pip install ppgs`

To perform training, install training dependencies and FFMPEG.

```bash
pip install ppgs[train]
conda install -c conda-forge ffmpeg
```

If you wish to use the Charsiu representation, download the code,
install both inference and training dependencies, and install
Charsiu as a Git submodule.

```bash
# Clone
git clone git@github.com:interactiveaudiolab/ppgs
cd ppgs/

# Install dependencies
pip install -e .[train]
conda install -c conda-forge ffmpeg

# Download Charsiu
git submodule init
git submodule update
```


## Inference

```python
import ppgs

# Load speech audio at correct sample rate
audio = ppgs.load.audio(audio_file)

# Choose a gpu index to use for inference. Set to None to use cpu.
gpu = 0

# Infer PPGs (the result is not named `ppgs`, which would shadow the module)
ppg = ppgs.from_audio(audio, ppgs.SAMPLE_RATE, gpu=gpu)
```


### Application programming interface (API)

#### `ppgs.from_audio`

```python
def from_audio(
    audio: torch.Tensor,
    sample_rate: Union[int, float],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> torch.Tensor:
    """Infer ppgs from audio

    Arguments
        audio
            Batched audio to process
            shape=(batch, 1, samples)
        sample_rate
            Audio sampling rate
        checkpoint
            The checkpoint file
        gpu
            The index of the GPU to use for inference

    Returns
        ppgs
            Phonetic posteriorgrams
            shape=(batch, len(ppgs.PHONEMES), frames)
    """
```


#### `ppgs.from_file`

```python
def from_file(
    file: Union[str, bytes, os.PathLike],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> torch.Tensor:
    """Infer ppgs from an audio file

    Arguments
        file
            The audio file
        checkpoint
            The checkpoint file
        gpu
            The index of the GPU to use for inference

    Returns
        ppgs
            Phonetic posteriorgram
            shape=(len(ppgs.PHONEMES), frames)
    """
```


#### `ppgs.from_file_to_file`

```python
def from_file_to_file(
    audio_file: Union[str, bytes, os.PathLike],
    output_file: Union[str, bytes, os.PathLike],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> None:
    """Infer ppg from an audio file and save to a torch tensor file

    Arguments
        audio_file
            The audio file
        output_file
            The .pt file to save PPGs
        checkpoint
            The checkpoint file
        gpu
            The index of the GPU to use for inference
    """
```


#### `ppgs.from_files_to_files`

```python
def from_files_to_files(
    audio_files: List[Union[str, bytes, os.PathLike]],
    output_files: List[Union[str, bytes, os.PathLike]],
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    num_workers: int = ppgs.NUM_WORKERS,
    gpu: Optional[int] = None,
    max_frames: int = ppgs.MAX_INFERENCE_FRAMES
) -> None:
    """Infer ppgs from audio files and save to torch tensor files

    Arguments
        audio_files
            The audio files
        output_files
            The .pt files to save PPGs
        checkpoint
            The checkpoint file
        num_workers
            Number of CPU threads for multiprocessing
        gpu
            The index of the GPU to use for inference
        max_frames
            The maximum number of frames on the GPU at once
    """
```


#### `ppgs.from_paths_to_paths`

```python
def from_paths_to_paths(
    input_paths: List[Union[str, bytes, os.PathLike]],
    output_paths: Optional[List[Union[str, bytes, os.PathLike]]] = None,
    extensions: Optional[List[str]] = None,
    checkpoint: Optional[Union[str, bytes, os.PathLike]] = None,
    num_workers: int = ppgs.NUM_WORKERS,
    gpu: Optional[int] = None,
    max_frames: int = ppgs.MAX_INFERENCE_FRAMES
) -> None:
    """Infer ppgs from audio files and save to torch tensor files

    Arguments
        input_paths
            Paths to audio files and/or directories
        output_paths
            The one-to-one corresponding outputs
        extensions
            Extensions to glob for in directories
        checkpoint
            The checkpoint file
        num_workers
            Number of CPU threads for multiprocessing
        gpu
            The index of the GPU to use for inference
        max_frames
            The maximum number of frames on the GPU at once
    """
```
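
The `input_paths` and `extensions` arguments describe a mix of files and directories that is expanded into a flat list of audio files. As a minimal sketch of that idea, the hypothetical helper below (`expand_paths` is not part of `ppgs`) walks directories recursively and keeps explicit file paths as-is. Whether the library globs recursively, and how it names default outputs, are assumptions not confirmed by this README.

```python
from pathlib import Path
from typing import Iterable, List, Union


def expand_paths(
    input_paths: Iterable[Union[str, Path]],
    extensions: Iterable[str] = ('wav',)
) -> List[Path]:
    """Hypothetical helper: expand files and directories to a list of files"""
    files = []
    for path in map(Path, input_paths):
        if path.is_dir():
            # Assumption: search directories recursively for each extension
            for extension in extensions:
                files.extend(sorted(path.rglob(f'*.{extension}')))
        else:
            # Explicit file paths are kept regardless of extension
            files.append(path)
    return files
```

The resulting list, together with a matching list of `.pt` paths (for example `[file.with_suffix('.pt') for file in files]`), could then be passed as `audio_files` and `output_files` to `ppgs.from_files_to_files`.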


### Command-line interface (CLI)

```
usage: python -m ppgs
    [-h]
    [--input_paths INPUT_PATHS [INPUT_PATHS ...]]
    [--output_paths OUTPUT_PATHS [OUTPUT_PATHS ...]]
    [--extensions EXTENSIONS [EXTENSIONS ...]]
    [--checkpoint CHECKPOINT]
    [--num-workers NUM_WORKERS]
    [--gpu GPU]
    [--max-frames MAX_TRAINING_FRAMES]

arguments:
    --input_paths INPUT_PATHS [INPUT_PATHS ...]
        Paths to audio files and/or directories

optional arguments:
    -h, --help
        Show this help message and exit
    --output_paths OUTPUT_PATHS [OUTPUT_PATHS ...]
        The one-to-one corresponding output paths
    --extensions EXTENSIONS [EXTENSIONS ...]
        Extensions to glob for in directories
    --checkpoint CHECKPOINT
        The checkpoint file
    --num-workers NUM_WORKERS
        Number of CPU threads for multiprocessing
    --gpu GPU
        The index of the GPU to use for inference. Defaults to CPU.
    --max-frames MAX_TRAINING_FRAMES
        The maximum number of frames on the GPU at once
```


## Distance

To compute the proposed normalized Jensen-Shannon divergence pronunciation
distance between two PPGs, use `ppgs.distance()`.

```python
def distance(
    ppgX: torch.Tensor,
    ppgY: torch.Tensor,
    reduction: str = 'mean',
    normalize: bool = True
) -> torch.Tensor:
    """Compute the pronunciation distance between two aligned PPGs

    Arguments
        ppgX
            Input PPG X
            shape=(len(ppgs.PHONEMES), frames)
        ppgY
            Input PPG Y to compare with PPG X
            shape=(len(ppgs.PHONEMES), frames)
        reduction
            Reduction to apply to the output. One of ['mean', 'none', 'sum'].
        normalize
            Apply similarity-based normalization

    Returns
        Normalized Jensen-Shannon divergence between PPGs
    """
```
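
For intuition, the plain Jensen-Shannon divergence underlying this distance can be sketched in pure Python. This is an illustration only: it omits the similarity-based normalization that `ppgs.distance()` applies, and it stores each PPG as a list of frames (frames first), unlike the `(len(ppgs.PHONEMES), frames)` tensors above. The names `js_divergence` and `framewise_distance` are hypothetical.

```python
import math
from typing import List, Union


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two distributions"""
    def kl(a: List[float], b: List[float]) -> float:
        # Entries with zero probability in `a` contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; positive wherever p or q is positive
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def framewise_distance(
    ppg_x: List[List[float]],
    ppg_y: List[List[float]],
    reduction: str = 'mean'
) -> Union[float, List[float]]:
    """Unnormalized divergence between two time-aligned PPGs"""
    values = [js_divergence(x, y) for x, y in zip(ppg_x, ppg_y)]
    if reduction == 'mean':
        return sum(values) / len(values)
    if reduction == 'sum':
        return sum(values)
    if reduction == 'none':
        return values
    raise ValueError(f'Unknown reduction: {reduction}')
```

With base-2 logarithms, each per-frame value lies in `[0, 1]`: identical frames give 0 and frames with disjoint support give 1.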


## Interpolate

```python
def interpolate(
    ppgX: torch.Tensor,
    ppgY: torch.Tensor,
    interp: Union[float, torch.Tensor]
) -> torch.Tensor:
    """Spherical linear interpolation

    Arguments
        ppgX
            Input PPG X
            shape=(len(ppgs.PHONEMES), frames)
        ppgY
            Input PPG Y
            shape=(len(ppgs.PHONEMES), frames)
        interp
            Interpolation values
            scalar float OR shape=(frames,)

    Returns
        Interpolated PPGs
        shape=(len(ppgs.PHONEMES), frames)
    """
```
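
As a reference for what spherical linear interpolation computes, here is a generic sketch for a single pair of frames in pure Python. It is the textbook slerp formula, not necessarily what `ppgs.interpolate` does internally: how the library maps probability distributions onto the sphere, and whether it renormalizes the result, are not stated in this README. The name `slerp` is hypothetical.

```python
import math
from typing import List


def slerp(x: List[float], y: List[float], t: float) -> List[float]:
    """Textbook spherical linear interpolation between two vectors

    Assumes non-negative, non-zero vectors (such as PPG frames), so the
    angle between them is at most 90 degrees.
    """
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))

    # Angle between the two vectors
    cosine = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    omega = math.acos(max(-1.0, min(1.0, cosine)))

    # Nearly parallel vectors: fall back to linear interpolation
    if omega < 1e-8:
        return [(1 - t) * a + t * b for a, b in zip(x, y)]

    weight_x = math.sin((1 - t) * omega) / math.sin(omega)
    weight_y = math.sin(t * omega) / math.sin(omega)
    return [weight_x * a + weight_y * b for a, b in zip(x, y)]
```

Note that the output of this generic formula does not generally sum to one, so a renormalization step would be needed to treat it as a distribution.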


## Edit

```python
import ppgs

# Get PPGs to edit
ppg = ppgs.from_file(audio_file, gpu=gpu)

# Constant-ratio time-stretching (slowing down)
grid = ppgs.edit.grid.constant(ppg, ratio=0.8)
slow = ppgs.edit.grid.sample(ppg, grid)

# Stretch to a desired length (e.g., 100 frames)
grid = ppgs.edit.grid.of_length(ppg, 100)
fixed = ppgs.edit.grid.sample(ppg, grid)
```


### `ppgs.edit.grid.constant`

```python
def constant(ppg: torch.Tensor, ratio: float) -> torch.Tensor:
    """Create a grid for constant-ratio time-stretching

    Arguments
        ppg
            Input PPG
        ratio
            Time-stretching ratio; lower is slower

    Returns
        Constant-ratio grid for time-stretching ppg
    """
```


### `ppgs.edit.grid.from_alignments`

```python
def from_alignments(
    source: pypar.Alignment,
    target: pypar.Alignment,
    sample_rate: int = ppgs.SAMPLE_RATE,
    hopsize: int = ppgs.HOPSIZE
) -> torch.Tensor:
    """Create time-stretch grid to convert source alignment to target

    Arguments
        source
            Forced alignment of PPG to stretch
        target
            Forced alignment of target PPG
        sample_rate
            Audio sampling rate
        hopsize
            Hopsize in samples

    Returns
        Grid for time-stretching source PPG
    """
```


### `ppgs.edit.grid.of_length`

```python
def of_length(ppg: torch.Tensor, length: int) -> torch.Tensor:
    """Create time-stretch grid to resample PPG to a specified length

    Arguments
        ppg
            Input PPG
        length
            Target length

    Returns
        Grid of specified length for time-stretching ppg
    """
```


### `ppgs.edit.grid.sample`

```python
def sample(ppg: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    """Grid-based PPG interpolation

    Arguments
        ppg
            Input PPG
        grid
            Grid of desired length; each item is a float-valued index into ppg

    Returns
        Interpolated PPG
    """
```
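
To make the notion of a grid of float-valued indices concrete, the sketch below builds a constant-ratio grid and samples a PPG with it in pure Python. It rests on two assumptions that this README does not confirm: that a constant-ratio grid steps through the frame indices in increments of `ratio`, and that sampling interpolates linearly between neighboring frames. The names `constant_grid` and `sample_linear` are hypothetical, and the PPG is stored as a list of frames.

```python
import math
from typing import List


def constant_grid(frames: int, ratio: float) -> List[float]:
    """Float indices 0, ratio, 2 * ratio, ... within [0, frames - 1]"""
    length = int((frames - 1) / ratio) + 1
    return [i * ratio for i in range(length)]


def sample_linear(
    ppg: List[List[float]],
    grid: List[float]
) -> List[List[float]]:
    """Sample frames at float indices via linear interpolation"""
    last = len(ppg) - 1
    output = []
    for index in grid:
        # Clamp to the valid index range
        index = min(max(index, 0.0), float(last))
        low = int(math.floor(index))
        high = min(low + 1, last)
        weight = index - low
        output.append([
            (1 - weight) * a + weight * b
            for a, b in zip(ppg[low], ppg[high])])
    return output
```

Under these assumptions, a ratio below one yields more output frames than input frames, which matches "lower is slower": a 3-frame PPG sampled with `ratio=0.5` produces 5 frames.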


### `ppgs.edit.reallocate`

```python
def reallocate(
    ppg: torch.Tensor,
    source: str,
    target: str,
    value: Optional[float] = None
) -> torch.Tensor:
    """Reallocate probability from source phoneme to target phoneme

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        source
            Source phoneme
        target
            Target phoneme
        value
            Max amount to reallocate. If None, reallocates all probability.

    Returns
        Edited PPG
    """
```
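
The effect of reallocation on a single PPG can be sketched in pure Python as follows. This is an illustration of the documented behavior under stated assumptions, not the library's implementation: `PHONEMES` is a hypothetical three-phoneme set standing in for `ppgs.PHONEMES`, and the PPG is a nested list indexed as `ppg[phoneme][frame]`.

```python
from typing import List, Optional

# Hypothetical tiny phoneme set standing in for ppgs.PHONEMES
PHONEMES = ['aa', 'ae', 'b']


def reallocate(
    ppg: List[List[float]],
    source: str,
    target: str,
    value: Optional[float] = None
) -> List[List[float]]:
    """Move up to `value` probability per frame from source to target"""
    source_index = PHONEMES.index(source)
    target_index = PHONEMES.index(target)

    # Copy so the input is left unchanged
    edited = [row[:] for row in ppg]
    for frame in range(len(ppg[0])):
        available = edited[source_index][frame]

        # With no cap, move all of the source probability
        moved = available if value is None else min(value, available)
        edited[source_index][frame] -= moved
        edited[target_index][frame] += moved
    return edited
```

Because probability is only moved between two phonemes, each frame still sums to one after the edit.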


### `ppgs.edit.regex`

```python
def regex(
    ppg: torch.Tensor,
    source_phonemes: List[str],
    target_phonemes: List[str]
) -> torch.Tensor:
    """Regex match and replace (via swap) for phoneme sequences

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        source_phonemes
            Source phoneme sequence
        target_phonemes
            Target phoneme sequence

    Returns
        Edited PPG
    """
```


### `ppgs.edit.shift`

```python
def shift(ppg: torch.Tensor, phoneme: str, value: float) -> torch.Tensor:
    """Shift probability of a phoneme and reallocate proportionally

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        phoneme
            Input phoneme
        value
            Maximal shift amount

    Returns
        Edited PPG
    """
```


### `ppgs.edit.swap`

```python
def swap(ppg: torch.Tensor, phonemeA: str, phonemeB: str) -> torch.Tensor:
    """Swap the probabilities of two phonemes

    Arguments
        ppg
            Input PPG
            shape=(len(ppgs.PHONEMES), frames)
        phonemeA
            Input phoneme A
        phonemeB
            Input phoneme B

    Returns
        Edited PPG
    """
```

## Sparsify

```python
def sparsify(
    ppg: torch.Tensor,
    method: str = 'percentile',
    threshold: Union[float, int] = 0.85
) -> torch.Tensor:
    """Make phonetic posteriorgrams sparse

    Arguments
        ppg
            Input PPG
            shape=(*, len(ppgs.PHONEMES), frames)
        method
            Sparsification method. One of ['constant', 'percentile', 'topk'].
        threshold
            In [0, 1] for 'constant' and 'percentile'; integer > 0 for 'topk'.

    Returns
        Sparse phonetic posteriorgram
        shape=(*, len(ppgs.PHONEMES), frames)
    """
```
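
As one illustration of sparsification, the sketch below implements a top-k rule in pure Python: per frame, it keeps the `k` most probable phonemes and zeroes the rest. Renormalizing the kept values so each frame sums to one is an assumption of this sketch; this README does not say whether `ppgs.sparsify` renormalizes. The name `sparsify_topk` is hypothetical and the PPG is indexed as `ppg[phoneme][frame]`.

```python
from typing import List


def sparsify_topk(ppg: List[List[float]], k: int) -> List[List[float]]:
    """Keep the k largest probabilities in each frame and renormalize"""
    phonemes, frames = len(ppg), len(ppg[0])
    sparse = [[0.0] * frames for _ in range(phonemes)]
    for frame in range(frames):
        column = [ppg[phoneme][frame] for phoneme in range(phonemes)]

        # Indices of the k most probable phonemes in this frame
        keep = sorted(
            range(phonemes),
            key=lambda phoneme: column[phoneme],
            reverse=True)[:k]

        # Assumption: renormalize the kept mass to sum to one
        total = sum(column[phoneme] for phoneme in keep)
        if total > 0:
            for phoneme in keep:
                sparse[phoneme][frame] = column[phoneme] / total
    return sparse
```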


## Training

### Download

Downloads, unzips, and formats datasets. Stores datasets in `data/datasets/`.
Stores formatted datasets in `data/cache/`.

**N.B.** Common Voice and TIMIT cannot be automatically downloaded. You must
manually download the tarballs and place them in `data/sources/commonvoice`
or `data/sources/timit`, respectively, prior to running the following.

```bash
python -m ppgs.data.download --datasets <datasets>
```


### Preprocess

Prepares representations for training. Representations are stored
in `data/cache/`.

```
python -m ppgs.data.preprocess \
   --datasets <datasets> \
   --representations <representations> \
   --gpu <gpu> \
   --num-workers <workers>
```


### Partition

Partitions a dataset. You should not need to run this, as the partitions
used in our work are provided for each dataset in
`ppgs/assets/partitions/`.

```
python -m ppgs.partition --datasets <datasets>
```


### Train

Trains a model. Checkpoints and logs are stored in `runs/`.

```
python -m ppgs.train --config <config> --dataset <dataset> --gpu <gpu>
```

If the config file has been previously run, the most recent checkpoint will
automatically be loaded and training will resume from that checkpoint.


### Monitor

You can monitor training via `tensorboard`.

```
tensorboard --logdir runs/ --port <port> --load_fast true
```

To use the `torchutil` notification system to receive notifications for long
jobs (download, preprocess, train, and evaluate), set the
`PYTORCH_NOTIFICATION_URL` environment variable to a supported webhook as
explained in [the Apprise documentation](https://pypi.org/project/apprise/).


### Evaluate

Performs objective evaluation of phoneme accuracy. Results are stored
in `eval/`.

```
python -m ppgs.evaluate --config <name> --datasets <datasets> --gpu <gpu>
```


## Citation

### IEEE
C. Churchwell, M. Morrison, and B. Pardo, "High-Fidelity Neural Phonetic Posteriorgrams,"
ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio, April 2024.


### BibTeX

```
@inproceedings{churchwell2024high,
    title={High-Fidelity Neural Phonetic Posteriorgrams},
    author={Churchwell, Cameron and Morrison, Max and Pardo, Bryan},
    booktitle={ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio},
    month={April},
    year={2024}
}
```

            
