promonet

Name	promonet
Version	0.0.1
home_page	https://github.com/maxrmorrison/promonet
Summary	Prosody Modification Network
upload_time	2024-07-07 18:47:24
maintainer	None
docs_url	None
author	Interactive Audio Lab
requires_python	None
license	MIT
keywords	speech prosody editing synthesis pronunciation
requirements	No requirements were recorded.

<h1 align="center">Prosody and Pronunciation Modification Network (ProMoNet)</h1>
<div align="center">

[![PyPI](https://img.shields.io/pypi/v/promonet.svg)](https://pypi.python.org/pypi/promonet)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://pepy.tech/badge/promonet)](https://pepy.tech/project/promonet)

Official code for the paper _Fine-Grained and Interpretable Neural Speech Editing_

[[paper]](https://www.maxrmorrison.com/pdfs/morrison2024fine.pdf)
[[website]](https://www.maxrmorrison.com/sites/promonet/)

</div>


## Table of contents

- [Installation](#installation)
- [Usage](#usage)
- [Application programming interface (API)](#application-programming-interface-api)
    * [Adaptation API](#adaptation-api)
        * [`promonet.adapt.speaker`](#promonetadaptspeaker)
    * [Preprocessing API](#preprocessing-api)
        * [`promonet.preprocess.from_audio`](#promonetpreprocessfrom_audio)
        * [`promonet.preprocess.from_file`](#promonetpreprocessfrom_file)
        * [`promonet.preprocess.from_file_to_file`](#promonetpreprocessfrom_file_to_file)
        * [`promonet.preprocess.from_files_to_files`](#promonetpreprocessfrom_files_to_files)
    * [Editing API](#editing-api)
        * [`promonet.edit.from_features`](#promoneteditfrom_features)
        * [`promonet.edit.from_file`](#promoneteditfrom_file)
        * [`promonet.edit.from_file_to_file`](#promoneteditfrom_file_to_file)
        * [`promonet.edit.from_files_to_files`](#promoneteditfrom_files_to_files)
    * [Synthesis API](#synthesis-api)
        * [`promonet.synthesize.from_features`](#promonetsynthesizefrom_features)
        * [`promonet.synthesize.from_file`](#promonetsynthesizefrom_file)
        * [`promonet.synthesize.from_file_to_file`](#promonetsynthesizefrom_file_to_file)
        * [`promonet.synthesize.from_files_to_files`](#promonetsynthesizefrom_files_to_files)
- [Command-line interface (CLI)](#command-line-interface-cli)
    * [Adaptation CLI](#adaptation-cli)
        * [`promonet.adapt`](#promonetadapt)
    * [Preprocessing CLI](#preprocessing-cli)
        * [`promonet.preprocess`](#promonetpreprocess)
    * [Editing CLI](#editing-cli)
        * [`promonet.edit`](#promonetedit)
    * [Synthesis CLI](#synthesis-cli)
        * [`promonet.synthesize`](#promonetsynthesize)
- [Training](#training)
    * [Download](#download)
    * [Preprocess](#preprocess)
    * [Partition](#partition)
    * [Train](#train)
    * [Monitor](#monitor)
    * [Evaluate](#evaluate)
- [Citation](#citation)


## Installation

`pip install promonet`

We are working on adding [`torbi`, our fast Viterbi decoding implementation](https://github.com/maxrmorrison/torbi), to PyTorch. Until then, you must manually download and install `torbi`. Progress on incorporating it into PyTorch is tracked in [this PyTorch issue](https://github.com/pytorch/pytorch/issues/121160).


## Usage

Our included model checkpoint allows speech editing and synthesis for VCTK speakers.
To use `promonet` with other speakers, you must first perform speaker
adaptation on a dataset of recordings of the target speaker. You can then use
the resulting model checkpoint to perform speech editing in the target
speaker's voice. All of this can be done using either the API or CLI.

```python
import promonet


###############################################################################
# Speaker adaptation
###############################################################################


# Speaker's name
name = 'max'

# Audio files for adaptation
files = [...]

# GPU index to perform adaptation and editing on
gpu = 0

# Perform speaker adaptation
checkpoint = promonet.adapt.speaker(name, files, gpu=gpu)


###############################################################################
# Speech editing
###############################################################################


# Load speech to edit
audio = promonet.load.audio('test.wav')

# Get features to edit
loudness, pitch, periodicity, ppg = promonet.preprocess.from_audio(
    audio,
    promonet.SAMPLE_RATE,
    gpu)

# We'll use a ratio of 2.0 for all editing examples
ratio = 2.0

# Perform pitch-shifting
shifted = promonet.synthesize.from_features(
    *promonet.edit.from_features(
        loudness,
        pitch,
        periodicity,
        ppg,
        pitch_shift_cents=promonet.convert.ratio_to_cents(ratio)),
    checkpoint=checkpoint,
    gpu=gpu)

# Perform time-stretching
stretched = promonet.synthesize.from_features(
    *promonet.edit.from_features(
        loudness,
        pitch,
        periodicity,
        ppg,
        time_stretch_ratio=ratio),
    checkpoint=checkpoint,
    gpu=gpu)

# Perform loudness editing
scaled = promonet.synthesize.from_features(
    *promonet.edit.from_features(
        loudness,
        pitch,
        periodicity,
        ppg,
        loudness_scale_db=promonet.convert.ratio_to_db(ratio)),
    checkpoint=checkpoint,
    gpu=gpu)

# Edit spectral balance (> 1 for Alvin and the Chipmunks; < 1 for Patrick Star)
alvin = promonet.synthesize.from_features(
    loudness,
    pitch,
    periodicity,
    ppg,
    spectral_balance_ratio=ratio,
    checkpoint=checkpoint,
    gpu=gpu)
```
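The editing examples above express every edit as a ratio and convert it with `promonet.convert.ratio_to_cents` and `promonet.convert.ratio_to_db`. As a rough guide to what those units mean, here is a standalone sketch using the standard definitions (1200 cents per octave, and 20·log10 for an amplitude ratio). The function names `cents_from_ratio` and `db_from_ratio` are hypothetical, and the amplitude convention for dB is an assumption; the library's own implementation may differ.

```python
import math


def cents_from_ratio(ratio: float) -> float:
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def db_from_ratio(ratio: float) -> float:
    """Convert an amplitude ratio to decibels (assumes 20 * log10)."""
    return 20.0 * math.log10(ratio)


ratio = 2.0
print(cents_from_ratio(ratio))        # 1200.0, i.e., one octave up
print(round(db_from_ratio(ratio), 2))  # 6.02
```

Under these definitions, a ratio of 2.0 corresponds to a one-octave pitch shift and roughly a 6 dB loudness increase.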

See the [`ppgs.edit`](https://github.com/interactiveaudiolab/ppgs#ppgsedit) submodule documentation for the pronunciation (PPG) editing API.


## Application programming interface (API)

### Adaptation API

#### `promonet.adapt.speaker`

```python
def speaker(
    name: str,
    files: List[Path],
    checkpoint: Optional[Path] = None,
    gpu: Optional[int] = None
) -> Path:
    """Perform speaker adaptation

    Args:
        name: The name of the speaker
        files: The audio files to use for adaptation
        checkpoint: The model checkpoint directory
        gpu: The gpu to run adaptation on

    Returns:
        checkpoint: The file containing the trained generator checkpoint
    """
```
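Since `promonet.adapt.speaker` takes a list of audio files, a small helper for gathering them from a directory can be convenient. The sketch below is one way to do that; the helper names `collect_audio_files` and `adapt_from_directory`, and the choice to match only `.wav` files, are assumptions for illustration and not part of `promonet`.

```python
from pathlib import Path
from typing import List


def collect_audio_files(directory, extension='.wav') -> List[Path]:
    """Return a sorted list of audio files found directly in a directory."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() == extension)


def adapt_from_directory(name, directory, gpu=None):
    """Run speaker adaptation on every audio file in a directory."""
    # Imported lazily so the helper above can be used without promonet
    import promonet

    files = collect_audio_files(directory)
    if not files:
        raise ValueError(f'No audio files found in {directory}')

    # Returns the path of the adapted generator checkpoint
    return promonet.adapt.speaker(name, files, gpu=gpu)
```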


### Preprocessing API

#### `promonet.preprocess.from_audio`

```python
def from_audio(
    audio: torch.Tensor,
    sample_rate: int = promonet.SAMPLE_RATE,
    gpu: Optional[int] = None,
    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']
) -> Union[
    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor],
    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, str]
]:
    """Preprocess audio

    Arguments
        audio: Audio to preprocess
        sample_rate: Audio sample rate
        gpu: The GPU index
        features: The features to preprocess.
            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].

    Returns
        loudness: The loudness contour
        pitch: The pitch contour
        periodicity: The periodicity contour
        ppg: The phonetic posteriorgram
        text: The text transcript
    """
```


#### `promonet.preprocess.from_file`

```python
def from_file(
    file: Union[str, bytes, os.PathLike],
    gpu: Optional[int] = None,
    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']
) -> Union[
    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor],
    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, str]
]:
    """Preprocess audio on disk

    Arguments
        file: Audio file to preprocess
        gpu: The GPU index
        features: The features to preprocess.
            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].

    Returns
        loudness: The loudness contour
        pitch: The pitch contour
        periodicity: The periodicity contour
        ppg: The phonetic posteriorgram
        text: The text transcript
    """
```


#### `promonet.preprocess.from_file_to_file`

```python
def from_file_to_file(
    file: Union[str, bytes, os.PathLike],
    output_prefix: Optional[Union[str, os.PathLike]] = None,
    gpu: Optional[int] = None,
    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']
) -> None:
    """Preprocess audio on disk and save

    Arguments
        file: Audio file to preprocess
        output_prefix: File to save features, minus extension
        gpu: The GPU index
        features: The features to preprocess.
            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].
    """
```


#### `promonet.preprocess.from_files_to_files`

```python
def from_files_to_files(
    files: List[Union[str, bytes, os.PathLike]],
    output_prefixes: Optional[List[Union[str, os.PathLike]]] = None,
    gpu: Optional[int] = None,
    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']
) -> None:
    """Preprocess multiple audio files on disk and save

    Arguments
        files: Audio files to preprocess
        output_prefixes: Files to save features, minus extension
        gpu: The GPU index
        features: The features to preprocess.
            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].
    """
```
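The `output_prefixes` argument expects one path per input file with the extension removed. A minimal sketch of building such a list is shown below; the helper name `default_prefixes` and the optional `output_directory` parameter are hypothetical, and the source does not state what `promonet` does when `output_prefixes` is left as `None`.

```python
from pathlib import Path


def default_prefixes(files, output_directory=None):
    """Derive one output prefix per audio file by dropping the extension."""
    prefixes = []
    for file in files:
        file = Path(file)

        # Save next to the audio file unless a directory is given
        if output_directory is None:
            directory = file.parent
        else:
            directory = Path(output_directory)

        # The stem is the file name without its extension
        prefixes.append(directory / file.stem)
    return prefixes


files = ['recordings/a.wav', 'recordings/b.wav']
prefixes = default_prefixes(files, 'features')
```

The resulting list can then be passed as `output_prefixes` alongside `files`.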


### Editing API

#### `promonet.edit.from_features`

```python
def from_features(
    loudness: torch.Tensor,
    pitch: torch.Tensor,
    periodicity: torch.Tensor,
    ppg: torch.Tensor,
    pitch_shift_cents: Optional[float] = None,
    time_stretch_ratio: Optional[float] = None,
    loudness_scale_db: Optional[float] = None
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Edit speech representation

    Arguments
        loudness: Loudness contour to edit
        pitch: Pitch contour to edit
        periodicity: Periodicity contour to edit
        ppg: PPG to edit
        pitch_shift_cents: Amount of pitch-shifting in cents
        time_stretch_ratio: Amount of time-stretching. Faster when above one.
        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)

    Returns
        edited_loudness, edited_pitch, edited_periodicity, edited_ppg
    """
```
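To build intuition for the `pitch_shift_cents` and `time_stretch_ratio` arguments, the following sketch applies the same kinds of edits to plain Python lists instead of tensors. It is not the library's implementation: the function names are hypothetical, pitch is assumed to be in Hz, and the frame-count rule is only an approximation of "faster when above one".

```python
def shift_pitch(pitch_hz, cents):
    """Scale each pitch value (in Hz) by a shift expressed in cents."""
    factor = 2.0 ** (cents / 1200.0)
    return [value * factor for value in pitch_hz]


def stretched_length(frames, time_stretch_ratio):
    """Approximate frame count after stretching (fewer frames when ratio > 1)."""
    return max(1, round(frames / time_stretch_ratio))


# Shifting up by 1200 cents (one octave) doubles every pitch value
print(shift_pitch([100.0, 220.0], 1200.0))  # [200.0, 440.0]

# A ratio of 2.0 plays twice as fast, so about half the frames remain
print(stretched_length(100, 2.0))  # 50
```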


#### `promonet.edit.from_file`

```python
def from_file(
    loudness_file: Union[str, bytes, os.PathLike],
    pitch_file: Union[str, bytes, os.PathLike],
    periodicity_file: Union[str, bytes, os.PathLike],
    ppg_file: Union[str, bytes, os.PathLike],
    pitch_shift_cents: Optional[float] = None,
    time_stretch_ratio: Optional[float] = None,
    loudness_scale_db: Optional[float] = None
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Edit speech representation on disk

    Arguments
        loudness_file: Loudness file to edit
        pitch_file: Pitch file to edit
        periodicity_file: Periodicity file to edit
        ppg_file: PPG file to edit
        pitch_shift_cents: Amount of pitch-shifting in cents
        time_stretch_ratio: Amount of time-stretching. Faster when above one.
        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)

    Returns
        edited_loudness, edited_pitch, edited_periodicity, edited_ppg
    """
```


#### `promonet.edit.from_file_to_file`

```python
def from_file_to_file(
    loudness_file: Union[str, bytes, os.PathLike],
    pitch_file: Union[str, bytes, os.PathLike],
    periodicity_file: Union[str, bytes, os.PathLike],
    ppg_file: Union[str, bytes, os.PathLike],
    output_prefix: Union[str, bytes, os.PathLike],
    pitch_shift_cents: Optional[float] = None,
    time_stretch_ratio: Optional[float] = None,
    loudness_scale_db: Optional[float] = None
) -> None:
    """Edit speech representation on disk and save to disk

    Arguments
        loudness_file: Loudness file to edit
        pitch_file: Pitch file to edit
        periodicity_file: Periodicity file to edit
        ppg_file: PPG file to edit
        output_prefix: File to save output, minus extension
        pitch_shift_cents: Amount of pitch-shifting in cents
        time_stretch_ratio: Amount of time-stretching. Faster when above one.
        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)
    """
```


#### `promonet.edit.from_files_to_files`

```python
def from_files_to_files(
    loudness_files: List[Union[str, bytes, os.PathLike]],
    pitch_files: List[Union[str, bytes, os.PathLike]],
    periodicity_files: List[Union[str, bytes, os.PathLike]],
    ppg_files: List[Union[str, bytes, os.PathLike]],
    output_prefixes: List[Union[str, bytes, os.PathLike]],
    pitch_shift_cents: Optional[float] = None,
    time_stretch_ratio: Optional[float] = None,
    loudness_scale_db: Optional[float] = None
) -> None:
    """Edit speech representations on disk and save to disk

    Arguments
        loudness_files: Loudness files to edit
        pitch_files: Pitch files to edit
        periodicity_files: Periodicity files to edit
        ppg_files: Phonetic posteriorgram files to edit
        output_prefixes: Files to save output, minus extension
        pitch_shift_cents: Amount of pitch-shifting in cents
        time_stretch_ratio: Amount of time-stretching. Faster when above one.
        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)
    """
```


### Synthesis API

#### `promonet.synthesize.from_features`

```python
def from_features(
    loudness: torch.Tensor,
    pitch: torch.Tensor,
    periodicity: torch.Tensor,
    ppg: torch.Tensor,
    speaker: Union[int, torch.Tensor] = 0,
    spectral_balance_ratio: float = 1.,
    checkpoint: Optional[Union[str, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> torch.Tensor:
    """Perform speech synthesis

    Args:
        loudness: The loudness contour
        pitch: The pitch contour
        periodicity: The periodicity contour
        ppg: The phonetic posteriorgram
        speaker: The speaker index
        spectral_balance_ratio: > 1 for Alvin and the Chipmunks; < 1 for Patrick Star
        checkpoint: The generator checkpoint
        gpu: The GPU index

    Returns
        generated: The generated speech
    """
```


#### `promonet.synthesize.from_file`

```python
def from_file(
    loudness_file: Union[str, os.PathLike],
    pitch_file: Union[str, os.PathLike],
    periodicity_file: Union[str, os.PathLike],
    ppg_file: Union[str, os.PathLike],
    speaker: Union[int, torch.Tensor] = 0,
    checkpoint: Optional[Union[str, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> torch.Tensor:
    """Perform speech synthesis from features on disk

    Args:
        loudness_file: The loudness file
        pitch_file: The pitch file
        periodicity_file: The periodicity file
        ppg_file: The phonetic posteriorgram file
        speaker: The speaker index
        checkpoint: The generator checkpoint
        gpu: The GPU index

    Returns
        generated: The generated speech
    """
```


#### `promonet.synthesize.from_file_to_file`

```python
def from_file_to_file(
    loudness_file: Union[str, os.PathLike],
    pitch_file: Union[str, os.PathLike],
    periodicity_file: Union[str, os.PathLike],
    ppg_file: Union[str, os.PathLike],
    output_file: Union[str, os.PathLike],
    speaker: Union[int, torch.Tensor] = 0,
    checkpoint: Optional[Union[str, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> None:
    """Perform speech synthesis from features on disk and save

    Args:
        loudness_file: The loudness file
        pitch_file: The pitch file
        periodicity_file: The periodicity file
        ppg_file: The phonetic posteriorgram file
        output_file: The file to save generated speech audio
        speaker: The speaker index
        checkpoint: The generator checkpoint
        gpu: The GPU index
    """
```


#### `promonet.synthesize.from_files_to_files`

```python
def from_files_to_files(
    loudness_files: List[Union[str, os.PathLike]],
    pitch_files: List[Union[str, os.PathLike]],
    periodicity_files: List[Union[str, os.PathLike]],
    ppg_files: List[Union[str, os.PathLike]],
    output_files: List[Union[str, os.PathLike]],
    speakers: Optional[Union[List[int], torch.Tensor]] = None,
    checkpoint: Optional[Union[str, os.PathLike]] = None,
    gpu: Optional[int] = None
) -> None:
    """Perform batched speech synthesis from features on disk and save

    Args:
        loudness_files: The loudness files
        pitch_files: The pitch files
        periodicity_files: The periodicity files
        ppg_files: The phonetic posteriorgram files
        output_files: The files to save generated speech audio
        speakers: The speaker indices
        checkpoint: The generator checkpoint
        gpu: The GPU index
    """
```


## Command-line interface (CLI)

### Adaptation CLI

#### `promonet.adapt`

```
python -m promonet.adapt \
    --name NAME \
    --files FILES [FILES ...] \
    [--checkpoint CHECKPOINT] \
    [--gpu GPU]

Perform speaker adaptation

arguments:
  --name NAME
    The name of the speaker
  --files FILES [FILES ...]
    The audio files to use for adaptation

optional arguments:
  -h, --help
    show this help message and exit
  --checkpoint CHECKPOINT
    The model checkpoint directory
  --gpu GPU
    The gpu to run adaptation on
```


### Preprocessing CLI

#### `promonet.preprocess`

```
python -m promonet.preprocess \
    [-h] \
    --files FILES [FILES ...] \
    [--output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]] \
    [--features {loudness,pitch,periodicity,ppg} [{loudness,pitch,periodicity,ppg} ...]] \
    [--gpu GPU]

Preprocess

arguments:
  --files FILES [FILES ...]
    Audio files to preprocess

optional arguments:
  -h, --help
    show this help message and exit
  --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]
    Files to save features, minus extension
  --features {loudness,pitch,periodicity,ppg} [{loudness,pitch,periodicity,ppg} ...]
    The features to preprocess
  --gpu GPU
    The index of the gpu to use
```


### Editing CLI

#### `promonet.edit`

```
python -m promonet.edit \
    [-h] \
    --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...] \
    --pitch_files PITCH_FILES [PITCH_FILES ...] \
    --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...] \
    --ppg_files PPG_FILES [PPG_FILES ...] \
    --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...] \
    [--pitch_shift_cents PITCH_SHIFT_CENTS] \
    [--time_stretch_ratio TIME_STRETCH_RATIO] \
    [--loudness_scale_db LOUDNESS_SCALE_DB]

Edit speech representation

arguments:
  --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...]
    The loudness files to edit
  --pitch_files PITCH_FILES [PITCH_FILES ...]
    The pitch files to edit
  --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...]
    The periodicity files to edit
  --ppg_files PPG_FILES [PPG_FILES ...]
    The ppg files to edit
  --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]
    The locations to save output files, minus extension

optional arguments:
  -h, --help
    show this help message and exit
  --pitch_shift_cents PITCH_SHIFT_CENTS
    Amount of pitch-shifting in cents
  --time_stretch_ratio TIME_STRETCH_RATIO
    Amount of time-stretching. Faster when above one.
  --loudness_scale_db LOUDNESS_SCALE_DB
    Loudness ratio editing in dB (not recommended; use loudness)
```
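When calling the editing CLI from a script, it can help to build the argument list programmatically so that the optional edits are only passed when set. The sketch below uses only the flags listed above; the helper name `edit_command`, the example file names, and the check that all lists have equal length are assumptions for illustration.

```python
import sys


def edit_command(
    loudness_files,
    pitch_files,
    periodicity_files,
    ppg_files,
    output_prefixes,
    pitch_shift_cents=None,
    time_stretch_ratio=None,
    loudness_scale_db=None
):
    """Build the argument list for `python -m promonet.edit`."""
    required = (
        ('--loudness_files', loudness_files),
        ('--pitch_files', pitch_files),
        ('--periodicity_files', periodicity_files),
        ('--ppg_files', ppg_files),
        ('--output_prefixes', output_prefixes))

    # The file lists are parallel: one entry per utterance (assumption)
    if len({len(values) for _, values in required}) != 1:
        raise ValueError('All file lists must have the same length')

    command = [sys.executable, '-m', 'promonet.edit']
    for flag, values in required:
        command += [flag, *map(str, values)]

    # Only pass the edits that were requested
    optional = (
        ('--pitch_shift_cents', pitch_shift_cents),
        ('--time_stretch_ratio', time_stretch_ratio),
        ('--loudness_scale_db', loudness_scale_db))
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    return command


# Hypothetical file names; substitute the files produced by preprocessing
command = edit_command(
    ['loudness.pt'], ['pitch.pt'], ['periodicity.pt'], ['ppg.pt'],
    ['edited'],
    pitch_shift_cents=200)
print(' '.join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)` to launch the edit.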


### Synthesis CLI

#### `promonet.synthesize`

```
python -m promonet.synthesize \
    --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...] \
    --pitch_files PITCH_FILES [PITCH_FILES ...] \
    --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...] \
    --ppg_files PPG_FILES [PPG_FILES ...] \
    --output_files OUTPUT_FILES [OUTPUT_FILES ...] \
    [--speakers SPEAKERS [SPEAKERS ...]] \
    [--checkpoint CHECKPOINT] \
    [--gpu GPU]

Synthesize speech from features

arguments:
  --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...]
    The loudness files
  --pitch_files PITCH_FILES [PITCH_FILES ...]
    The pitch files
  --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...]
    The periodicity files
  --ppg_files PPG_FILES [PPG_FILES ...]
    The phonetic posteriorgram files
  --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    The files to save the edited audio

optional arguments:
  -h, --help
    show this help message and exit
  --speakers SPEAKERS [SPEAKERS ...]
    The IDs of the speakers for voice conversion
  --checkpoint CHECKPOINT
    The generator checkpoint
  --gpu GPU
    The GPU index
```


## Training

### Download

Downloads, unzips, and formats datasets. Stores datasets in `data/datasets/`.
Stores formatted datasets in `data/cache/`.

```
python -m promonet.data.download --datasets <datasets>
```


### Preprocess

Prepares features for training. Features are stored in `data/cache/`.

```
python -m promonet.data.preprocess \
    --datasets <datasets> \
    --features <features> \
    --gpu <gpu>
```


### Partition

Partitions a dataset. You should not need to run this, as the partitions
used in our work are provided for each dataset in
`promonet/assets/partitions/`.

```
python -m promonet.partition --datasets <datasets>
```


### Train

Trains a model. Checkpoints and logs are stored in `runs/`.

```
python -m promonet.train \
    --config <config> \
    --dataset <dataset> \
    --gpu <gpu>
```

If a run with this config already exists, training automatically resumes from
its most recent checkpoint.
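Resuming from the most recent checkpoint can be pictured as a search over a run directory. The sketch below illustrates the idea only: the helper name `latest_checkpoint`, the `*.pt` pattern, and the use of modification time are all assumptions, and `promonet` may select checkpoints differently (for example, by step number).

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory, pattern='*.pt') -> Optional[Path]:
    """Return the most recently modified checkpoint file, or None."""
    checkpoints = list(Path(directory).glob(pattern))

    # Nothing to resume from; training would start from scratch
    if not checkpoints:
        return None

    return max(checkpoints, key=lambda path: path.stat().st_mtime)
```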


### Monitor

You can monitor training via `tensorboard`.

```
tensorboard --logdir runs/ --port <port> --load_fast true
```

To use the `torchutil` notification system to receive notifications for long
jobs (download, preprocess, train, and evaluate), set the
`PYTORCH_NOTIFICATION_URL` environment variable to a supported webhook as
explained in [the Apprise documentation](https://pypi.org/project/apprise/).
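If you launch long jobs from a Python script, the `PYTORCH_NOTIFICATION_URL` variable can be set for the child process only, leaving your shell environment untouched. This is a minimal sketch; the helper name `notification_environment` is hypothetical and the URL shown in the comment is a placeholder, not a working webhook.

```python
import os


def notification_environment(url, base=None):
    """Return a copy of the environment with the notification URL set."""
    environment = dict(os.environ if base is None else base)
    environment['PYTORCH_NOTIFICATION_URL'] = url
    return environment


# Example (placeholder URL; see the Apprise documentation for real formats):
# subprocess.run(command, env=notification_environment('<webhook-url>'))
```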


### Evaluate

Performs objective evaluation and generates examples for subjective evaluation.
Also performs benchmarking of generation speed. Results are stored in `eval/`.

```
python -m promonet.evaluate \
    --config <config> \
    --datasets <datasets> \
    --gpu <gpu>
```
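The training steps in this section run in a fixed order: download, preprocess, train, then evaluate. The sketch below assembles those commands as argument lists using only the flags shown above. The helper name `training_commands` is hypothetical, and it assumes a single dataset, the same config value for training and evaluation, and that `--features` accepts several space-separated names.

```python
import sys


def training_commands(config, dataset, features, gpu):
    """Return the download, preprocess, train, and evaluate commands in order."""
    module = [sys.executable, '-m']
    gpu = str(gpu)
    return [
        module + ['promonet.data.download', '--datasets', dataset],
        module + [
            'promonet.data.preprocess',
            '--datasets', dataset,
            '--features', *features,
            '--gpu', gpu],
        module + [
            'promonet.train',
            '--config', config,
            '--dataset', dataset,
            '--gpu', gpu],
        module + [
            'promonet.evaluate',
            '--config', config,
            '--datasets', dataset,
            '--gpu', gpu]]


# Placeholder arguments; substitute your own config and dataset
for command in training_commands('<config>', '<dataset>', ['loudness', 'pitch'], 0):
    print(' '.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` so that a failing step stops the pipeline. The partition step is omitted because the partitions are already provided.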


## Citation

### IEEE
M. Morrison, C. Churchwell, N. Pruyne, and B. Pardo, "Fine-Grained and Interpretable Neural Speech Editing," Interspeech, September 2024.


### BibTeX

```
@inproceedings{morrison2024adaptive,
    title={Fine-Grained and Interpretable Neural Speech Editing},
    author={Morrison, Max and Churchwell, Cameron and Pruyne, Nathan and Pardo, Bryan},
    booktitle={Interspeech},
    month={September},
    year={2024}
}
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/maxrmorrison/promonet",
    "name": "promonet",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "speech, prosody, editing, synthesis, pronunciation",
    "author": "Interactive Audio Lab",
    "author_email": "interactiveaudiolab@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/85/83/6685c97d2b171d3d78585a96962edefb5252bc36b95f5ee0706c69f67bbb/promonet-0.0.1.tar.gz",
    "platform": null,
    "description": "<h1 align=\"center\">Prosody and Pronunciation Modification Network (ProMoNet)</h1>\n<div align=\"center\">\n\n[![PyPI](https://img.shields.io/pypi/v/promonet.svg)](https://pypi.python.org/pypi/promonet)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Downloads](https://pepy.tech/badge/promonet)](https://pepy.tech/project/promonet)\n\nOfficial code for the paper _Fine-Grained and Interpretable Neural Speech Editing_\n\n[[paper]](https://www.maxrmorrison.com/pdfs/morrison2024fine.pdf)\n[[website]](https://www.maxrmorrison.com/sites/promonet/)\n\n</div>\n\n\n## Table of contents\n\n- [Installation](#installation)\n- [Usage](#usage)\n- [Application programming interface (API)](#application-programming-interface-api)\n    * [Adaptation API](#adaptation-api)\n        * [`promonet.adapt.speaker`](#promonetadaptspeaker)\n    * [Preprocessing API](#preprocessing-api)\n        * [`promonet.preprocess.from_audio`](#promonetpreprocessfrom_audio)\n        * [`promonet.preprocess.from_file`](#promonetpreprocessfrom_file)\n        * [`promonet.preprocess.from_file_to_file`](#promonetpreprocessfrom_file_to_file)\n        * [`promonet.preprocess.from_files_to_files`](#promonetpreprocessfrom_files_to_files)\n    * [Editing API](#editing-api)\n        * [`promonet.edit.from_features`](#promoneteditfrom_features)\n        * [`promonet.edit.from_file`](#promoneteditfrom_file)\n        * [`promonet.edit.from_file_to_file`](#promoneteditfrom_file_to_file)\n        * [`promonet.edit.from_files_to_files`](#promoneteditfrom_files_to_files)\n    * [Synthesis API](#synthesis-api)\n        * [`promonet.synthesize.from_features`](#promonetsynthesizefrom_features)\n        * [`promonet.synthesize.from_file`](#promonetsynthesizefrom_file)\n        * [`promonet.synthesize.from_file_to_file`](#promonetsynthesizefrom_file_to_file)\n        * 
[`promonet.synthesize.from_files_to_files`](#promonetsynthesizefrom_files_to_files)\n- [Command-line interface (CLI)](#command-line-interface-cli)\n    * [Adaptation CLI](#adaptation-cli)\n        * [`promonet.adapt`](#promonetadapt)\n    * [Preprocessing CLI](#preprocessing-cli)\n        * [`promonet.preprocess`](#promonetpreprocess)\n    * [Editing CLI](#editing-cli)\n        * [`promonet.edit`](#promonetedit)\n    * [Synthesis CLI](#synthesis-cli)\n        * [`promonet.synthesize`](#promonetsynthesize)\n- [Training](#training)\n    * [Download](#download)\n    * [Preprocess](#preprocess)\n    * [Partition](#partition)\n    * [Train](#train)\n    * [Monitor](#monitor)\n    * [Evaluate](#evaluate)\n- [Citation](#citation)\n\n\n## Installation\n\n`pip install promonet`\n\nWe are working on adding [`torbi`, our fast Viterbi decoding implementation](https://github.com/maxrmorrison/torbi) to PyTorch. Until then, you must manually download and install `torbi`. You can track the progress of incorporation into PyTorch [here](https://github.com/pytorch/pytorch/issues/121160).\n\n\n## Usage\n\nOur included model checkpoint allows speech editing and synthesis for VCTK speakers.\nTo use `promonet` with other speakers, you must first perform speaker\nadaptation on a dataset of recordings of the target speaker. You can then use\nthe resulting model checkpoint to perform speech editing in the target\nspeaker's voice. 
All of this can be done using either the API or CLI.\n\n```python\nimport promonet\n\n\n###############################################################################\n# Speaker adaptation\n###############################################################################\n\n\n# Speaker's name\nname = 'max'\n\n# Audio files for adaptation\nfiles = [...]\n\n# GPU index to perform adaptation and editing on\ngpu = 0\n\n# Perform speaker adaptation\ncheckpoint = promonet.adapt.speaker(name, files, gpu=gpu)\n\n\n###############################################################################\n# Speech editing\n###############################################################################\n\n\n# Load speech to edit\naudio = promonet.load.audio('test.wav')\n\n# Get features to edit\nloudness, pitch, periodicity, ppg = promonet.preprocess.from_audio(\n    audio,\n    promonet.SAMPLE_RATE,\n    gpu)\n\n# We'll use a ratio of 2.0 for all editing examples\nratio = 2.0\n\n# Perform pitch-shifting\nshifted = promonet.synthesize.from_features(\n    *promonet.edit.from_features(\n        loudness,\n        pitch,\n        periodicity,\n        ppg,\n        pitch_shift_cents=promonet.convert.ratio_to_cents(ratio)),\n    checkpoint=checkpoint,\n    gpu=gpu)\n\n# Perform time-stretching\nstretched = promonet.synthesize.from_features(\n    *promonet.edit.from_features(\n        loudness,\n        pitch,\n        periodicity,\n        ppg,\n        time_stretch_ratio=ratio),\n    checkpoint=checkpoint,\n    gpu=gpu)\n\n# Perform loudness editing\nscaled = promonet.synthesize.from_features(\n    *promonet.edit.from_features(\n        loudness,\n        pitch,\n        periodicity,\n        ppg,\n        loudness_scale_db=promonet.convert.ratio_to_db(ratio)),\n    checkpoint=checkpoint,\n    gpu=gpu)\n\n# Edit spectral balance (> 1 for Alvin and the Chipmunks; < 1 for Patrick Star)\nalvin = promonet.synthesize.from_features(\n    loudness,\n    pitch,\n    periodicity,\n    ppg,\n    
spectral_balance_ratio=ratio,\n    checkpoint=checkpoint,\n    gpu=gpu)\n```\n\nSee the [`ppgs.edit`](https://github.com/interactiveaudiolab/ppgs#ppgsedit) submodule documentation for the pronunciation (PPG) editing API.\n\n\n## Application programming interface (API)\n\n### Adaptation API\n\n#### `promonet.adapt.speaker`\n\n```python\ndef speaker(\n    name: str,\n    files: List[Path],\n    checkpoint: Path = None,\n    gpu: Optional[int] = None\n) -> Path:\n    \"\"\"Perform speaker adaptation\n\n    Args:\n        name: The name of the speaker\n        files: The audio files to use for adaptation\n        checkpoint: The model checkpoint directory\n        gpu: The gpu to run adaptation on\n\n    Returns:\n        checkpoint: The file containing the trained generator checkpoint\n    \"\"\"\n```\n\n\n### Preprocessing API\n\n#### `promonet.preprocess.from_audio`\n\n```python\ndef from_audio(\n    audio: torch.Tensor,\n    sample_rate: int = promonet.SAMPLE_RATE,\n    gpu: Optional[int] = None,\n    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']\n) -> Union[\n    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor],\n    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, str]\n]:\n    \"\"\"Preprocess audio\n\n    Arguments\n        audio: Audio to preprocess\n        sample_rate: Audio sample rate\n        gpu: The GPU index\n        features: The features to preprocess.\n            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].\n\n    Returns\n        loudness: The loudness contour\n        periodicity: The periodicity contour\n        pitch: The pitch contour\n        ppg: The phonetic posteriorgram\n        text: The text transcript\n    \"\"\"\n```\n\n\n#### `promonet.preprocess.from_file`\n\n```python\ndef from_file(\n    file: Union[str, bytes, os.PathLike],\n    gpu: Optional[int] = None,\n    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']\n) -> Union[\n    Tuple[torch.Tensor, torch.Tensor, 
torch.Tensor, torch.Tensor],\n    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, str]\n]:\n    \"\"\"Preprocess audio on disk\n\n    Arguments\n        file: Audio file to preprocess\n        gpu: The GPU index\n        features: The features to preprocess.\n            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].\n\n    Returns\n        loudness: The loudness contour\n        pitch: The pitch contour\n        periodicity: The periodicity contour\n        ppg: The phonetic posteriorgram\n        text: The text transcript\n    \"\"\"\n```\n\n\n#### `promonet.preprocess.from_file_to_file`\n\n```python\ndef from_file_to_file(\n    file: Union[str, bytes, os.PathLike],\n    output_prefix: Optional[Union[str, os.PathLike]] = None,\n    gpu: Optional[int] = None,\n    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']\n) -> None:\n    \"\"\"Preprocess audio on disk and save\n\n    Arguments\n        file: Audio file to preprocess\n        output_prefix: File to save features, minus extension\n        gpu: The GPU index\n        features: The features to preprocess.\n            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].\n    \"\"\"\n```\n\n\n#### `promonet.preprocess.from_files_to_files`\n\n```python\ndef from_files_to_files(\n    files: List[Union[str, bytes, os.PathLike]],\n    output_prefixes: Optional[List[Union[str, os.PathLike]]] = None,\n    gpu: Optional[int] = None,\n    features: list = ['loudness', 'pitch', 'periodicity', 'ppg']\n) -> None:\n    \"\"\"Preprocess multiple audio files on disk and save\n\n    Arguments\n        files: Audio files to preprocess\n        output_prefixes: Files to save features, minus extension\n        gpu: The GPU index\n        features: The features to preprocess.\n            Options: ['loudness', 'pitch', 'periodicity', 'ppg', 'text'].\n    \"\"\"\n```\n\n\n### Editing API\n\n##### `promonet.edit.from_features`\n\n```python\ndef from_features(\n    loudness: 
torch.Tensor,\n    pitch: torch.Tensor,\n    periodicity: torch.Tensor,\n    ppg: torch.Tensor,\n    pitch_shift_cents: Optional[float] = None,\n    time_stretch_ratio: Optional[float] = None,\n    loudness_scale_db: Optional[float] = None\n) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:\n    \"\"\"Edit speech representation\n\n    Arguments\n        loudness: Loudness contour to edit\n        pitch: Pitch contour to edit\n        periodicity: Periodicity contour to edit\n        ppg: PPG to edit\n        pitch_shift_cents: Amount of pitch-shifting in cents\n        time_stretch_ratio: Amount of time-stretching. Faster when above one.\n        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)\n\n    Returns\n        edited_loudness, edited_pitch, edited_periodicity, edited_ppg\n    \"\"\"\n```\n\n\n##### `promonet.edit.from_file`\n\n```python\ndef from_file(\n    loudness_file: Union[str, bytes, os.PathLike],\n    pitch_file: Union[str, bytes, os.PathLike],\n    periodicity_file: Union[str, bytes, os.PathLike],\n    ppg_file: Union[str, bytes, os.PathLike],\n    pitch_shift_cents: Optional[float] = None,\n    time_stretch_ratio: Optional[float] = None,\n    loudness_scale_db: Optional[float] = None\n) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:\n    \"\"\"Edit speech representation on disk\n\n    Arguments\n        loudness_file: Loudness file to edit\n        pitch_file: Pitch file to edit\n        periodicity_file: Periodicity file to edit\n        ppg_file: PPG file to edit\n        pitch_shift_cents: Amount of pitch-shifting in cents\n        time_stretch_ratio: Amount of time-stretching. 
Faster when above one.\n        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)\n\n    Returns\n        edited_loudness, edited_pitch, edited_periodicity, edited_ppg\n    \"\"\"\n```\n\n\n##### `promonet.edit.from_file_to_file`\n\n```python\ndef from_file_to_file(\n    loudness_file: Union[str, bytes, os.PathLike],\n    pitch_file: Union[str, bytes, os.PathLike],\n    periodicity_file: Union[str, bytes, os.PathLike],\n    ppg_file: Union[str, bytes, os.PathLike],\n    output_prefix: Union[str, bytes, os.PathLike],\n    pitch_shift_cents: Optional[float] = None,\n    time_stretch_ratio: Optional[float] = None,\n    loudness_scale_db: Optional[float] = None\n) -> None:\n    \"\"\"Edit speech representation on disk and save to disk\n\n    Arguments\n        loudness_file: Loudness file to edit\n        pitch_file: Pitch file to edit\n        periodicity_file: Periodicity file to edit\n        ppg_file: PPG file to edit\n        output_prefix: File to save output, minus extension\n        pitch_shift_cents: Amount of pitch-shifting in cents\n        time_stretch_ratio: Amount of time-stretching. 
Faster when above one.\n        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)\n    \"\"\"\n```\n\n\n##### `promonet.edit.from_files_to_files`\n\n```python\ndef from_files_to_files(\n    loudness_files: List[Union[str, bytes, os.PathLike]],\n    pitch_files: List[Union[str, bytes, os.PathLike]],\n    periodicity_files: List[Union[str, bytes, os.PathLike]],\n    ppg_files: List[Union[str, bytes, os.PathLike]],\n    output_prefixes: List[Union[str, bytes, os.PathLike]],\n    pitch_shift_cents: Optional[float] = None,\n    time_stretch_ratio: Optional[float] = None,\n    loudness_scale_db: Optional[float] = None\n) -> None:\n    \"\"\"Edit speech representations on disk and save to disk\n\n    Arguments\n        loudness_files: Loudness files to edit\n        pitch_files: Pitch files to edit\n        periodicity_files: Periodicity files to edit\n        ppg_files: Phonetic posteriorgram files to edit\n        output_prefixes: Files to save output, minus extension\n        pitch_shift_cents: Amount of pitch-shifting in cents\n        time_stretch_ratio: Amount of time-stretching. 
Faster when above one.\n        loudness_scale_db: Loudness ratio editing in dB (not recommended; use loudness)\n    \"\"\"\n```\n\n\n### Synthesis API\n\n##### `promonet.synthesize.from_features`\n\n```python\ndef from_features(\n    loudness: torch.Tensor,\n    pitch: torch.Tensor,\n    periodicity: torch.Tensor,\n    ppg: torch.Tensor,\n    speaker: Union[int, torch.Tensor] = 0,\n    spectral_balance_ratio: float = 1.,\n    checkpoint: Optional[Union[str, os.PathLike]] = None,\n    gpu: Optional[int] = None) -> torch.Tensor:\n    \"\"\"Perform speech synthesis\n\n    Args:\n        loudness: The loudness contour\n        pitch: The pitch contour\n        periodicity: The periodicity contour\n        ppg: The phonetic posteriorgram\n        speaker: The speaker index\n        spectral_balance_ratio: > 1 for Alvin and the Chipmunks; < 1 for Patrick Star\n        checkpoint: The generator checkpoint\n        gpu: The GPU index\n\n    Returns\n        generated: The generated speech\n    \"\"\"\n```\n\n\n##### `promonet.synthesize.from_file`\n\n```python\ndef from_file(\n    loudness_file: Union[str, os.PathLike],\n    pitch_file: Union[str, os.PathLike],\n    periodicity_file: Union[str, os.PathLike],\n    ppg_file: Union[str, os.PathLike],\n    speaker: Union[int, torch.Tensor] = 0,\n    checkpoint: Optional[Union[str, os.PathLike]] = None,\n    gpu: Optional[int] = None\n) -> torch.Tensor:\n    \"\"\"Perform speech synthesis from features on disk\n\n    Args:\n        loudness_file: The loudness file\n        pitch_file: The pitch file\n        periodicity_file: The periodicity file\n        ppg_file: The phonetic posteriorgram file\n        speaker: The speaker index\n        checkpoint: The generator checkpoint\n        gpu: The GPU index\n\n    Returns\n        generated: The generated speech\n    \"\"\"\n```\n\n\n##### `promonet.synthesize.from_file_to_file`\n\n```python\ndef from_file_to_file(\n    loudness_file: Union[str, os.PathLike],\n    pitch_file: 
Union[str, os.PathLike],\n    periodicity_file: Union[str, os.PathLike],\n    ppg_file: Union[str, os.PathLike],\n    output_file: Union[str, os.PathLike],\n    speaker: Union[int, torch.Tensor] = 0,\n    checkpoint: Optional[Union[str, os.PathLike]] = None,\n    gpu: Optional[int] = None\n) -> None:\n    \"\"\"Perform speech synthesis from features on disk and save\n\n    Args:\n        loudness_file: The loudness file\n        pitch_file: The pitch file\n        periodicity_file: The periodicity file\n        ppg_file: The phonetic posteriorgram file\n        output_file: The file to save generated speech audio\n        speaker: The speaker index\n        checkpoint: The generator checkpoint\n        gpu: The GPU index\n    \"\"\"\n```\n\n\n##### `promonet.synthesize.from_files_to_files`\n\n```python\ndef from_files_to_files(\n    loudness_files: List[Union[str, os.PathLike]],\n    pitch_files: List[Union[str, os.PathLike]],\n    periodicity_files: List[Union[str, os.PathLike]],\n    ppg_files: List[Union[str, os.PathLike]],\n    output_files: List[Union[str, os.PathLike]],\n    speakers: Optional[Union[List[int], torch.Tensor]] = None,\n    checkpoint: Optional[Union[str, os.PathLike]] = None,\n    gpu: Optional[int] = None\n) -> None:\n    \"\"\"Perform batched speech synthesis from features on disk and save\n\n    Args:\n        loudness_files: The loudness files\n        pitch_files: The pitch files\n        periodicity_files: The periodicity files\n        ppg_files: The phonetic posteriorgram files\n        output_files: The files to save generated speech audio\n        speakers: The speaker indices\n        checkpoint: The generator checkpoint\n        gpu: The GPU index\n    \"\"\"\n```\n\n\n## Command-line interface (CLI)\n\n### Adaptation CLI\n\n#### `promonet.adapt`\n\n```\npython -m promonet.adapt \\\n    --name NAME \\\n    --files FILES [FILES ...] 
\\\n    [--checkpoint CHECKPOINT] \\\n    [--gpu GPU]\n\nPerform speaker adaptation\n\noptional arguments:\n  -h, --help\n    show this help message and exit\n  --name NAME\n    The name of the speaker\n  --files FILES [FILES ...]\n    The audio files to use for adaptation\n  --checkpoint CHECKPOINT\n    The model checkpoint directory\n  --gpu GPU\n    The gpu to run adaptation on\n```\n\n\n### Preprocessing CLI\n\n#### `promonet.preprocess`\n\n```\npython -m promonet.preprocess \\\n    [-h] \\\n    --files FILES [FILES ...] \\\n    [--output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]] \\\n    [--features {loudness,pitch,periodicity,ppg} [{loudness,pitch,periodicity,ppg} ...]] \\\n    [--gpu GPU]\n\nPreprocess\n\narguments:\n  --files FILES [FILES ...]\n    Audio files to preprocess\n\noptional arguments:\n  -h, --help\n    show this help message and exit\n  --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]\n    Files to save features, minus extension\n  --features {loudness,pitch,periodicity,ppg} [{loudness,pitch,periodicity,ppg} ...]\n    The features to preprocess\n  --gpu GPU\n    The index of the gpu to use\n```\n\n\n### Editing CLI\n\n#### `promonet.edit`\n\n```\npython -m promonet.edit \\\n    [-h] \\\n    --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...] \\\n    --pitch_files PITCH_FILES [PITCH_FILES ...] \\\n    --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...] \\\n    --ppg_files PPG_FILES [PPG_FILES ...] \\\n    --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...] 
\\\n    [--pitch_shift_cents PITCH_SHIFT_CENTS] \\\n    [--time_stretch_ratio TIME_STRETCH_RATIO] \\\n    [--loudness_scale_db LOUDNESS_SCALE_DB]\n\nEdit speech representation\n\narguments:\n  --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...]\n    The loudness files to edit\n  --pitch_files PITCH_FILES [PITCH_FILES ...]\n    The pitch files to edit\n  --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...]\n    The periodicity files to edit\n  --ppg_files PPG_FILES [PPG_FILES ...]\n    The ppg files to edit\n  --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]\n    The locations to save output files, minus extension\n\noptional arguments:\n  -h, --help\n    show this help message and exit\n  --pitch_shift_cents PITCH_SHIFT_CENTS\n    Amount of pitch-shifting in cents\n  --time_stretch_ratio TIME_STRETCH_RATIO\n    Amount of time-stretching. Faster when above one.\n  --loudness_scale_db LOUDNESS_SCALE_DB\n    Loudness ratio editing in dB (not recommended; use loudness)\n```\n\n\n### Synthesis CLI\n\n#### `promonet.synthesize`\n\n```\npython -m promonet.synthesize \\\n    --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...] \\\n    --pitch_files PITCH_FILES [PITCH_FILES ...] \\\n    --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...] \\\n    --ppg_files PPG_FILES [PPG_FILES ...] \\\n    --output_files OUTPUT_FILES [OUTPUT_FILES ...] 
\\\n    [--speakers SPEAKERS [SPEAKERS ...]] \\\n    [--checkpoint CHECKPOINT] \\\n    [--gpu GPU]\n\nSynthesize speech from features\n\narguments:\n  --loudness_files LOUDNESS_FILES [LOUDNESS_FILES ...]\n    The loudness files\n  --pitch_files PITCH_FILES [PITCH_FILES ...]\n    The pitch files\n  --periodicity_files PERIODICITY_FILES [PERIODICITY_FILES ...]\n    The periodicity files\n  --ppg_files PPG_FILES [PPG_FILES ...]\n    The phonetic posteriorgram files\n  --output_files OUTPUT_FILES [OUTPUT_FILES ...]\n    The files to save the edited audio\n\noptional arguments:\n  -h, --help\n    show this help message and exit\n  --speakers SPEAKERS [SPEAKERS ...]\n    The IDs of the speakers for voice conversion\n  --checkpoint CHECKPOINT\n    The generator checkpoint\n  --gpu GPU\n    The GPU index\n```\n\n\n## Training\n\n### Download\n\nDownloads, unzips, and formats datasets. Stores datasets in `data/datasets/`.\nStores formatted datasets in `data/cache/`.\n\n```\npython -m promonet.data.download --datasets <datasets>\n```\n\n\n### Preprocess\n\nPrepares features for training. Features are stored in `data/cache/`.\n\n```\npython -m promonet.data.preprocess \\\n    --datasets <datasets> \\\n    --features <features> \\\n    --gpu <gpu>\n```\n\n\n### Partition\n\nPartitions a dataset. You should not need to run this, as the partitions\nused in our work are provided for each dataset in\n`promonet/assets/partitions/`.\n\n```\npython -m promonet.partition --datasets <datasets>\n```\n\n\n### Train\n\nTrains a model. 
Checkpoints and logs are stored in `runs/`.\n\n```\npython -m promonet.train \\\n    --config <config> \\\n    --dataset <dataset> \\\n    --gpu <gpu>\n```\n\nIf the config has been run before, the most recent checkpoint is loaded\nautomatically and training resumes from that checkpoint.\n\n\n### Monitor\n\nYou can monitor training via `tensorboard`.\n\n```\ntensorboard --logdir runs/ --port <port> --load_fast true\n```\n\nTo receive notifications for long jobs (download, preprocess, train, and\nevaluate) via the `torchutil` notification system, set the\n`PYTORCH_NOTIFICATION_URL` environment variable to a supported webhook as\nexplained in [the Apprise documentation](https://pypi.org/project/apprise/).\n\n\n### Evaluate\n\nPerforms objective evaluation and generates examples for subjective evaluation.\nAlso benchmarks generation speed. Results are stored in `eval/`.\n\n```\npython -m promonet.evaluate \\\n    --config <name> \\\n    --datasets <datasets> \\\n    --gpu <gpu>\n```\n\n\n## Citation\n\n### IEEE\n\nM. Morrison, C. Churchwell, N. Pruyne, and B. Pardo, \"Fine-Grained and Interpretable Neural Speech Editing,\" Interspeech, September 2024.\n\n\n### BibTeX\n\n```\n@inproceedings{morrison2024adaptive,\n    title={Fine-Grained and Interpretable Neural Speech Editing},\n    author={Morrison, Max and Churchwell, Cameron and Pruyne, Nathan and Pardo, Bryan},\n    booktitle={Interspeech},\n    month={September},\n    year={2024}\n}\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Prosody Modification Network",
    "version": "0.0.1",
    "project_urls": {
        "Homepage": "https://github.com/maxrmorrison/promonet"
    },
    "split_keywords": [
        "speech",
        " prosody",
        " editing",
        " synthesis",
        " pronunciation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a821c330d06735c9810bfc6cefbb67840f450cecbe346e2312814caa4b9dec83",
                "md5": "fbbfeca73fbdf1c4b02178221db42af1",
                "sha256": "fa22dc2ae34c05fe121687e6717b00e56b67e41796761f64f457e219b8413eb7"
            },
            "downloads": -1,
            "filename": "promonet-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fbbfeca73fbdf1c4b02178221db42af1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 2933281,
            "upload_time": "2024-07-07T18:47:22",
            "upload_time_iso_8601": "2024-07-07T18:47:22.467265Z",
            "url": "https://files.pythonhosted.org/packages/a8/21/c330d06735c9810bfc6cefbb67840f450cecbe346e2312814caa4b9dec83/promonet-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "85836685c97d2b171d3d78585a96962edefb5252bc36b95f5ee0706c69f67bbb",
                "md5": "5c38e036cb980668e32419f08b248c08",
                "sha256": "7864f980270abbbecb8d3f6192037941f8bf51c141ee5ec74f69d04c51feb594"
            },
            "downloads": -1,
            "filename": "promonet-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "5c38e036cb980668e32419f08b248c08",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 2979828,
            "upload_time": "2024-07-07T18:47:24",
            "upload_time_iso_8601": "2024-07-07T18:47:24.650418Z",
            "url": "https://files.pythonhosted.org/packages/85/83/6685c97d2b171d3d78585a96962edefb5252bc36b95f5ee0706c69f67bbb/promonet-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-07 18:47:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "maxrmorrison",
    "github_project": "promonet",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "promonet"
}

Interactive Audio Lab