<h1 align="center">Pitch-Estimating Neural Networks (PENN)</h1>
<div align="center">
[![PyPI](https://img.shields.io/pypi/v/penn.svg)](https://pypi.python.org/pypi/penn)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/penn)](https://pepy.tech/project/penn)
</div>
Training, evaluation, and inference of neural pitch and periodicity estimators in PyTorch. Includes the original code for the paper ["Cross-domain Neural Pitch and Periodicity Estimation"](https://arxiv.org/abs/2301.12258).
## Table of contents
- [Installation](#installation)
- [Inference](#inference)
* [Application programming interface](#application-programming-interface)
* [`penn.from_audio`](#pennfrom_audio)
* [`penn.from_file`](#pennfrom_file)
* [`penn.from_file_to_file`](#pennfrom_file_to_file)
* [`penn.from_files_to_files`](#pennfrom_files_to_files)
* [Command-line interface](#command-line-interface)
- [Training](#training)
* [Download](#download)
* [Preprocess](#preprocess)
* [Partition](#partition)
* [Train](#train)
* [Monitor](#monitor)
- [Evaluation](#evaluation)
* [Evaluate](#evaluate)
* [Plot](#plot)
- [Citation](#citation)
## Installation
If you want to perform pitch estimation using a pretrained FCNF0++ model, run
`pip install penn`
If you want to train or use your own models, run
`pip install penn[train]`
## Inference
Perform inference using FCNF0++
```
import penn
import torchaudio

# Load audio
audio, sample_rate = torchaudio.load('test/assets/gershwin.wav')
# Here we'll use a 10 millisecond hopsize
hopsize = .01

# Provide a sensible frequency range given your domain and model
fmin = 30.
fmax = 1000.

# Choose a gpu index to use for inference. Set to None to use cpu.
gpu = 0

# If you are using a gpu, pick a batch size that doesn't cause memory errors
# on your gpu
batch_size = 2048

# Select a checkpoint to use for inference. Selecting None will
# download and use FCNF0++ pretrained on MDB-stem-synth and PTDB
checkpoint = None

# Centers frames at hopsize / 2, 3 * hopsize / 2, 5 * hopsize / 2, ...
center = 'half-hop'

# (Optional) Linearly interpolate unvoiced regions below periodicity threshold
interp_unvoiced_at = .065

# Infer pitch and periodicity
pitch, periodicity = penn.from_audio(
    audio,
    sample_rate,
    hopsize=hopsize,
    fmin=fmin,
    fmax=fmax,
    checkpoint=checkpoint,
    batch_size=batch_size,
    center=center,
    interp_unvoiced_at=interp_unvoiced_at,
    gpu=gpu)
```
### Application programming interface
#### `penn.from_audio`
```
"""Perform pitch and periodicity estimation
Args:
audio: The audio to extract pitch and periodicity from
sample_rate: The audio sample rate
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
center: Padding options. One of ['half-window', 'half-hop', 'zero'].
interp_unvoiced_at: Specifies voicing threshold for interpolation
gpu: The index of the gpu to run inference on
Returns:
pitch: torch.tensor(
shape=(1, int(samples // penn.seconds_to_sample(hopsize))))
periodicity: torch.tensor(
shape=(1, int(samples // penn.seconds_to_sample(hopsize))))
"""
```
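To get a feel for the returned shapes, here is a rough sketch of the frame-count arithmetic; the five-second, 16 kHz signal is an assumption chosen for illustration, matching the 10 millisecond hopsize from the example above.

```
# Expected number of output frames for 5 seconds of 16 kHz audio
# with a 10 millisecond hopsize (illustrative values)
samples = 5 * 16000
hopsize_in_samples = int(.01 * 16000)
frames = samples // hopsize_in_samples  # 500 frames; output shape is (1, 500)
```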
#### `penn.from_file`
```
"""Perform pitch and periodicity estimation from audio on disk
Args:
file: The audio file
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
center: Padding options. One of ['half-window', 'half-hop', 'zero'].
interp_unvoiced_at: Specifies voicing threshold for interpolation
gpu: The index of the gpu to run inference on
Returns:
pitch: torch.tensor(shape=(1, int(samples // hopsize)))
periodicity: torch.tensor(shape=(1, int(samples // hopsize)))
"""
```
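For example, a minimal call on a file from disk might look like the following sketch; the file path and parameter values are placeholders, not defaults.

```
import penn

# Estimate pitch and periodicity from an audio file on disk
# ('song.wav' is a hypothetical path)
pitch, periodicity = penn.from_file(
    'song.wav',
    hopsize=.01,
    fmin=30.,
    fmax=1000.,
    gpu=None)  # None runs inference on the CPU
```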
#### `penn.from_file_to_file`
```
"""Perform pitch and periodicity estimation from audio on disk and save
Args:
file: The audio file
output_prefix: The file to save pitch and periodicity without extension
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
center: Padding options. One of ['half-window', 'half-hop', 'zero'].
interp_unvoiced_at: Specifies voicing threshold for interpolation
gpu: The index of the gpu to run inference on
"""
```
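A sketch of saving results to disk; 'song.wav' and the 'song' output prefix are hypothetical, and the exact filenames written under that prefix depend on the penn version.

```
import penn

# Estimate pitch and periodicity and save them under the given prefix
penn.from_file_to_file(
    'song.wav',
    output_prefix='song',
    hopsize=.01,
    fmin=30.,
    fmax=1000.,
    gpu=None)
```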
#### `penn.from_files_to_files`
```
"""Perform pitch and periodicity estimation from files on disk and save
Args:
files: The audio files
output_prefixes: Files to save pitch and periodicity without extension
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
center: Padding options. One of ['half-window', 'half-hop', 'zero'].
interp_unvoiced_at: Specifies voicing threshold for interpolation
gpu: The index of the gpu to run inference on
"""
```
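A sketch of batch processing; the file names are hypothetical, and each output prefix pairs with the audio file at the same position in the list.

```
import penn

# Hypothetical inputs; output_prefixes parallels audio_files
audio_files = ['one.wav', 'two.wav']
output_prefixes = ['one', 'two']

# Estimate pitch and periodicity for every file and save the results
penn.from_files_to_files(
    audio_files,
    output_prefixes=output_prefixes,
    hopsize=.01,
    fmin=30.,
    fmax=1000.,
    gpu=None)
```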
### Command-line interface
```
python -m penn
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    [-h]
    [--config CONFIG]
    [--output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]]
    [--hopsize HOPSIZE]
    [--fmin FMIN]
    [--fmax FMAX]
    [--checkpoint CHECKPOINT]
    [--batch_size BATCH_SIZE]
    [--center {half-window,half-hop,zero}]
    [--interp_unvoiced_at INTERP_UNVOICED_AT]
    [--gpu GPU]

required arguments:
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
        The audio files to process

optional arguments:
    -h, --help
        show this help message and exit
    --config CONFIG
        The configuration file. Defaults to using FCNF0++.
    --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]
        The files to save pitch and periodicity without extension.
        Defaults to audio_files without extensions.
    --hopsize HOPSIZE
        The hopsize in seconds. Defaults to 0.01 seconds.
    --fmin FMIN
        The minimum frequency allowed in Hz. Defaults to 31.0 Hz.
    --fmax FMAX
        The maximum frequency allowed in Hz. Defaults to 1984.0 Hz.
    --checkpoint CHECKPOINT
        The model checkpoint file. Defaults to ./penn/assets/checkpoints/fcnf0++.pt.
    --batch_size BATCH_SIZE
        The number of frames per batch. Defaults to 2048.
    --center {half-window,half-hop,zero}
        Padding options
    --interp_unvoiced_at INTERP_UNVOICED_AT
        Specifies voicing threshold for interpolation. Defaults to 0.1625.
    --gpu GPU
        The index of the gpu to perform inference on. Defaults to CPU.
```
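As an example, a hypothetical invocation (the file names are made up) could look like:

```
python -m penn \
    --audio_files one.wav two.wav \
    --output_prefixes one two \
    --hopsize .01 \
    --gpu 0
```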
## Training
### Download
`python -m penn.data.download`
Downloads and uncompresses the `mdb` and `ptdb` datasets used for training.
### Preprocess
`python -m penn.data.preprocess --config <config>`
Converts each dataset to a common format on disk ready for training. You
can optionally pass a configuration file to override the default configuration.
### Partition
`python -m penn.partition`
Generates `train`, `valid`, and `test` partitions for `mdb` and `ptdb`.
Partitioning is deterministic given the same random seed. You do not need to
run this step, as the original partitions are saved in
`penn/assets/partitions`.
### Train
`python -m penn.train --config <config> --gpu <gpu>`
Trains a model according to a given configuration on the `mdb` and `ptdb`
datasets.
### Monitor
You can monitor training via `tensorboard`.
```
tensorboard --logdir runs/ --port <port> --load_fast true
```
To use the `torchutil` notification system to receive notifications for long
jobs (download, preprocess, train, and evaluate), set the
`PYTORCH_NOTIFICATION_URL` environment variable to a supported webhook as
explained in [the Apprise documentation](https://pypi.org/project/apprise/).
## Evaluation
### Evaluate
```
python -m penn.evaluate \
    --config <config> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
```
Evaluate a model. `<checkpoint>` is the checkpoint file to evaluate and `<gpu>`
is the GPU index.
### Plot
```
python -m penn.plot.density \
    --config <config> \
    --true_datasets <true_datasets> \
    --inference_datasets <inference_datasets> \
    --output_file <output_file> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
```
Plot the data distribution and inferred distribution for a given dataset and
save to a jpg file.
```
python -m penn.plot.logits \
    --config <config> \
    --audio_file <audio_file> \
    --output_file <output_file> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
```
Plot the pitch posteriorgram of an audio file and save to a jpg file.
```
python -m penn.plot.thresholds \
    --names <names> \
    --evaluations <evaluations> \
    --output_file <output_file>
```
Plot the periodicity performance (voiced/unvoiced F1) over mdb and ptdb as a
function of the voiced/unvoiced threshold. `names` are the plot labels to give
each evaluation. `evaluations` are the names of the evaluations to plot.
## Citation
### IEEE
M. Morrison, C. Hsieh, N. Pruyne, and B. Pardo, "Cross-domain Neural Pitch and Periodicity Estimation," Submitted to IEEE Transactions on Audio, Speech, and Language Processing, <TODO - month> 2023.
### BibTeX
```
@inproceedings{morrison2023cross,
    title={Cross-domain Neural Pitch and Periodicity Estimation},
    author={Morrison, Max and Hsieh, Caedon and Pruyne, Nathan and Pardo, Bryan},
    booktitle={Submitted to IEEE Transactions on Audio, Speech, and Language Processing},
    month={TODO},
    year={2023}
}
```