torchcrepe


Name: torchcrepe
Version: 0.0.22
Home page: https://github.com/maxrmorrison/torchcrepe
Summary: Pytorch implementation of CREPE pitch tracker
Upload time: 2023-10-09 20:54:19
Author: Max Morrison
License: MIT
Keywords: pitch, audio, speech, music, pytorch, crepe
<h1 align="center">torchcrepe</h1>
<div align="center">

[![PyPI](https://img.shields.io/pypi/v/torchcrepe.svg)](https://pypi.python.org/pypi/torchcrepe)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/torchcrepe)](https://pepy.tech/project/torchcrepe)

</div>

PyTorch implementation of the CREPE [1] pitch tracker. The original TensorFlow
implementation can be found [here](https://github.com/marl/crepe/). The
provided model weights were obtained by converting the "tiny" and "full" models
using [MMdnn](https://github.com/microsoft/MMdnn), an open-source model
management framework.


## Installation
Perform the system-dependent PyTorch install using the instructions found
[here](https://pytorch.org/).

`pip install torchcrepe`


## Usage

### Computing pitch and periodicity from audio


```python
import torchcrepe


# Load audio
audio, sr = torchcrepe.load.audio( ... )

# Here we'll use a 5 millisecond hop length
hop_length = int(sr / 200.)

# Provide a sensible frequency range for your domain (upper limit is 2006 Hz)
# This would be a reasonable range for speech
fmin = 50
fmax = 550

# Select a model capacity--one of "tiny" or "full"
model = 'tiny'

# Choose a device to use for inference
device = 'cuda:0'

# Pick a batch size that doesn't cause memory errors on your gpu
batch_size = 2048

# Compute pitch using first gpu
pitch = torchcrepe.predict(audio,
                           sr,
                           hop_length,
                           fmin,
                           fmax,
                           model,
                           batch_size=batch_size,
                           device=device)
```
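
The hop length above is expressed in samples, so it depends on the sample
rate. The following is a minimal sketch of that arithmetic in plain Python.
The helper names `hop_length_from_ms` and `frame_times` are hypothetical and
not part of `torchcrepe`, and the frame times assume evenly spaced hops
starting at time zero.

```python
def hop_length_from_ms(sample_rate, hop_ms):
    """Convert a hop size in milliseconds to a whole number of samples."""
    return int(sample_rate * hop_ms / 1000)


def frame_times(num_frames, hop_length, sample_rate):
    """Time in seconds associated with each frame, assuming evenly spaced hops."""
    return [i * hop_length / sample_rate for i in range(num_frames)]


sr = 16000

# A 5 millisecond hop; equivalent to int(sr / 200.)
hop_length = hop_length_from_ms(sr, 5)
print(hop_length)

# Times of the first four frames
print(frame_times(4, hop_length, sr))
```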

A periodicity metric similar to the CREPE confidence score can also be
extracted by passing `return_periodicity=True` to `torchcrepe.predict`.


### Decoding

By default, `torchcrepe` uses Viterbi decoding on the softmax of the network
output. This differs from the original implementation, which uses a
weighted average near the argmax of binary cross-entropy probabilities.
The argmax operation can cause double/half frequency errors. These can be
removed by penalizing large pitch jumps via Viterbi decoding. The `decode`
submodule provides several options for decoding.

```python
# Decode using viterbi decoding (default)
torchcrepe.predict(..., decoder=torchcrepe.decode.viterbi)

# Decode using weighted argmax (as in the original implementation)
torchcrepe.predict(..., decoder=torchcrepe.decode.weighted_argmax)

# Decode using argmax
torchcrepe.predict(..., decoder=torchcrepe.decode.argmax)
```
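
To illustrate why penalizing large pitch jumps helps, here is a toy sketch in
plain Python. It is not the `torchcrepe` implementation: the 8-bin
probabilities, the `max_jump` parameter, and the unnormalized triangular
transition weights are all assumptions made for this example. The third frame
contains a spurious peak far from the true pitch bin, which per-frame argmax
follows and Viterbi decoding ignores.

```python
import math


def argmax_decode(probs):
    """Pick the most likely bin independently in every frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in probs]


def viterbi_decode(probs, max_jump=2):
    """Most likely bin path when jumps beyond max_jump bins are heavily penalized."""
    n_bins = len(probs[0])
    eps = 1e-12

    def log_transition(i, j):
        # Unnormalized triangular weight: staying put scores highest,
        # jumps larger than max_jump are nearly impossible
        return math.log(max(max_jump + 1 - abs(i - j), 0) + 1e-6)

    score = [math.log(p + eps) for p in probs[0]]
    backpointers = []
    for frame in probs[1:]:
        new_score, pointers = [], []
        for j in range(n_bins):
            best = max(range(n_bins),
                       key=lambda i: score[i] + log_transition(i, j))
            pointers.append(best)
            new_score.append(
                score[best] + log_transition(best, j) + math.log(frame[j] + eps))
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final bin
    path = [max(range(n_bins), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


clean = [0.02, 0.05, 0.80, 0.05, 0.02, 0.02, 0.02, 0.02]
noisy = [0.02, 0.03, 0.40, 0.03, 0.02, 0.02, 0.03, 0.45]  # spurious peak at bin 7
probs = [clean, clean, noisy, clean, clean]

print(argmax_decode(probs))   # [2, 2, 7, 2, 2]
print(viterbi_decode(probs))  # [2, 2, 2, 2, 2]
```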


### Filtering and thresholding

When periodicity is low, the pitch is less reliable. For some problems, it
makes sense to mask these less reliable pitch values. However, the periodicity
can be noisy and the pitch has quantization artifacts. `torchcrepe` provides
the submodules `filter` and `threshold` to address both issues. The filter and
threshold parameters should be tuned to your data. For clean speech, a 10-20
millisecond window with a threshold of 0.21 has worked well.

```python
# We'll use a 15 millisecond window assuming a hop length of 5 milliseconds
win_length = 3

# Median filter noisy confidence value
periodicity = torchcrepe.filter.median(periodicity, win_length)

# Remove inharmonic regions
pitch = torchcrepe.threshold.At(.21)(pitch, periodicity)

# Optionally smooth pitch to remove quantization artifacts
pitch = torchcrepe.filter.mean(pitch, win_length)
```
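
The effect of filtering before thresholding can be seen on a small example.
The sketch below is a conceptual illustration in plain Python, not the
`torchcrepe` implementation. The functions `median_filter` and `threshold_at`
are hypothetical stand-ins, the data is made up, windows are truncated at the
edges, and unvoiced frames are marked with NaN as an assumption of this sketch.

```python
import math
from statistics import median


def median_filter(values, win_length):
    """Replace each value with the median of a centered window of frames."""
    half = win_length // 2
    return [
        median(values[max(i - half, 0):i + half + 1])
        for i in range(len(values))
    ]


def threshold_at(pitch, periodicity, value):
    """Mark frames whose periodicity is below value as unvoiced (NaN)."""
    return [
        p if c >= value else math.nan
        for p, c in zip(pitch, periodicity)
    ]


# Five voiced frames with a one-frame dip, then five unvoiced frames
# with a one-frame spike
pitch = [110.0, 111.0, 112.0, 113.0, 114.0, 300.0, 310.0, 305.0, 90.0, 95.0]
periodicity = [0.90, 0.85, 0.10, 0.88, 0.80, 0.05, 0.10, 0.60, 0.05, 0.08]

# Thresholding the raw periodicity drops frame 2 and keeps frame 7
print(threshold_at(pitch, periodicity, 0.21))

# Median filtering first removes both single-frame errors
print(threshold_at(pitch, median_filter(periodicity, 3), 0.21))
```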

For more fine-grained control over pitch thresholding, see
`torchcrepe.threshold.Hysteresis`. This is especially useful for removing
spurious voiced regions caused by noise in the periodicity values, but
has more parameters and may require more manual tuning to your data.
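
As a rough illustration of the idea behind hysteresis thresholding, the sketch
below uses two thresholds in plain Python: a run of frames above the lower
threshold counts as voiced only if it also reaches the upper threshold
somewhere. This is a generic sketch, not the `torchcrepe.threshold.Hysteresis`
implementation. The function `hysteresis_mask` and its `lower` and `upper`
parameters are assumptions of this example, and the actual class may use
different parameters.

```python
def hysteresis_mask(periodicity, lower, upper):
    """Voiced mask: a run above lower is kept only if it also reaches upper."""
    voiced = [False] * len(periodicity)
    start = None

    # A sentinel below any threshold closes a run that reaches the last frame
    for i, value in enumerate(list(periodicity) + [float('-inf')]):
        if value >= lower:
            if start is None:
                start = i
        elif start is not None:
            # Run ended: keep it only if some frame reached the upper threshold
            if max(periodicity[start:i]) >= upper:
                voiced[start:i] = [True] * (i - start)
            start = None
    return voiced


periodicity = [0.1, 0.3, 0.35, 0.1, 0.3, 0.7, 0.4, 0.3, 0.1]

# Frames 1-2 never reach 0.5 and are rejected; frames 4-7 are kept as a whole
print(hysteresis_mask(periodicity, lower=0.25, upper=0.5))
# [False, False, False, False, True, True, True, True, False]
```

A single threshold at 0.25 would also accept the weak run at frames 1-2, while
a single threshold at 0.5 would keep only frame 5.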

CREPE was not trained on silent audio. Therefore, it sometimes assigns high
confidence to pitch bins in silent regions. You can use
`torchcrepe.threshold.Silence` to manually set the periodicity in silent
regions to zero.

```python
periodicity = torchcrepe.threshold.Silence(-60.)(periodicity,
                                                 audio,
                                                 sr,
                                                 hop_length)
```
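
Conceptually, silence thresholding compares a per-frame loudness with a
decibel threshold and zeroes the periodicity where the audio is quieter. The
sketch below shows this in plain Python under simplifying assumptions: it uses
the RMS level of non-overlapping frames, which may differ from the loudness
measure `torchcrepe` uses, and the functions `frame_db` and
`silence_periodicity` are hypothetical.

```python
import math


def frame_db(samples, hop_length):
    """RMS level in dB of consecutive, non-overlapping frames."""
    levels = []
    for start in range(0, len(samples), hop_length):
        frame = samples[start:start + hop_length]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Floor the RMS so that digital silence does not produce log10(0)
        levels.append(20 * math.log10(max(rms, 1e-10)))
    return levels


def silence_periodicity(periodicity, samples, hop_length, threshold_db=-60.0):
    """Set periodicity to zero wherever the frame level is below threshold_db."""
    levels = frame_db(samples, hop_length)
    return [
        0.0 if level < threshold_db else value
        for value, level in zip(periodicity, levels)
    ]


# Two frames of a 0.5-amplitude square wave, then two frames of digital silence
hop_length = 4
samples = [0.5, -0.5] * 4 + [0.0] * 8
periodicity = [0.9, 0.8, 0.7, 0.6]

print(silence_periodicity(periodicity, samples, hop_length))  # [0.9, 0.8, 0.0, 0.0]
```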


### Computing the CREPE model output activations

```python
batch = next(torchcrepe.preprocess(audio, sr, hop_length))
probabilities = torchcrepe.infer(batch)
```
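
Each activation corresponds to one pitch bin. The sketch below converts a bin
index to a frequency in plain Python, assuming the bin layout of the original
CREPE implementation: 360 bins spaced 20 cents apart, starting at an offset of
about 1997.38 cents relative to 10 Hz. The constant and function names here
are chosen for this example. Under that assumption the highest bin lands just
below the 2006 Hz upper limit mentioned above.

```python
CENTS_PER_BIN = 20
PITCH_BINS = 360
CENTS_OFFSET = 1997.3794084376191  # assumed offset, as in the original CREPE


def bin_to_frequency(bin_index):
    """Convert a pitch bin index to a frequency in Hz."""
    cents = CENTS_PER_BIN * bin_index + CENTS_OFFSET
    return 10 * 2 ** (cents / 1200)


# Toy activations with a single peak at bin 100
activations = [0.0] * PITCH_BINS
activations[100] = 1.0
peak_bin = max(range(PITCH_BINS), key=activations.__getitem__)

print(bin_to_frequency(peak_bin))        # frequency of the peak bin
print(bin_to_frequency(0))               # lowest representable frequency
print(bin_to_frequency(PITCH_BINS - 1))  # highest representable frequency
```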


### Computing the CREPE embedding space

As in Differentiable Digital Signal Processing [2], this uses the output of the
fifth max-pooling layer as a pretrained pitch embedding.

```python
embeddings = torchcrepe.embed(audio, sr, hop_length)
```

### Computing from files

`torchcrepe` defines the following convenience functions for predicting
directly from audio files on disk. Each of these functions also takes
a `device` argument that can be used for device placement (e.g.,
`device='cuda:0'`).

```python
torchcrepe.predict_from_file(audio_file, ...)
torchcrepe.predict_from_file_to_file(
    audio_file, output_pitch_file, output_periodicity_file, ...)
torchcrepe.predict_from_files_to_files(
    audio_files, output_pitch_files, output_periodicity_files, ...)

torchcrepe.embed_from_file(audio_file, ...)
torchcrepe.embed_from_file_to_file(audio_file, output_file, ...)
torchcrepe.embed_from_files_to_files(audio_files, output_files, ...)
```

### Command-line interface

```bash
usage: python -m torchcrepe
    [-h]
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    [--hop_length HOP_LENGTH]
    [--output_periodicity_files OUTPUT_PERIODICITY_FILES [OUTPUT_PERIODICITY_FILES ...]]
    [--embed]
    [--fmin FMIN]
    [--fmax FMAX]
    [--model MODEL]
    [--decoder DECODER]
    [--gpu GPU]
    [--no_pad]

optional arguments:
  -h, --help            show this help message and exit
  --audio_files AUDIO_FILES [AUDIO_FILES ...]
                        The audio file to process
  --output_files OUTPUT_FILES [OUTPUT_FILES ...]
                        The file to save pitch or embedding
  --hop_length HOP_LENGTH
                        The hop length of the analysis window
  --output_periodicity_files OUTPUT_PERIODICITY_FILES [OUTPUT_PERIODICITY_FILES ...]
                        The file to save periodicity
  --embed               Performs embedding instead of pitch prediction
  --fmin FMIN           The minimum frequency allowed
  --fmax FMAX           The maximum frequency allowed
  --model MODEL         The model capacity. One of "tiny" or "full"
  --decoder DECODER     The decoder to use. One of "argmax", "viterbi", or
                        "weighted_argmax"
  --gpu GPU             The gpu to perform inference on
  --no_pad              Whether to pad the audio
```
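
As an example, the sketch below assembles one possible invocation from the
flags listed above and prints it. The file names are hypothetical placeholders
and the chosen values are only illustrative. The command is printed rather
than executed.

```python
import shlex

# Hypothetical input and output paths; replace with your own files
command = [
    'python', '-m', 'torchcrepe',
    '--audio_files', 'speech.wav',
    '--output_files', 'speech-pitch.pt',
    '--output_periodicity_files', 'speech-periodicity.pt',
    '--fmin', '50',
    '--fmax', '550',
    '--model', 'tiny',
    '--decoder', 'viterbi',
    '--gpu', '0',
]

# Print a shell-ready command line; it could be run with subprocess.run(command)
print(shlex.join(command))
```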


## Tests

The module tests can be run as follows.

```bash
pip install pytest
pytest
```


## References
[1] J. W. Kim, J. Salamon, P. Li, and J. P. Bello, “CREPE: A
Convolutional Representation for Pitch Estimation,” in 2018 IEEE
International Conference on Acoustics, Speech and Signal
Processing (ICASSP).

[2] J. H. Engel, L. Hantrakul, C. Gu, and A. Roberts,
“DDSP: Differentiable Digital Signal Processing,” in
2020 International Conference on Learning
Representations (ICLR).

            
