diffsptk


- Name: diffsptk
- Version: 1.2.1
- Summary: Speech signal processing modules for machine learning
- Upload time: 2024-02-05 05:51:00
- Author: SPTK Working Group
- Requires Python: >=3.8
- License: Apache 2.0
- Keywords: dsp, pytorch, signal processing, sptk
diffsptk
========
*diffsptk* is a differentiable version of [SPTK](https://github.com/sp-nitech/SPTK) based on the PyTorch framework.

[![Latest Manual](https://img.shields.io/badge/docs-latest-blue.svg)](https://sp-nitech.github.io/diffsptk/latest/)
[![Stable Manual](https://img.shields.io/badge/docs-stable-blue.svg)](https://sp-nitech.github.io/diffsptk/1.2.1/)
[![Downloads](https://static.pepy.tech/badge/diffsptk)](https://pepy.tech/project/diffsptk)
[![Python Version](https://img.shields.io/pypi/pyversions/diffsptk.svg)](https://pypi.python.org/pypi/diffsptk)
[![PyTorch Version](https://img.shields.io/badge/pytorch-1.11.0%20%7C%202.2.0-orange.svg)](https://pypi.python.org/pypi/diffsptk)
[![PyPI Version](https://img.shields.io/pypi/v/diffsptk.svg)](https://pypi.python.org/pypi/diffsptk)
[![Codecov](https://codecov.io/gh/sp-nitech/diffsptk/branch/master/graph/badge.svg)](https://app.codecov.io/gh/sp-nitech/diffsptk)
[![License](https://img.shields.io/github/license/sp-nitech/diffsptk.svg)](https://github.com/sp-nitech/diffsptk/blob/master/LICENSE)
[![GitHub Actions](https://github.com/sp-nitech/diffsptk/workflows/package/badge.svg)](https://github.com/sp-nitech/diffsptk/actions)
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)


Requirements
------------
- Python 3.8+
- PyTorch 1.11.0+


Documentation
-------------
- See [this page](https://sp-nitech.github.io/diffsptk/latest/) for a reference manual.
- Our [paper](https://www.isca-speech.org/archive/ssw_2023/yoshimura23_ssw.html) is available on the ISCA Archive.


Installation
------------
The latest stable release can be installed from PyPI:
```sh
pip install diffsptk
```
The development release can be installed from the master branch:
```sh
pip install git+https://github.com/sp-nitech/diffsptk.git@master
```
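
After installation, a quick check along the following lines can confirm that the environment meets the requirements listed above. This is only a sketch built on the standard library; the helper names `installed_version` and `python_is_supported` are illustrative and are not part of diffsptk.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(minimum=(3, 8)):
    """Check the running interpreter against a minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


print("Python OK:", python_is_supported())
for name in ("diffsptk", "torch"):
    print(name, installed_version(name) or "not installed")
```

If `torch` is reported as installed, its version still needs to be compared against 1.11.0 by hand.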


Examples
--------
### Mel-cepstral analysis and synthesis
```python
import diffsptk

# Set analysis conditions.
fl = 400     # Frame length in samples.
fp = 80      # Frame period in samples.
n_fft = 512  # FFT length.
M = 24       # Order of mel-cepstrum.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Estimate mel-cepstrum of x.
alpha = diffsptk.get_alpha(sr)
mcep = diffsptk.MelCepstralAnalysis(cep_order=M, fft_length=n_fft, alpha=alpha, n_iter=10)
mc = mcep(X)

# Reconstruct x.
mlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=30)
x_hat = mlsa(mlsa(x, -mc), mc)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)

# Extract pitch of x.
pitch = diffsptk.Pitch(frame_period=fp, sample_rate=sr, f_min=80, f_max=180)
p = pitch(x)

# Generate excitation signal.
excite = diffsptk.ExcitationGeneration(frame_period=fp)
e = excite(p)
n = diffsptk.nrand(x.size(0) - 1)

# Synthesize waveform.
x_voiced = mlsa(e, mc)
x_unvoiced = mlsa(n, mc)

# Output analysis-synthesis result.
diffsptk.write("voiced.wav", x_voiced, sr)
diffsptk.write("unvoiced.wav", x_unvoiced, sr)
```
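
The analysis conditions in this example are given in samples. As a rough sketch of how they relate, the helper below converts them to milliseconds and derives the number of bins in a one-sided spectrum. The 16 kHz sampling rate is an assumption made purely for illustration (the actual rate is whatever `diffsptk.read` returns for the file), and `describe_conditions` is a hypothetical helper, not a diffsptk function.

```python
def describe_conditions(fl, fp, n_fft, sr):
    """Summarize framing parameters that are given in samples."""
    if n_fft < fl:
        # A frame is zero-padded up to the FFT length, so it cannot be longer.
        raise ValueError("FFT length must be at least the frame length")
    return {
        "frame_length_ms": 1000 * fl / sr,
        "frame_period_ms": 1000 * fp / sr,
        "overlap_samples": fl - fp,
        # A real-valued frame yields n_fft / 2 + 1 non-redundant bins.
        "n_bins": n_fft // 2 + 1,
    }


# sr=16000 is an assumed value, not taken from the example data.
info = describe_conditions(fl=400, fp=80, n_fft=512, sr=16000)
print(info)
```

Under that assumption, the settings correspond to 25 ms frames taken every 5 ms, with 257 frequency bins per frame.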

### Mel-spectrogram, MFCC, and PLP extraction
```python
import diffsptk

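# Set analysis conditions.
fl = 400        # Frame length in samples.
fp = 80         # Frame period in samples.
n_fft = 512     # FFT length.
n_channel = 80  # Number of mel filter bank channels.
M = 12          # Order of MFCC and PLP coefficients.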

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Extract mel-spectrogram.
fbank = diffsptk.MelFilterBankAnalysis(
    n_channel=n_channel,
    fft_length=n_fft,
    sample_rate=sr,
)
Y = fbank(X)
print(Y.shape)

# Extract MFCC.
mfcc = diffsptk.MFCC(
    mfcc_order=M,
    n_channel=n_channel,
    fft_length=n_fft,
    sample_rate=sr,
)
Y = mfcc(X)
print(Y.shape)

# Extract PLP.
plp = diffsptk.PLP(
    plp_order=M,
    n_channel=n_channel,
    fft_length=n_fft,
    sample_rate=sr,
)
Y = plp(X)
print(Y.shape)
```
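
Mel filter banks space their channels evenly on the mel scale rather than in hertz. The sketch below uses one widely used form of the mel formula, `m = 1127 ln(1 + f / 700)`. Whether diffsptk uses exactly this constant is an assumption here, as is the 8 kHz upper edge, so the numbers are illustrative only. The helper names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to mel."""
    return 1127.0 * math.log1p(f / 700.0)


def mel_to_hz(m):
    """Convert a mel value back to Hz."""
    return 700.0 * math.expm1(m / 1127.0)


def mel_center_frequencies(n_channel, f_min, f_max):
    """Center frequencies (Hz) of n_channel filters spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_channel + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_channel)]


# f_max=8000.0 is an assumed band edge (Nyquist frequency at 16 kHz).
centers = mel_center_frequencies(n_channel=80, f_min=0.0, f_max=8000.0)
print(centers[:3], centers[-1])
```

The spacing between neighboring centers grows with frequency, which is why a mel-spectrogram gives more resolution to low frequencies.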

### Subband decomposition
```python
import diffsptk

K = 4   # Number of subbands.
M = 40  # Order of filter.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
pqmf = diffsptk.PQMF(K, M)
decimate = diffsptk.Decimation(K)
y = decimate(pqmf(x), dim=-1)

# Reconstruct x.
interpolate = diffsptk.Interpolation(K)
ipqmf = diffsptk.IPQMF(K, M)
x_hat = ipqmf(interpolate(K * y, dim=-1)).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
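
The decimation and interpolation steps can be pictured with plain lists. The sketch below assumes the conventional definitions (decimation keeps every K-th sample, interpolation inserts K - 1 zeros between samples); it says nothing about the options the diffsptk modules accept, and the function names are illustrative.

```python
def decimate(x, k, start=0):
    """Keep every k-th sample of x, beginning at index start."""
    return x[start::k]


def interpolate(x, k):
    """Insert k - 1 zeros after every sample of x."""
    y = [0.0] * (len(x) * k)
    y[::k] = x
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = decimate(x, 4)
z = interpolate(y, 4)
print(y)  # [1.0, 5.0]
print(z)  # [1.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
```

Zero insertion lowers the average level of the signal by a factor of K, which is presumably why the example scales `y` by `K` before interpolation.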

### Vector quantization
```python
import diffsptk

K = 2  # Codebook size.
M = 4  # Order of vector.

# Prepare input.
x = diffsptk.nrand(M)

# Quantize x.
vq = diffsptk.VectorQuantization(M, K)
x_hat, indices, commitment_loss = vq(x)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
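
Conceptually, vector quantization replaces an input vector with the closest entry of a codebook. The following minimal sketch shows that lookup with plain lists and squared Euclidean distance. It is an assumption-laden illustration, not a description of how `diffsptk.VectorQuantization` trains its codebook or computes the commitment loss; `quantize` and the codebook values are made up for the example.

```python
def quantize(x, codebook):
    """Return the codeword nearest to x and its index in the codebook."""

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return codebook[index], index


codebook = [[0.0, 0.0], [1.0, 1.0]]  # Two illustrative codewords (K = 2).
x_hat, index = quantize([0.9, 1.2], codebook)
print(x_hat, index)  # [1.0, 1.0] 1
```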


License
-------
This software is released under the Apache License 2.0.


Reference
---------
```bibtex
@InProceedings{sp-nitech2023sptk,
  author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
  title = {{SPTK4}: An open-source software toolkit for speech signal processing},
  booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
  pages = {211--217},
  year = {2023},
}
```

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "diffsptk",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Takenori Yoshimura <takenori@sp.nitech.ac.jp>",
    "keywords": "dsp,pytorch,signal processing,sptk",
    "author": "SPTK Working Group",
    "author_email": "",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Speech signal processing modules for machine learning",
    "version": "1.2.1",
    "project_urls": {
        "Documentation": "https://sp-nitech.github.io/diffsptk/latest/",
        "Homepage": "https://sp-tk.sourceforge.net/",
        "Source": "https://github.com/sp-nitech/diffsptk"
    },
    "split_keywords": [
        "dsp",
        "pytorch",
        "signal processing",
        "sptk"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "468879510d9e487064e2d30c6b4d4447603e363698644756e26a5b2293bf0f7e",
                "md5": "1256cb0ebf7c83d67220e2b22d15cc53",
                "sha256": "8897ce2334383373d6fc7df9f4420961d2ee27c49ee01a839f77555373575d03"
            },
            "downloads": -1,
            "filename": "diffsptk-1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1256cb0ebf7c83d67220e2b22d15cc53",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 149332,
            "upload_time": "2024-02-05T05:51:00",
            "upload_time_iso_8601": "2024-02-05T05:51:00.662789Z",
            "url": "https://files.pythonhosted.org/packages/46/88/79510d9e487064e2d30c6b4d4447603e363698644756e26a5b2293bf0f7e/diffsptk-1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-05 05:51:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sp-nitech",
    "github_project": "diffsptk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "diffsptk"
}
        