diffsptk

Name	diffsptk
Version	3.3.1
Summary	Speech signal processing modules for machine learning
upload_time	2025-08-09 14:42:57
author	SPTK Working Group
requires_python	>=3.10
license	Apache 2.0
keywords	dsp, pytorch, signal processing, sptk

# diffsptk

*diffsptk* is a differentiable version of [SPTK](https://github.com/sp-nitech/SPTK) based on the PyTorch framework.

[![Manual](https://img.shields.io/badge/docs-stable-blue.svg)](https://sp-nitech.github.io/diffsptk/3.3.1/)
[![Downloads](https://static.pepy.tech/badge/diffsptk)](https://pepy.tech/project/diffsptk)
[![ClickPy](https://img.shields.io/badge/downloads-clickpy-yellow.svg)](https://clickpy.clickhouse.com/dashboard/diffsptk)
[![Python Version](https://img.shields.io/pypi/pyversions/diffsptk.svg)](https://pypi.python.org/pypi/diffsptk)
[![PyTorch Version](https://img.shields.io/badge/pytorch-2.3.1%20%7C%202.8.0-orange.svg)](https://pypi.python.org/pypi/diffsptk)
[![PyPI Version](https://img.shields.io/pypi/v/diffsptk.svg)](https://pypi.python.org/pypi/diffsptk)
[![Codecov](https://codecov.io/gh/sp-nitech/diffsptk/branch/master/graph/badge.svg)](https://app.codecov.io/gh/sp-nitech/diffsptk)
[![License](https://img.shields.io/github/license/sp-nitech/diffsptk.svg)](https://github.com/sp-nitech/diffsptk/blob/master/LICENSE)
[![GitHub Actions](https://github.com/sp-nitech/diffsptk/workflows/package/badge.svg)](https://github.com/sp-nitech/diffsptk/actions)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

## Requirements

- Python 3.10+
- PyTorch 2.3.1+
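
Both minimums can be checked from a running interpreter. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are illustrative and not part of diffsptk.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a string such as '2.3.1+cu121' into the tuple (2, 3, 1)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # Keep only the leading digits.
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True when version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


python_ok = sys.version_info[:2] >= (3, 10)

try:
    torch_version = metadata.version("torch")
except metadata.PackageNotFoundError:
    torch_version = None  # PyTorch is not installed in this environment.

torch_ok = torch_version is not None and meets_minimum(torch_version, "2.3.1")

print(f"Python >= 3.10: {python_ok}")
print(f"PyTorch >= 2.3.1: {torch_ok} (found {torch_version})")
```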

## Documentation

- See the [reference manual](https://sp-nitech.github.io/diffsptk/3.3.1/).
- Our [paper](https://www.isca-speech.org/archive/ssw_2023/yoshimura23_ssw.html) is available on the ISCA Archive.

## Installation

The latest stable release can be installed from PyPI:

```sh
pip install diffsptk
```

The development release can be installed from the master branch:

```sh
pip install git+https://github.com/sp-nitech/diffsptk.git@master
```

## Examples

### Running on a GPU

```python
import diffsptk

stft_params = {"frame_length": 400, "frame_period": 80, "fft_length": 512}

# Read waveform.
x, sr = diffsptk.read("assets/data.wav", device="cuda")

# Compute spectrogram using an nn.Module class.
X1 = diffsptk.STFT(**stft_params, device="cuda")(x)

# Compute spectrogram using a functional method.
X2 = diffsptk.functional.stft(x, **stft_params)

print(X1.allclose(X2))
```
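
The example above assumes that a CUDA device is present. The following is a minimal sketch of a CPU fallback that relies only on `torch.cuda.is_available()`; the helper name `select_device` is illustrative and not part of diffsptk.

```python
import importlib.util


def select_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all.
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = select_device()
print(device)
```

The resulting string can then replace the hard-coded `"cuda"` in the `device=` arguments shown above.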

### Mel-cepstral analysis and synthesis

```python
import diffsptk

fl = 400     # Frame length.
fp = 80      # Frame period.
n_fft = 512  # FFT length.
M = 24       # Order of mel-cepstrum.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute the spectrogram of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Estimate mel-cepstrum of x.
alpha = diffsptk.get_alpha(sr)
mcep = diffsptk.MelCepstralAnalysis(
    fft_length=n_fft,
    cep_order=M,
    alpha=alpha,
    n_iter=10,
)
mc = mcep(X)

# Reconstruct x.
mlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=20)
x_hat = mlsa(mlsa(x, -mc), mc)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)

# Extract pitch of x.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="pitch",
)
p = pitch(x)

# Generate excitation signal.
excite = diffsptk.ExcitationGeneration(frame_period=fp)
e = excite(p)
# White noise as long as x (nrand takes an order, so it returns order + 1 samples).
n = diffsptk.nrand(x.size(0) - 1)

# Synthesize waveform.
x_voiced = mlsa(e, mc)
x_unvoiced = mlsa(n, mc)

# Output analysis-synthesis result.
diffsptk.write("voiced.wav", x_voiced, sr)
diffsptk.write("unvoiced.wav", x_unvoiced, sr)
```
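
The frame settings above are given in samples, so their meaning in time and frequency depends on the sampling rate. The sketch below works through the arithmetic under the assumption of a 16 kHz signal; the README does not state the rate of `assets/data.wav`, so `sr` here is a placeholder.

```python
sr = 16000   # Assumed sampling rate in Hz (not stated in the README).
fl = 400     # Frame length in samples.
fp = 80      # Frame period in samples.
n_fft = 512  # FFT length.

frame_ms = 1000 * fl / sr  # Duration of one analysis frame.
shift_ms = 1000 * fp / sr  # Time between consecutive frames.
n_bins = n_fft // 2 + 1    # Bins in a one-sided spectrum.
bin_hz = sr / n_fft        # Frequency spacing between bins.

print(f"frame: {frame_ms} ms, shift: {shift_ms} ms")
print(f"bins: {n_bins}, resolution: {bin_hz} Hz")
```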

### WORLD analysis and synthesis

```python
import diffsptk

fp = 80       # Frame period.
n_fft = 1024  # FFT length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Extract F0 of x, or prepare well-estimated F0.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="f0",
)
f0 = pitch(x)

# Extract aperiodicity of x by D4C.
ap = diffsptk.Aperiodicity(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="d4c",
    out_format="a",
)
A = ap(x, f0)

# Extract spectral envelope of x by CheapTrick.
pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="cheap-trick",
    out_format="power",
)
S = pitch_spec(x, f0)

# Reconstruct x.
world_synth = diffsptk.WorldSynthesis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
)
x_hat = world_synth(f0, A, S)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
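
The two examples request different `out_format` values from `diffsptk.Pitch`: `"pitch"` and `"f0"`. The sketch below assumes the SPTK convention in which pitch is the period in samples, F0 is in Hz, and 0 marks an unvoiced frame. The helper names are illustrative, and plain lists stand in for tensors.

```python
def f0_to_period(f0_hz, sample_rate):
    """Convert F0 values in Hz to pitch periods in samples (0 stays 0)."""
    return [sample_rate / f if f > 0 else 0.0 for f in f0_hz]


def period_to_f0(period, sample_rate):
    """Convert pitch periods in samples to F0 values in Hz (0 stays 0)."""
    return [sample_rate / p if p > 0 else 0.0 for p in period]


sr = 16000  # Assumed sampling rate in Hz.
f0 = [0.0, 100.0, 160.0]  # One unvoiced frame followed by two voiced frames.
period = f0_to_period(f0, sr)
print(period)
print(period_to_f0(period, sr))
```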

### LPC analysis and synthesis

```python
import diffsptk

fl = 400  # Frame length.
fp = 80   # Frame period.
M = 24    # Order of LPC.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Estimate LPC of x.
frame = diffsptk.Frame(frame_length=fl, frame_period=fp)
window = diffsptk.Window(in_length=fl)
lpc = diffsptk.LPC(frame_length=fl, lpc_order=M, eps=1e-5)
a = lpc(window(frame(x)))

# Convert to inverse filter coefficients.
norm0 = diffsptk.AllPoleToAllZeroDigitalFilterCoefficients(filter_order=M)
b = norm0(a)

# Reconstruct x.
zerodf = diffsptk.AllZeroDigitalFilter(filter_order=M, frame_period=fp)
poledf = diffsptk.AllPoleDigitalFilter(filter_order=M, frame_period=fp)
x_hat = poledf(zerodf(x, b), a)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
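
The reconstruction above works because an all-zero filter A(z) and the all-pole filter 1/A(z) built from the same coefficients cancel each other. The sketch below illustrates that idea in pure Python with fixed coefficients and unit gain. It is a simplification: the diffsptk modules update coefficients frame by frame and also carry a gain term, neither of which is modeled here.

```python
def all_zero_filter(x, a):
    """Apply A(z): e[n] = x[n] + sum_k a[k] * x[n - k], with k starting at 1."""
    e = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        e.append(acc)
    return e


def all_pole_filter(e, a):
    """Apply 1/A(z): y[n] = e[n] - sum_k a[k] * y[n - k], with k starting at 1."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


a = [-0.9, 0.2]  # Hypothetical coefficients; the poles lie at 0.5 and 0.4.
x = [1.0, -0.5, 0.25, 2.0, 0.0, -1.5, 0.75, 0.1]
residual = all_zero_filter(x, a)
x_hat = all_pole_filter(residual, a)
error = sum(abs(u - v) for u, v in zip(x_hat, x))
print(error)
```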

### Mel-spectrogram analysis and synthesis

```python
import diffsptk

fl = 400         # Frame length.
fp = 80          # Frame period.
n_fft = 512      # FFT length.
n_channel = 128  # Number of channels.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute the spectrogram of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Extract log mel-spectrogram.
fbank = diffsptk.FBANK(
    fft_length=n_fft,
    n_channel=n_channel,
    sample_rate=sr,
)
Y = fbank(X)

# Reconstruct linear spectrogram.
ifbank = diffsptk.IFBANK(
    n_channel=n_channel,
    fft_length=n_fft,
    sample_rate=sr,
)
X_hat = ifbank(Y)

# Reconstruct x.
griffin = diffsptk.GriffinLim(
    frame_length=fl,
    frame_period=fp,
    fft_length=n_fft,
)
x_hat = griffin(X_hat, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
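
A mel filter bank places its channels evenly on a warped frequency axis rather than in Hz. The sketch below uses one common HTK-style mel formula to show how the channel centers spread out toward high frequencies. The formula, the 16 kHz rate, and the helper names are assumptions for illustration; the exact scale used by `diffsptk.FBANK` is not stated in this README.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def channel_centers(n_channel, sample_rate):
    """Center frequencies in Hz, evenly spaced in mel between 0 and Nyquist."""
    step = hz_to_mel(sample_rate / 2) / (n_channel + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_channel)]


centers = channel_centers(n_channel=128, sample_rate=16000)
print(centers[:3])
print(centers[-3:])
```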

### Subband decomposition

```python
import diffsptk

K = 4   # Number of subbands.
M = 40  # Order of filter.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
pqmf = diffsptk.PQMF(K, M)
decimate = diffsptk.Decimation(K)
y = decimate(pqmf(x))

# Reconstruct x.
interpolate = diffsptk.Interpolation(K)
ipqmf = diffsptk.IPQMF(K, M)
x_hat = ipqmf(interpolate(K * y)).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
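
Decimation keeps one sample out of every K, and interpolation by zero insertion puts K - 1 zeros back between the kept samples. The pure-Python sketch below illustrates the concept only, not the diffsptk implementation. It shows that the average level falls by a factor of K after the round trip, which is one plausible reading of why the example scales `y` by `K` before interpolation.

```python
def decimate(x, k):
    """Keep every k-th sample, starting from the first one."""
    return x[::k]


def interpolate(x, k):
    """Insert k - 1 zeros after every sample."""
    out = []
    for value in x:
        out.append(value)
        out.extend([0.0] * (k - 1))
    return out


K = 4
x = [1.0] * 8  # A constant signal makes the level change easy to see.
y = interpolate(decimate(x, K), K)
mean_before = sum(x) / len(x)
mean_after = sum(y) / len(y)
print(y)
print(mean_before, mean_after, K * mean_after)
```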

### Gammatone filter bank analysis and synthesis

```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
gammatone = diffsptk.GammatoneFilterBankAnalysis(sr)
y = gammatone(x)

# Reconstruct x.
igammatone = diffsptk.GammatoneFilterBankSynthesis(sr)
x_hat = igammatone(y).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

### Fractional octave band analysis and synthesis

```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
oband = diffsptk.FractionalOctaveBandAnalysis(sr)
y = oband(x)

# Reconstruct x.
x_hat = y.sum(1).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

### Constant-Q transform

```python
import diffsptk
import librosa  # Used only to fetch sample audio.

fp = 128  # Frame period.
K = 252   # Number of CQ-bins.
B = 36    # Number of bins per octave.

# Read waveform.
x, sr = diffsptk.read(librosa.ex("trumpet"))

# Transform x.
cqt = diffsptk.CQT(fp, sr, n_bin=K, n_bin_per_octave=B)
c = cqt(x)

# Reconstruct x.
icqt = diffsptk.ICQT(fp, sr, n_bin=K, n_bin_per_octave=B)
x_hat = icqt(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
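
The two bin counts above fix the frequency span of the transform: 252 bins at 36 bins per octave cover 7 octaves, and adjacent bins differ by a constant ratio. The sketch below works through that arithmetic. The lowest frequency `f_min` is a hypothetical value chosen for illustration and is not taken from the example.

```python
K = 252  # Number of CQ-bins.
B = 36   # Number of bins per octave.

n_octaves = K / B     # Octaves covered by the transform.
ratio = 2 ** (1 / B)  # Frequency ratio between adjacent bins.

f_min = 32.7  # Hypothetical lowest center frequency in Hz.
f_top = f_min * 2 ** ((K - 1) / B)  # Center frequency of the last bin.

print(n_octaves)
print(ratio)
print(f_top)
```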

### Modified discrete cosine transform

```python
import diffsptk

fl = 512  # Frame length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Transform x.
mdct = diffsptk.MDCT(fl)
c = mdct(x)

# Reconstruct x.
imdct = diffsptk.IMDCT(fl)
x_hat = imdct(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
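
The MDCT maps each frame of 2M samples to M coefficients, and perfect reconstruction comes from overlap-adding consecutive frames that share half their samples. The sketch below is a textbook pure-Python version with a sine window, included to illustrate that cancellation. It is not the diffsptk implementation, and it uses a frame length of 8 to keep the direct O(N^2) sums fast.

```python
import math


def mdct(frame):
    """Map 2M windowed samples to M coefficients."""
    m = len(frame) // 2
    return [
        sum(
            frame[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n in range(2 * m)
        )
        for k in range(m)
    ]


def imdct(coefs):
    """Map M coefficients back to 2M time-aliased samples."""
    m = len(coefs)
    return [
        2 / m * sum(
            coefs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for k in range(m)
        )
        for n in range(2 * m)
    ]


def analyze_and_resynthesize(x, fl):
    """Run MDCT and IMDCT with 50% overlap and return the overlap-added signal."""
    m = fl // 2
    # Sine window: satisfies w[n]^2 + w[n + m]^2 = 1.
    w = [math.sin(math.pi * (n + 0.5) / fl) for n in range(fl)]
    padded = [0.0] * m + list(x) + [0.0] * m
    padded += [0.0] * (-len(padded) % m)  # Make the length a multiple of m.
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - fl + 1, m):
        frame = [w[n] * padded[start + n] for n in range(fl)]
        y = imdct(mdct(frame))
        for n in range(fl):
            out[start + n] += w[n] * y[n]  # Aliasing cancels in the overlap.
    return out[m : m + len(x)]


x = [math.sin(0.3 * n) + 0.1 * n for n in range(21)]
x_hat = analyze_and_resynthesize(x, fl=8)
error = max(abs(u - v) for u, v in zip(x_hat, x))
print(error)
```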

### Vector quantization

```python
import diffsptk

K = 2  # Codebook size.
M = 4  # Order of vector.

# Prepare input.
x = diffsptk.nrand(M)

# Quantize x.
vq = diffsptk.VectorQuantization(M, K)
x_hat, indices, commitment_loss = vq(x)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
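
At its core, vector quantization replaces an input vector with the closest entry of a codebook. The sketch below shows that nearest-neighbor lookup in pure Python with a hand-written codebook. The codebook values and the helper name `quantize` are hypothetical, and the commitment loss returned by `diffsptk.VectorQuantization` is not modeled.

```python
def quantize(x, codebook):
    """Return the codeword closest to x (squared Euclidean distance) and its index."""

    def distance(codeword):
        return sum((a - b) ** 2 for a, b in zip(x, codeword))

    index = min(range(len(codebook)), key=lambda i: distance(codebook[i]))
    return codebook[index], index


codebook = [
    [0.0, 0.0, 0.0],  # Codeword 0.
    [1.0, 1.0, 1.0],  # Codeword 1.
]
x = [0.9, 1.2, 0.8]
x_hat, index = quantize(x, codebook)
error = sum(abs(a - b) for a, b in zip(x_hat, x))
print(index, x_hat, error)
```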

## License

This software is released under the Apache License 2.0.

## Citation

```bibtex
@InProceedings{sp-nitech2023sptk,
  author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
  title = {{SPTK4}: An open-source software toolkit for speech signal processing},
  booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
  pages = {211--217},
  year = {2023},
}
```

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "diffsptk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Takenori Yoshimura <takenori@sp.nitech.ac.jp>",
    "keywords": "dsp, pytorch, signal processing, sptk",
    "author": "SPTK Working Group",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/99/6d/f6940df7f588a939925877a32cd71e0a9996922a9438a1cb63021597509c/diffsptk-3.3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Speech signal processing modules for machine learning",
    "version": "3.3.1",
    "project_urls": {
        "Documentation": "https://sp-nitech.github.io/diffsptk/latest/",
        "Homepage": "https://sp-tk.sourceforge.net/",
        "Source": "https://github.com/sp-nitech/diffsptk"
    },
    "split_keywords": [
        "dsp",
        " pytorch",
        " signal processing",
        " sptk"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5d729f44396e5352ea5138b6a7e9a540334941db2e37dc71795fd37180df460e",
                "md5": "c2bc13e49f53d6e3532079ca717eda60",
                "sha256": "67cb31ed0b52e37321cc8d6f6b442037b6a25519ea2cde9152f8550f115b036a"
            },
            "downloads": -1,
            "filename": "diffsptk-3.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c2bc13e49f53d6e3532079ca717eda60",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 298641,
            "upload_time": "2025-08-09T14:42:50",
            "upload_time_iso_8601": "2025-08-09T14:42:50.612694Z",
            "url": "https://files.pythonhosted.org/packages/5d/72/9f44396e5352ea5138b6a7e9a540334941db2e37dc71795fd37180df460e/diffsptk-3.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "996df6940df7f588a939925877a32cd71e0a9996922a9438a1cb63021597509c",
                "md5": "e4390a6af320f4757bf3b9ad46edf305",
                "sha256": "abcbbfde5aef0ea1585b09ba55db569e9d016d57bf72e5ddf1b4c34938de23ca"
            },
            "downloads": -1,
            "filename": "diffsptk-3.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "e4390a6af320f4757bf3b9ad46edf305",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 151878,
            "upload_time": "2025-08-09T14:42:57",
            "upload_time_iso_8601": "2025-08-09T14:42:57.008991Z",
            "url": "https://files.pythonhosted.org/packages/99/6d/f6940df7f588a939925877a32cd71e0a9996922a9438a1cb63021597509c/diffsptk-3.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-09 14:42:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sp-nitech",
    "github_project": "diffsptk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "diffsptk"
}
