# VQT: Variable Q-Transform
[![PyPI version](https://badge.fury.io/py/vqt.svg)](https://badge.fury.io/py/vqt)
Contributions are welcome! Feel free to open an issue or a pull request.
### Variable Q-Transform
This is a novel Python implementation of the variable Q-transform, developed out of
the need for a more accurate and flexible VQT for use in research. It is
battle-tested and has been used in a number of research projects. <br>
- **Accuracy**: This package is a **direct implementation**: the spectrogram is
computed via a Hilbert transformation at each desired frequency. This yields an
exact computation of the spectrogram and is appropriate for research applications
where accuracy is critical. The implementations in `librosa` and `nnAudio` use
recursive downsampling, which can introduce artifacts in the spectrogram under
certain conditions.
- **Flexibility**: The parameters and codebase are less complex than in other
libraries, and the filter bank is fully customizable and exposed to the user.
Built-in plotting of the filter bank makes tuning the parameters easy and
intuitive. The main class is a PyTorch Module whose gradient function is
maintained, so backpropagation is possible (see the short sketch after this list).
- **Speed**: The backend is written in PyTorch and allows for GPU acceleration. It
is faster than the `librosa` implementation in most cases. It is typically somewhat
slower (1-8x) than the `nnAudio` implementation, but under some conditions (low
`hop_length`) it is as fast or faster. See the 'What to improve on?' section below
for details on how to speed it up further.
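
Because the main class is a `torch.nn.Module` with the gradient function maintained, a
backpropagation sketch along the following lines should work. The parameters are
borrowed from the usage example below; the exact defaults and call pattern are
assumptions, not tested against the library:
```
import torch
import vqt

## Toy input: 1 channel, 10 s at 1 kHz; gradients requested w.r.t. the raw signal
x = torch.randn(1, 10_000, requires_grad=True)

transform = vqt.VQT(
    Fs_sample=1000,
    Q_lowF=3,
    Q_highF=20,
    F_min=10,
    F_max=400,
    n_freq_bins=55,
)

loss = transform(x).abs().mean()  ## any scalar derived from the spectrogram
loss.backward()                   ## x.grad is now populated
```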
## Installation
Using `pip`:
```
pip install vqt
```
From source:
```
git clone https://github.com/RichieHakim/vqt.git
cd vqt
pip install -e .
```
**Requirements**: `torch`, `numpy`, `scipy`, `matplotlib`, `tqdm` <br>
These will be installed automatically if you install from PyPI.
### Usage
<img src="docs/media/filter_bank.png" alt="filter_bank" width="300"
align="right" style="margin-left: 10px"/>
```
import torch
import vqt

signal = torch.as_tensor(X)  ## torch Tensor of shape (n_channels, n_samples)

my_vqt = vqt.VQT(
    Fs_sample=1000,       ## In Hz
    Q_lowF=3,             ## In periods per octave
    Q_highF=20,           ## In periods per octave
    F_min=10,             ## In Hz
    F_max=400,            ## In Hz
    n_freq_bins=55,       ## Number of frequency bins
    window_type='hann',
    downsample_factor=8,  ## Reduce the output sample rate
    fft_conv=True,        ## Use FFT convolution for speed
    plot_pref=False,      ## Can show the filter bank
)

spectrograms = my_vqt(signal)
x_axis = my_vqt.get_xAxis(n_samples=signal.shape[1])
frequencies = my_vqt.get_freqs()
```
<img src="docs/media/freqs.png" alt="freqs" width="300" align="right"
style="margin-left: 10px"/>
#### What is the Variable Q-Transform?
The [Variable Q-Transform
(VQT)](https://en.wikipedia.org/wiki/Constant-Q_transform#Variable-Q_bandwidth_calculation)
is a time-frequency analysis tool that generates spectrograms, similar to the
Short-time Fourier Transform (STFT). It can also be viewed as a special case of a
wavelet transform (complex Morlet), and as a generalization of the
[Constant Q-Transform
(CQT)](https://en.wikipedia.org/wiki/Constant-Q_transform). In fact, the VQT
subsumes both the CQT and the STFT, since each can be recreated with specific VQT
parameters. <br>
<br>
In brief, the VQT generates a spectrogram in which the frequencies are spaced
logarithmically and the bandwidths of the filters are tuned using two parameters:
`Q_low` and `Q_high`, where `Q` describes the number of periods of the oscillatory
wavelet at a particular frequency (i.e., it sets the filter's 'bandwidth'); 'low'
refers to the lowest frequency bin and 'high' refers to the highest frequency bin.
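
To make the parameter names concrete, here is a small illustrative sketch (not the
package's internal code) of how log-spaced center frequencies and per-bin `Q` values
could be derived from `F_min`, `F_max`, `n_freq_bins`, `Q_lowF`, and `Q_highF`. The
interpolation rule between `Q_lowF` and `Q_highF` is an assumption for illustration:
```
import numpy as np

def freqs_and_qs(F_min=10, F_max=400, n_freq_bins=55, Q_lowF=3, Q_highF=20):
    """Hypothetical helper: log-spaced center frequencies with interpolated Q values."""
    freqs = np.geomspace(F_min, F_max, n_freq_bins)  ## logarithmic spacing
    ## Interpolate Q from Q_lowF (lowest bin) to Q_highF (highest bin);
    ## linear-in-log-frequency is assumed here, the package may use a different rule.
    qs = np.interp(np.log(freqs), [np.log(F_min), np.log(F_max)], [Q_lowF, Q_highF])
    return freqs, qs

freqs, qs = freqs_and_qs()
print(freqs[0], qs[0])    ## ~10 Hz with Q ~ 3: fewer periods, better time resolution
print(freqs[-1], qs[-1])  ## ~400 Hz with Q ~ 20: more periods, better frequency resolution
```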
#### Why use the VQT?
It provides enough knobs to tune the time-frequency resolution trade-off to suit
your needs. It is especially useful when time resolution is needed at lower
frequencies.
#### How exactly does this implementation differ from others?
<img src="docs/media/freq_response.png" alt="freq_response" width="300"
align="right" style="margin-left: 10px"/>
This implementation differs from the VQT in `librosa` and `nnAudio` in that it does
not use the recursive downsampling algorithm from [this
paper](http://academics.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf).
Instead, it computes the power at each frequency using either direct or FFT
convolution with a filter bank of complex oscillations, followed by a Hilbert
transform. This yields a **more accurate** computation of the same spectrogram,
without the artifacts that recursive downsampling can introduce. The direct approach
also results in code that is more flexible, easier to understand, and subject to
fewer constraints on the input parameters than `librosa` and `nnAudio`.
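
For intuition, here is a minimal, self-contained sketch of that general idea in
NumPy/SciPy (the library itself uses PyTorch): convolve the signal with a bank of
complex, windowed oscillations and take the magnitude of the complex output. This is
not the package's actual code; the Gaussian window shape, width factor, and
normalization are assumptions for illustration only:
```
import numpy as np
from scipy.signal import fftconvolve

def toy_vqt(x, fs, freqs, qs):
    """Hypothetical direct-VQT sketch: one complex filter per frequency, FFT-convolved."""
    spec = np.empty((len(freqs), len(x)))
    for i, (f, q) in enumerate(zip(freqs, qs)):
        dur = q / f                                    ## window spans ~q periods of frequency f
        t = np.arange(-dur / 2, dur / 2, 1 / fs)
        window = np.exp(-0.5 * (t / (dur / 6)) ** 2)   ## Gaussian window; width factor is an assumption
        kernel = window * np.exp(2j * np.pi * f * t)   ## complex oscillation at f Hz
        kernel /= np.abs(kernel).sum()                 ## normalization choice is an assumption
        ## magnitude of the complex output is the envelope at this frequency
        spec[i] = np.abs(fftconvolve(x, kernel, mode='same'))
    return spec

## Example: a 50 Hz tone sampled at 1 kHz lights up the bins near 50 Hz
fs = 1000
x = np.sin(2 * np.pi * 50.0 * np.arange(2 * fs) / fs)
spec = toy_vqt(x, fs, freqs=np.geomspace(10, 400, 55), qs=np.linspace(3, 20, 55))
```
In this sketch the complex filters directly produce an analytic output whose magnitude
is the envelope; the package's actual pipeline (convolution followed by a Hilbert
transform) is described above.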
#### What to improve on?
Contributions are welcome! Feel free to open an issue or a pull request.
- Speed / Memory usage:
  - **Lossless approaches**:
    - For the `conv1d` approach: I think it would be much faster if we cropped the
      filters to remove the blank space from the higher-frequency filters. This would
      be fairly easy to implement and could give a >10x speedup.
  - **Lossy approaches**:
    - For the `fft_conv` approach: I believe a large (5-50x) speedup is possible. The
      lower-frequency filters use only a small portion of the spectrum, so most of the
      compute is spent multiplying zeros.
      - Idea 1: Separate out the filters in the filter bank whose spectra are all zeros
        above `n_samples_downsampled`, crop the spectra above that level, then use
        `ifft` with `n=n_samples_downsampled` to downsample the filter. This would
        allow for a much faster convolution. For filters that can't be cropped,
        downsampling would have to be done after the iFFT. (A minimal sketch of this
        trick is shown after this list.)
      - Idea 2: Use an efficient sparse or non-uniform FFT, so that only the non-zero
        frequencies are computed in the `fft`, product, and `ifft`. There is an
        implementation of the NUFFT in PyTorch
        [here](https://github.com/mmuckley/torchkbnufft).
      - Idea 3: Similar to the above, a log-frequency iFFT could be used so that only
        the non-zero segment of the filter's spectrum enters the convolution.
      - Idea 4: Try using the overlap-add method.
  - Recursive downsampling: Under many circumstances (like when `Q_high` is not much
    greater than `Q_low`), recursive downsampling is fine. Implementing it would be
    nice just for completeness ([from this
    paper](http://academics.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf)).
    - For the `conv1d` approach: use a strided convolution.
    - For the `fft_conv` approach: downsample using `n=n_samples_downsampled` in the
      `ifft` call.
  - Non-trivial ideas that could theoretically speed things up:
    - An FFT implementation that allows for a reduced set of frequencies to be
      computed.
- Flexibility:
  - `librosa` parameter mode: It would be nice to have a mode that accepts the same
    parameters as `librosa`.
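
As a companion to Idea 1 above, here is a minimal sketch of the truncated-iFFT
downsampling trick, assuming the filter's spectrum is zero above the cropped range.
Variable names, the scale-factor handling, and the neglect of circular-convolution
padding are all simplifications for illustration:
```
import torch

def fftconv_and_downsample(x_fft, filt_fft, n_samples, downsample_factor):
    """Sketch: FFT-convolution followed by downsampling via a shorter ifft.
    Assumes filt_fft is (near-)zero above n_samples // downsample_factor."""
    n_down = n_samples // downsample_factor
    prod = x_fft * filt_fft            ## convolution theorem: multiply spectra
    prod_cropped = prod[..., :n_down]  ## keep only the bins the filter actually occupies
    ## A length-n_down ifft of the cropped spectrum returns the decimated output directly,
    ## scaled up by downsample_factor relative to the full-length ifft; divide to compensate.
    return torch.fft.ifft(prod_cropped, n=n_down) / downsample_factor

## Toy usage: analytic filter concentrated at low frequencies
N, ds = 1024, 8
x = torch.randn(N)
t = torch.arange(N, dtype=torch.float32) / N
h = torch.hann_window(N) * torch.exp(2j * torch.pi * 5 * t)  ## complex oscillation near bin 5
y_down = fftconv_and_downsample(torch.fft.fft(x), torch.fft.fft(h), N, ds)
```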
#### Demo:
<img src="docs/media/example_ECG.png" alt="ECG" width="500" align="right"
style="margin-left: 10px"/>
```
import vqt
import numpy as np
import torch
import matplotlib.pyplot as plt
import scipy.datasets

data_ecg = torch.as_tensor(scipy.datasets.electrocardiogram()[:10000])
sample_rate = 360

my_vqt = vqt.VQT(
    Fs_sample=sample_rate,
    Q_lowF=2,
    Q_highF=8,
    F_min=1,
    F_max=120,
    n_freq_bins=150,
    win_size=1501,
    window_type='gaussian',
    downsample_factor=8,
    padding='same',
    fft_conv=True,
    take_abs=True,
    plot_pref=False,
)

specs = my_vqt(data_ecg)
xaxis = my_vqt.get_xAxis(n_samples=data_ecg.shape[0])
freqs = my_vqt.get_freqs()

fig, axs = plt.subplots(nrows=2, ncols=1, sharex=True)
axs[0].plot(np.arange(data_ecg.shape[0]) / sample_rate, data_ecg)
axs[0].title.set_text('Electrocardiogram')
axs[1].pcolor(
    xaxis / sample_rate,
    np.arange(specs[0].shape[0]),
    specs[0] * freqs[:, None],
    vmin=0,
    vmax=30,
    cmap='hot',
)
axs[1].set_yticks(
    np.arange(specs.numpy()[0].shape[0])[::10],
    np.round(freqs.numpy()[::10], 1),
)
axs[1].set_xlim([13, 22])
axs[0].set_ylabel('mV')
axs[1].set_ylabel('frequency (Hz)')
axs[1].set_xlabel('time (s)')
plt.show()
```