| Field | Value |
| --- | --- |
| Name | qfc |
| Version | 0.3.6 |
| Home page | None |
| Summary | None |
| Upload time | 2024-08-07 16:36:54 |
| Maintainer | None |
| Docs URL | None |
| Author | Jeremy Magland |
| Requires Python | `<4.0,>=3.8` |
| License | None |
| Keywords | (none) |
| VCS | (none) |
| Bug tracker URL | (none) |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# QFC - Quantized Fourier Compression of Timeseries Data with Application to Electrophysiology
## Overview
As extracellular electrophysiology datasets grow in size, efficient methods for compressing multi-channel time series data become crucial. Lossless methods are desirable because they preserve the original signal perfectly, but they typically achieve compression ratios of only 2-4x. Ratios on the order of 10-30x are needed, which leads us to consider lossy methods.

Here, we implement a simple lossy compression method inspired by the Discrete Cosine Transform (DCT) and the quantization step of JPEG image compression. The method comprises the following steps:

* Compute the Discrete Fourier Transform (DFT) of the time series data along the time axis.
* Quantize the Fourier coefficients to achieve a target entropy (the entropy determines the theoretically achievable compression ratio). This is done by multiplying by a normalization factor and then rounding to the nearest integer.
* Compress the reduced-entropy quantized Fourier coefficients using zlib or zstd (other lossless compressors could be used instead).

To decompress:

* Decompress the quantized Fourier coefficients.
* Divide by the normalization factor.
* Compute the Inverse Discrete Fourier Transform (IDFT) to obtain the reconstructed time series data.

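
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the qfc implementation: the helper names `qfc_encode_sketch` and `qfc_decode_sketch` are hypothetical, the DFT is the one-sided real FFT (`np.fft.rfft`), the quantized real and imaginary parts are stored as int32, and the lossless stage is the standard-library `zlib` rather than zstd.

```python
import zlib

import numpy as np


def qfc_encode_sketch(x, scale):
    """Hypothetical encoder: quantize the DFT of each channel, then zlib-compress.

    x has shape (num_samples, num_channels); `scale` is the normalization
    factor applied before rounding (larger -> finer quantization).
    """
    F = np.fft.rfft(x, axis=0)  # one-sided DFT along the time axis
    # Multiply by the normalization factor and round to the nearest integer.
    q_re = np.round(F.real * scale).astype(np.int32)
    q_im = np.round(F.imag * scale).astype(np.int32)
    # Lay out channel by channel so each channel's coefficients stay contiguous.
    ints = np.stack([q_re, q_im], axis=-1).transpose(1, 0, 2)
    return zlib.compress(np.ascontiguousarray(ints).tobytes(), 9)


def qfc_decode_sketch(blob, num_samples, num_channels, scale):
    """Hypothetical decoder: decompress, divide by the scale, inverse DFT."""
    ints = np.frombuffer(zlib.decompress(blob), dtype=np.int32)
    ints = ints.reshape(num_channels, num_samples // 2 + 1, 2).transpose(1, 0, 2)
    F = (ints[..., 0] + 1j * ints[..., 1]) / scale  # divide by the normalization factor
    return np.fft.irfft(F, n=num_samples, axis=0)  # back to the time domain


rng = np.random.default_rng(0)
x = rng.standard_normal((4096, 4)) * 50  # white noise, 4 channels
scale = 0.001
blob = qfc_encode_sketch(x, scale)
x_rec = qfc_decode_sketch(blob, x.shape[0], x.shape[1], scale)
print("compressed bytes:", len(blob), "of", x.nbytes)
print("residual std:", float(np.std(x - x_rec)))
```

Each real or imaginary part is off by at most 0.5/scale after rounding, so by Parseval's theorem the time-domain residual is bounded in terms of `scale` and the number of samples; this is why a single normalization factor controls the reconstruction error.
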
This method is particularly well suited to data that has been bandpass-filtered: the Fourier coefficients outside the passband are already small, so after scaling they mostly round to zero, which gives the quantized coefficients an especially low entropy.

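
To see the effect of filtering on entropy, one can quantize the DFT coefficients of white noise and of a lowpass-filtered copy with the same normalization factor and compare the empirical entropy of the resulting integers. This is an illustrative sketch with hypothetical helper names; the filter uses the same Gaussian window as `lowpass_filter` in the example further down, and the measure is zeroth-order entropy only, ignoring any additional redundancy that zlib or zstd might exploit.

```python
import numpy as np


def empirical_entropy_bits(values):
    """Shannon entropy, in bits per value, of the empirical distribution of `values`."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def gaussian_lowpass(x, fs, cutoff):
    """Gaussian frequency-domain lowpass (same window as lowpass_filter, via rfft)."""
    freqs = np.fft.rfftfreq(x.shape[0], d=1 / fs)
    sigma = cutoff / 3
    window = np.exp(-np.square(freqs) / (2 * sigma**2))
    spectrum = np.fft.rfft(x, axis=0) * window[:, None]
    return np.fft.irfft(spectrum, n=x.shape[0], axis=0)


def quantized_coeff_entropy(x, scale):
    """Entropy of the scaled, rounded real and imaginary parts of the rfft coefficients."""
    F = np.fft.rfft(x, axis=0)
    q = np.concatenate([np.round(F.real * scale), np.round(F.imag * scale)])
    return empirical_entropy_bits(q)


fs = 30000
rng = np.random.default_rng(0)
raw = rng.standard_normal((fs, 4)) * 50  # 1 s of white noise on 4 channels
filtered = gaussian_lowpass(raw, fs, 6000)
scale = 0.001  # identical normalization factor for both signals
h_raw = quantized_coeff_entropy(raw, scale)
h_filtered = quantized_coeff_entropy(filtered, scale)
print(f"white noise: {h_raw:.2f} bits/value, lowpass: {h_filtered:.2f} bits/value")
```

With the same normalization factor the worst-case rounding error per coefficient is identical for both signals, so the entropy gap comes from the suppressed coefficients rather than from coarser quantization.
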
For a comparison of various lossy and lossless compression schemes, see [Compression strategies for large-scale electrophysiology data, Buccino et al.](https://www.biorxiv.org/content/10.1101/2023.05.22.541700v2.full.pdf).
## Installation
```bash
pip install qfc
```
## Example usage
```python
# See examples/example1.py

from matplotlib import pyplot as plt
import numpy as np
from qfc import qfc_estimate_quant_scale_factor
from qfc.codecs import QFCCodec


def main():
    sampling_frequency = 30000
    duration = 2
    num_channels = 10
    num_samples = int(sampling_frequency * duration)
    y = np.random.randn(num_samples, num_channels) * 50
    y = lowpass_filter(y, sampling_frequency, 6000)
    y = np.ascontiguousarray(y)  # compressor requires C-order arrays
    y = y.astype(np.int16)
    target_residual_stdev = 5

    ############################################################
    quant_scale_factor = qfc_estimate_quant_scale_factor(
        y,
        target_residual_stdev=target_residual_stdev
    )
    codec = QFCCodec(
        quant_scale_factor=quant_scale_factor,
        dtype="int16",
        segment_length=10000,
        compression_method="zstd",
        zstd_level=3
    )
    compressed_bytes = codec.encode(y)
    y_reconstructed = codec.decode(compressed_bytes)
    ############################################################

    y_resid = y - y_reconstructed
    original_size = y.nbytes
    compressed_size = len(compressed_bytes)
    compression_ratio = original_size / compressed_size
    print(f"Original size: {original_size} bytes")
    print(f"Compressed size: {compressed_size} bytes")
    print(f"Actual compression ratio: {compression_ratio:.2f}")
    print(f"Target residual std. dev.: {target_residual_stdev:.2f}")
    print(f"Actual std. dev. of residual: {np.std(y_resid):.2f}")

    xgrid = np.arange(y.shape[0]) / sampling_frequency
    ch = 3  # select a channel to plot
    n = 1000  # number of samples to plot
    plt.figure()
    plt.plot(xgrid[:n], y[:n, ch], label="Original")
    plt.plot(xgrid[:n], y_reconstructed[:n, ch], label="Decompressed")
    plt.plot(xgrid[:n], y_resid[:n, ch], label="Residual")
    plt.xlabel("Time (s)")
    plt.title(f"QFC compression ratio: {compression_ratio:.2f}")
    plt.legend()
    plt.show()


def lowpass_filter(input_array, sampling_frequency, cutoff_frequency):
    # Gaussian lowpass applied in the frequency domain, one column per channel
    F = np.fft.fft(input_array, axis=0)
    N = input_array.shape[0]
    freqs = np.fft.fftfreq(N, d=1 / sampling_frequency)
    sigma = cutoff_frequency / 3
    window = np.exp(-np.square(freqs) / (2 * sigma**2))
    F_filtered = F * window[:, None]
    filtered_array = np.fft.ifft(F_filtered, axis=0)
    return np.real(filtered_array)


if __name__ == "__main__":
    main()
```
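
The example relies on `qfc_estimate_quant_scale_factor` to choose the normalization factor for a target residual std. dev. One way such an estimate could work is to bisect on the scale factor, measuring the residual of an rfft, quantize, reconstruct round trip at each step. This is a hypothetical sketch (the helper names are made up and it is not necessarily how qfc does it); it also ignores the final cast back to int16 and the segmenting that `QFCCodec` applies via `segment_length`.

```python
import numpy as np


def quantization_residual_std(x, scale):
    """Residual std. dev. after rounding scale * rfft(x) and reconstructing."""
    F = np.fft.rfft(x, axis=0)
    F_q = (np.round(F.real * scale) + 1j * np.round(F.imag * scale)) / scale
    x_rec = np.fft.irfft(F_q, n=x.shape[0], axis=0)
    return float(np.std(x - x_rec))


def estimate_scale_by_bisection(x, target_residual_stdev, lo=1e-9, hi=1e3, num_iters=40):
    """Hypothetical estimator: bisect (in log space) on the normalization factor.

    Assumes `hi` is fine enough to meet the target and `lo` is too coarse. The
    returned scale always meets the target; since the residual varies
    continuously with the scale, it ends up very close to the target.
    """
    for _ in range(num_iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint: scales span many decades
        if quantization_residual_std(x, mid) > target_residual_stdev:
            lo = mid  # quantization too coarse: need a larger scale
        else:
            hi = mid  # target met: try a coarser (smaller) scale
    return float(hi)


# Lowpass-filtered noise similar to the example above (kept as float, not int16).
fs = 30000
rng = np.random.default_rng(0)
x = rng.standard_normal((fs, 4)) * 50
freqs = np.fft.rfftfreq(fs, d=1 / fs)
window = np.exp(-np.square(freqs) / (2 * (6000 / 3) ** 2))
x = np.fft.irfft(np.fft.rfft(x, axis=0) * window[:, None], n=fs, axis=0)

scale = estimate_scale_by_bisection(x, target_residual_stdev=5)
print(f"scale = {scale:.3e}, residual std = {quantization_residual_std(x, scale):.3f}")
```

Bisecting on the logarithm of the scale keeps the number of iterations small even though a reasonable scale is not known in advance to within several orders of magnitude.
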
## Zarr example
See [examples/zarr_example.py](./examples/zarr_example.py)
## Benchmarks
I have put together some preliminary systematic benchmarks on real and synthetic data. See [./benchmarks](./benchmarks) and [./benchmarks/results](./benchmarks/results).
As can be seen:

- Quantizing in the Fourier domain (QFC) performs much better than quantizing in the time domain (call it QTC) for real data and for bandpass-filtered data.
- The compression ratio is much higher for bandpass-filtered data than for unfiltered raw data.
- For the lossless stage of the method, zstd outperforms zlib on all three measures: compression ratio, compression speed, and decompression speed.
- As expected, the compression ratio depends heavily on the target residual std. dev., since a larger allowed residual permits coarser quantization.
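
The first observation can be checked qualitatively on synthetic data. The sketch below is hypothetical and far simpler than the benchmarks in ./benchmarks: it lowpass-filters white noise, quantizes it either in the time domain (QTC) or in the Fourier domain (QFC), and compresses the int16 integers with the standard-library zlib rather than zstd. Both quantization steps come from a uniform-rounding-error approximation (time-domain residual ≈ step/√12 for QTC, ≈ 1/(scale·√(6N)) for QFC with N samples), so the measured residuals are printed next to the compression ratios rather than assumed equal.

```python
import zlib

import numpy as np


def lowpass(x, fs, cutoff):
    """Gaussian frequency-domain lowpass, same window as the README's lowpass_filter."""
    freqs = np.fft.rfftfreq(x.shape[0], d=1 / fs)
    window = np.exp(-np.square(freqs) / (2 * (cutoff / 3) ** 2))
    return np.fft.irfft(np.fft.rfft(x, axis=0) * window[:, None], n=x.shape[0], axis=0)


def qtc(x, target):
    """Time-domain quantization; step chosen so uniform rounding error has std `target`."""
    step = target * np.sqrt(12)
    q = np.round(x / step).astype(np.int16)
    compressed = zlib.compress(np.ascontiguousarray(q.T).tobytes(), 9)
    return len(compressed), float(np.std(x - q * step))


def qfc(x, target):
    """Fourier-domain quantization; scale from the same uniform-error approximation."""
    n = x.shape[0]
    scale = 1.0 / (target * np.sqrt(6 * n))
    F = np.fft.rfft(x, axis=0)
    q_re = np.round(F.real * scale).astype(np.int16)
    q_im = np.round(F.imag * scale).astype(np.int16)
    ints = np.stack([q_re, q_im], axis=-1).transpose(1, 0, 2)  # channel by channel
    compressed = zlib.compress(np.ascontiguousarray(ints).tobytes(), 9)
    x_rec = np.fft.irfft((q_re + 1j * q_im) / scale, n=n, axis=0)
    return len(compressed), float(np.std(x - x_rec))


fs, target = 30000, 5.0
rng = np.random.default_rng(0)
x = lowpass(rng.standard_normal((fs, 4)) * 50, fs, 6000)  # 1 s, 4 channels
raw_bytes = x.shape[0] * x.shape[1] * 2  # size if stored as int16
for name, fn in [("QTC", qtc), ("QFC", qfc)]:
    size, resid = fn(x, target)
    print(f"{name}: ratio vs int16 {raw_bytes / size:.1f}, residual std {resid:.2f}")
```

Because many filtered-out coefficients round to zero with an error smaller than the uniform model assumes, the measured QFC residual typically comes out below the target, so QFC is, if anything, being run at a finer quantization than QTC in this comparison.
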
## License
This code is provided under the Apache License, Version 2.0.
## Author
Jeremy Magland, Center for Computational Mathematics, Flatiron Institute
Raw data
{
"_id": null,
"home_page": null,
"name": "qfc",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Jeremy Magland",
"author_email": "jmagland@flatironinstitute.org",
"download_url": "https://files.pythonhosted.org/packages/65/37/ea0fea0cab9312d9a4e033cccd7b525684034c00d64d080ba6b1c6c24347/qfc-0.3.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": null,
"version": "0.3.6",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1ce8baa11c86f21450b2080c9ed011cd8fafc47a1ec054345eea5d8d4accc71d",
"md5": "875b1a26a1e1b3ffd62fb1b22bef6ee2",
"sha256": "8dc896892bc31badb74dea6cf4958c6f7439b4f6373f13f99bc1da41bdb4a041"
},
"downloads": -1,
"filename": "qfc-0.3.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "875b1a26a1e1b3ffd62fb1b22bef6ee2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 17869,
"upload_time": "2024-08-07T16:36:52",
"upload_time_iso_8601": "2024-08-07T16:36:52.666770Z",
"url": "https://files.pythonhosted.org/packages/1c/e8/baa11c86f21450b2080c9ed011cd8fafc47a1ec054345eea5d8d4accc71d/qfc-0.3.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6537ea0fea0cab9312d9a4e033cccd7b525684034c00d64d080ba6b1c6c24347",
"md5": "43153bbc7fca05e084f9771981f13a7d",
"sha256": "78ca115b9f208a68de9451c851889061f5147efa25f48ae6fbf8d541dc75ce47"
},
"downloads": -1,
"filename": "qfc-0.3.6.tar.gz",
"has_sig": false,
"md5_digest": "43153bbc7fca05e084f9771981f13a7d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 12084,
"upload_time": "2024-08-07T16:36:54",
"upload_time_iso_8601": "2024-08-07T16:36:54.264657Z",
"url": "https://files.pythonhosted.org/packages/65/37/ea0fea0cab9312d9a4e033cccd7b525684034c00d64d080ba6b1c6c24347/qfc-0.3.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-07 16:36:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "qfc"
}