qfc

Name	qfc
Version	0.3.6
upload_time	2024-08-07 16:36:54
author	Jeremy Magland
requires_python	<4.0,>=3.8

# QFC - Quantized Fourier Compression of Timeseries Data with Application to Electrophysiology

## Overview

With the increasing sizes of data for extracellular electrophysiology, it is crucial to develop efficient methods for compressing multi-channel time series data. While lossless methods are desirable for perfectly preserving the original signal, the compression ratios for these methods usually range only from 2-4x. What is needed are ratios on the order of 10-30x, leading us to consider lossy methods.

Here, we implement a simple lossy compression method, inspired by the Discrete Cosine Transform (DCT) and the quantization steps of JPEG compression for images. The method comprises the following steps:
* Compute the Discrete Fourier Transform (DFT) of the time series data along the time axis.
* Quantize the Fourier coefficients to achieve a target entropy (the entropy determines the theoretically achievable compression ratio). This is done by multiplying by a normalization factor and then rounding to the nearest integer.
* Compress the reduced-entropy quantized Fourier coefficients using zlib or zstd (other methods could be used instead).

To decompress:
* Decompress the quantized Fourier coefficients.
* Divide by the normalization factor.
* Compute the Inverse Discrete Fourier Transform (IDFT) to obtain the reconstructed time series data.
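
The compression and decompression steps above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the package's implementation: the function names `compress` and `decompress`, the `scale` parameter (standing in for the normalization factor), the orthonormal FFT scaling, and the `int32` storage of real and imaginary parts are all assumptions made for this example.

```python
import zlib

import numpy as np


def compress(x: np.ndarray, scale: float) -> bytes:
    """Quantize the Fourier coefficients of x (samples x channels) and deflate them."""
    # DFT along the time axis; rfft keeps only the non-redundant half for real input
    coeffs = np.fft.rfft(x, axis=0, norm="ortho")
    # Keep real and imaginary parts as two planes, then scale and round to integers
    planes = np.stack([coeffs.real, coeffs.imag])
    quantized = np.round(planes * scale).astype(np.int32)
    # Lossless stage: zlib here, zstd would slot in the same way
    return zlib.compress(quantized.tobytes(), 6)


def decompress(blob: bytes, shape: tuple, scale: float) -> np.ndarray:
    """Invert compress(): inflate, divide by the scale, and apply the inverse DFT."""
    num_samples, num_channels = shape
    num_freqs = num_samples // 2 + 1
    quantized = np.frombuffer(zlib.decompress(blob), dtype=np.int32)
    planes = quantized.reshape(2, num_freqs, num_channels) / scale
    coeffs = planes[0] + 1j * planes[1]
    return np.fft.irfft(coeffs, n=num_samples, axis=0, norm="ortho")
```

Calling `decompress(compress(x, scale), x.shape, scale)` returns an array of the same shape as `x`. A larger `scale` gives a finer quantization grid, hence a smaller residual and a larger compressed size.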

This method is particularly well-suited for data that has been bandpass-filtered, as the suppressed Fourier coefficients yield an especially low entropy of the quantized signal.
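
To make the entropy argument concrete, the following sketch measures the empirical entropy of the rounded Fourier coefficients for a filtered and an unfiltered signal. It is a hedged illustration: `quantized_entropy_bits`, `gaussian_lowpass`, and the `step` parameter are hypothetical names, and a Gaussian lowpass window (the same shape as in the usage example further down) stands in for a bandpass filter.

```python
import numpy as np


def quantized_entropy_bits(x: np.ndarray, step: float) -> float:
    """Empirical entropy, in bits per value, of the rounded Fourier coefficients of x."""
    coeffs = np.fft.rfft(x, norm="ortho")
    values = np.concatenate([coeffs.real, coeffs.imag])
    quantized = np.round(values / step).astype(np.int64)
    # Histogram of the integer symbols, turned into probabilities
    _, counts = np.unique(quantized, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def gaussian_lowpass(x: np.ndarray, fs: float, cutoff: float) -> np.ndarray:
    """Suppress high frequencies with a Gaussian window of width cutoff / 3."""
    freqs = np.fft.rfftfreq(x.shape[0], d=1 / fs)
    window = np.exp(-np.square(freqs) / (2 * (cutoff / 3) ** 2))
    return np.fft.irfft(np.fft.rfft(x) * window, n=x.shape[0])
```

For white noise, every frequency bin carries comparable energy, so the quantized values spread over many integer levels. After filtering, the suppressed bins round to zero, and the entropy at the same `step` drops accordingly.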

For a comparison of various lossy and lossless compression schemes, see [Compression strategies for large-scale electrophysiology data, Buccino et al.](https://www.biorxiv.org/content/10.1101/2023.05.22.541700v2.full.pdf).

## Installation

```bash
pip install qfc
```

## Example usage

```python
# See examples/example1.py

from matplotlib import pyplot as plt
import numpy as np
from qfc import qfc_estimate_quant_scale_factor
from qfc.codecs import QFCCodec


def main():
    sampling_frequency = 30000
    duration = 2
    num_channels = 10
    num_samples = int(sampling_frequency * duration)
    y = np.random.randn(num_samples, num_channels) * 50
    y = lowpass_filter(y, sampling_frequency, 6000)
    y = np.ascontiguousarray(y)  # compressor requires C-order arrays
    y = y.astype(np.int16)
    target_residual_stdev = 5

    ############################################################
    quant_scale_factor = qfc_estimate_quant_scale_factor(
        y,
        target_residual_stdev=target_residual_stdev
    )
    codec = QFCCodec(
        quant_scale_factor=quant_scale_factor,
        dtype="int16",
        segment_length=10000,
        compression_method="zstd",
        zstd_level=3
    )
    compressed_bytes = codec.encode(y)
    y_reconstructed = codec.decode(compressed_bytes)
    ############################################################

    y_resid = y - y_reconstructed
    original_size = y.nbytes
    compressed_size = len(compressed_bytes)
    compression_ratio = original_size / compressed_size
    print(f"Original size: {original_size} bytes")
    print(f"Compressed size: {compressed_size} bytes")
    print(f"Actual compression ratio: {compression_ratio}")
    print(f'Target residual std. dev.: {target_residual_stdev:.2f}')
    print(f'Actual Std. dev. of residual: {np.std(y_resid):.2f}')

    xgrid = np.arange(y.shape[0]) / sampling_frequency
    ch = 3  # select a channel to plot
    n = 1000  # number of samples to plot
    plt.figure()
    plt.plot(xgrid[:n], y[:n, ch], label="Original")
    plt.plot(xgrid[:n], y_reconstructed[:n, ch], label="Decompressed")
    plt.plot(xgrid[:n], y_resid[:n, ch], label="Residual")
    plt.xlabel("Time (s)")
    plt.title(f'QFC compression ratio: {compression_ratio:.2f}')
    plt.legend()
    plt.show()


def lowpass_filter(input_array, sampling_frequency, cutoff_frequency):
    F = np.fft.fft(input_array, axis=0)
    N = input_array.shape[0]
    freqs = np.fft.fftfreq(N, d=1 / sampling_frequency)
    sigma = cutoff_frequency / 3
    window = np.exp(-np.square(freqs) / (2 * sigma**2))
    F_filtered = F * window[:, None]
    filtered_array = np.fft.ifft(F_filtered, axis=0)
    return np.real(filtered_array)


if __name__ == "__main__":
    main()
```

## Zarr example

See [examples/zarr_example.py](./examples/zarr_example.py)

## Benchmarks

I have put together some preliminary systematic benchmarks on real and synthetic data. See [./benchmarks](./benchmarks) and [./benchmarks/results](./benchmarks/results).

As can be seen:
- Quantizing in the Fourier domain (QFC) performs substantially better than quantizing in the time domain (call it QTC) for real data or for bandpass-filtered data.
- The compression ratio is substantially higher for bandpass-filtered data than for unfiltered raw data.
- For the lossless part of the method, zstd outperforms zlib on all three measures: compression ratio, compression speed, and decompression speed.
- The compression ratio depends heavily on the target residual std. dev.
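
The QFC versus QTC comparison can be reproduced in miniature with the sketch below. It is not the benchmark code from the repository: `quantize_time`, `quantize_fourier`, and `match_residual` are hypothetical helpers, zlib is used for the lossless stage, and the quantization step is found by a simple bisection so that both approaches are compared at roughly the same residual std. dev.

```python
import zlib

import numpy as np


def quantize_time(x: np.ndarray, step: float):
    """QTC: round the samples themselves. Returns (compressed bytes, residual std)."""
    q = np.round(x / step).astype(np.int32)
    resid = x - q * step
    return len(zlib.compress(q.tobytes())), float(np.std(resid))


def quantize_fourier(x: np.ndarray, step: float):
    """QFC-style: round the Fourier coefficients. Returns (compressed bytes, residual std)."""
    coeffs = np.fft.rfft(x, norm="ortho")
    q_re = np.round(coeffs.real / step).astype(np.int32)
    q_im = np.round(coeffs.imag / step).astype(np.int32)
    recon = np.fft.irfft((q_re + 1j * q_im) * step, n=x.shape[0], norm="ortho")
    blob = zlib.compress(np.concatenate([q_re, q_im]).tobytes())
    return len(blob), float(np.std(x - recon))


def match_residual(quantize, x, target_std, lo=1e-3, hi=1e3, iters=40):
    """Bisect in log space for a step whose residual std. dev. is close to target_std."""
    for _ in range(iters):
        mid = float(np.sqrt(lo * hi))
        _, resid = quantize(x, mid)
        if resid < target_std:
            lo = mid  # residual too small: the step can grow
        else:
            hi = mid  # residual too large: shrink the step
    return float(np.sqrt(lo * hi))
```

To compare the two, call `match_residual` once per quantizer with the same `target_std` on a 1-D signal, then compare the byte counts each quantizer returns at its matched step. The search assumes the target lies between the residuals produced by the `lo` and `hi` steps.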

## License

This code is provided under the Apache License, Version 2.0.


## Author

Jeremy Magland, Center for Computational Mathematics, Flatiron Institute
