[![Test Suite](https://github.com/seung-lab/zfpc/actions/workflows/test-suite.yml/badge.svg)](https://github.com/seung-lab/zfpc/actions/workflows/test-suite.yml)
# zfpc: zfp container format
_An unofficial project unaffiliated with the [`zfp`](https://github.com/LLNL/zfp/) project._
An experimental container format for `zfp` encoded vector fields. As described in the [zfp documentation](https://zfp.readthedocs.io/en/latest/faq.html#q-vfields), datasets such as vector fields are not optimally compressed within a single zfp stream because their X and Y components are uncorrelated. Compressing the X and Y components as separate `zfp` arrays yields a higher compression ratio.
However, keeping the components in separate files is cumbersome, must be handled on a per-project basis, and is not compatible with existing data viewers (such as Neuroglancer) that expect to download a single file per image tile. `zfpc` provides a means of splitting a 1-4D array along its (user-specified) uncorrelated dimensions, compressing those slices into separate `zfp` streams, and encoding them into a single file. This file can then be decompressed back into its original form seamlessly using `zfpc`. In the future, it may be possible to automatically determine which dimensions are uncorrelated using statistical tests.
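
The splitting step described above can be illustrated with plain NumPy. This is only a sketch of the idea, not the `zfpc` implementation: the helper name `split_uncorrelated` is hypothetical, no compression is performed, and the iteration order is not meant to match the on-disk stream order.

```python
import itertools
import numpy as np

def split_uncorrelated(arr, correlated_dims):
    """Yield (index, piece) pairs, one per combination of uncorrelated indices."""
    # Correlated axes are kept whole (a full slice); uncorrelated axes
    # are enumerated one index at a time.
    axes = [
        [slice(None)] if correlated else range(size)
        for size, correlated in zip(arr.shape, correlated_dims)
    ]
    for index in itertools.product(*axes):
        yield index, arr[index]

# Small stand-in for a (x, y, z, channel) vector field.
field = np.arange(4 * 5 * 3 * 2, dtype=np.float32).reshape(4, 5, 3, 2)
pieces = list(split_uncorrelated(field, [True, True, False, False]))

# Reassemble the pieces to confirm that the split loses nothing.
restored = np.empty_like(field)
for index, piece in pieces:
    restored[index] = piece
```

Each piece here is a 2D XY plane, which is the unit that would be handed to `zfp` as its own stream.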
In fixed rate mode, it should still be possible to perform random access, though this feature is not available yet.
```python
import numpy as np
import zfpc

# example shape: (1202, 1240, 64, 2)
vector_field = np.array(...) # dtype must be float or int, 32 or 64-bit

# For data arranged as a Z stack of planar XY vectors,
# e.g. arr[x,y,z,channel], values mostly vary smoothly in the
# XY plane within each channel. The z and channel dimensions do
# not vary smoothly, so we set correlated_dims to
# [True,True,False,False] to obtain optimal compression.
#
# tolerance, rate, and precision are the supported modes.
# By default, lossless compression is used.
correlated_dims = [True, True, False, False]
binary = zfpc.compress(
    vector_field,
    tolerance=0.01,
    correlated_dims=correlated_dims,
)
recovered_img = zfpc.decompress(binary)
```
## Container Format
The file consists of three sections, in order: header, index, streams.
### Header
The header is 15 bytes long and is written in little endian in the following format.
| Field | Type | Description |
|-------------------|---------|----------------------------------------------------------------------------------------------------------|
| magic | char[4] | "zfpc" magic number for file format. |
| format version | uint8 | Always version 0 (for now). |
| dtype,mode,order | uint8 | bits 1-3: zfp data type; bits 4-6: zfp mode; bit 7: unused; bit 8: 1 indicates C order (bits: DDDMMMUC) |
| nx | uint32 | Size of x axis. |
| ny | uint32 | Size of y axis. |
| nz | uint32 | Size of z axis. |
| nw | uint32 | Number of channels. |
| correlated dims | uint8 | Bottom 4 bits are a bitfield with 1 indicating correlated, 0 uncorrelated. Top 4 bits unused. (xyzw0000) |
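
As a rough illustration of the table, the header could be unpacked with Python's `struct` module. This is a sketch under several assumptions rather than the library's own parser: the function name `parse_header` is hypothetical, "bit 1" is taken to be the least significant bit, and x is taken to be the lowest bit of the correlated dims field. Note that the field widths listed in the table sum to 23 bytes, so the sketch follows the table rather than the 15-byte figure.

```python
import struct

# magic, format version, packed dtype/mode/order byte, nx, ny, nz, nw,
# correlated dims. "<" selects little endian with no padding.
HEADER_FORMAT = "<4sBB4IB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 23 for the widths in the table

def parse_header(buf):
    """Decode the fixed-size header at the start of buf (hypothetical helper)."""
    magic, version, packed, nx, ny, nz, nw, corr = struct.unpack_from(
        HEADER_FORMAT, buf, 0
    )
    if magic != b"zfpc":
        raise ValueError("not a zfpc file")
    return {
        "version": version,
        # Assumption: the bit diagram DDDMMMUC is written lowest bit first.
        "dtype": packed & 0b111,
        "mode": (packed >> 3) & 0b111,
        "c_order": bool(packed >> 7),
        "shape": (nx, ny, nz, nw),
        # Assumption: x is the lowest bit, followed by y, z, w.
        "correlated_dims": [bool((corr >> i) & 1) for i in range(4)],
    }

# Build a header by hand and decode it again.
example = struct.pack(
    HEADER_FORMAT, b"zfpc", 0, 0b10001010, 1202, 1240, 64, 2, 0b0011
)
header = parse_header(example)
```

The numeric values of the dtype and mode fields are zfp's own codes and are left uninterpreted here.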
### Index
All entries in the index are uint64 (8 bytes) little endian.
The index consists of the stream offset followed by the size of each stream. The number of streams is the product of the sizes of all uncorrelated dimensions.
The stream offset is not strictly necessary, but it allows the format to change while older decompressors continue to function.
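
The index description can be turned into a small reader. This is a minimal sketch that assumes the index holds a single offset followed by one size per stream, and that the offset is the byte position of the first stream; the helper names `stream_count` and `read_index` and the `index_start` parameter are hypothetical.

```python
import struct

def stream_count(shape, correlated_dims):
    """Product of the sizes of the uncorrelated dimensions."""
    count = 1
    for size, correlated in zip(shape, correlated_dims):
        if not correlated:
            count *= size
    return count

def read_index(buf, index_start, num_streams):
    """Return a (start, end) byte range for each stream."""
    # Assumed layout: one uint64 offset, then num_streams uint64 sizes,
    # all little endian.
    values = struct.unpack_from(f"<{num_streams + 1}Q", buf, index_start)
    offset, sizes = values[0], values[1:]
    spans = []
    for size in sizes:
        spans.append((offset, offset + size))
        offset += size
    return spans

# The example array from above splits into 64 * 2 streams.
n = stream_count((1202, 1240, 64, 2), [True, True, False, False])

# A made-up index with three streams of 10, 20 and 30 bytes.
index_bytes = struct.pack("<4Q", 100, 10, 20, 30)
spans = read_index(index_bytes, 0, 3)
```

Because each stream's byte range can be computed from the index alone, a reader could jump directly to any one stream without decoding the others.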
### Streams
All zfp streams are concatenated together in Fortran order. The streams are written with a full header so that they can be decompressed independently.
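
One possible reading of "Fortran order" is that the first uncorrelated axis varies fastest as the stream number increases. The sketch below maps a stream number to its position along the uncorrelated axes under that assumption; the helper name `stream_to_index` is hypothetical and the interpretation has not been checked against the `zfpc` source.

```python
import numpy as np

def stream_to_index(stream_number, shape, correlated_dims):
    """Map a stream number to indices along the uncorrelated axes."""
    sizes = [s for s, c in zip(shape, correlated_dims) if not c]
    # order="F" makes the first listed axis vary fastest.
    index = np.unravel_index(stream_number, sizes, order="F")
    return tuple(int(i) for i in index)

shape = (1202, 1240, 64, 2)
correlated_dims = [True, True, False, False]

first = stream_to_index(0, shape, correlated_dims)
second = stream_to_index(1, shape, correlated_dims)
wrapped = stream_to_index(64, shape, correlated_dims)
```

Under this reading, all 64 z slices of channel 0 are stored before any slice of channel 1.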
In the future, it might make sense to save space by condensing them into a single header and writing headerless streams. However, writing full headers makes it possible to use different compression settings for each stream, which could pay off for different components.
Raw data
{
"_id": null,
"home_page": "https://github.com/seung-lab/zfpc/",
"name": "zfpc",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7,<4.0",
"maintainer_email": "",
"keywords": "compression zfp volumetric-data numpy image-processing 2d 3d 4d",
"author": "William Silversmith",
"author_email": "ws9@princeton.edu",
"download_url": "https://files.pythonhosted.org/packages/e6/99/467efebfc9b423da6de260858e45f88e25d7af1e8a5a303b93e915fa9bac/zfpc-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)",
"summary": "zfp container (zfpc) for optimal compression of 1D-4D arrays by representing correlated dimensions as separate zfp streams.",
"version": "0.1.2",
"split_keywords": [
"compression",
"zfp",
"volumetric-data",
"numpy",
"image-processing",
"2d",
"3d",
"4d"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "9f6dae5c9809794aca66c7c019efa719",
"sha256": "b6b44a95d360dac8ef20fc2248bbe835147f1c434a2e1a3f02f848b558764ff4"
},
"downloads": -1,
"filename": "zfpc-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9f6dae5c9809794aca66c7c019efa719",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7,<4.0",
"size": 15916,
"upload_time": "2022-08-05T15:23:00",
"upload_time_iso_8601": "2022-08-05T15:23:00.120274Z",
"url": "https://files.pythonhosted.org/packages/ad/f3/5cba36804a5421c4840dd0f33e95c19bfa66f1ee2056e09ebe028fb967c0/zfpc-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "fd47da9a4daad4133fe20a30cfc8726b",
"sha256": "7b6a25226db8c5fd848794fc39f17831be2db50619041b1f34faea503ecd0088"
},
"downloads": -1,
"filename": "zfpc-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "fd47da9a4daad4133fe20a30cfc8726b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7,<4.0",
"size": 16035,
"upload_time": "2022-08-05T15:23:01",
"upload_time_iso_8601": "2022-08-05T15:23:01.565774Z",
"url": "https://files.pythonhosted.org/packages/e6/99/467efebfc9b423da6de260858e45f88e25d7af1e8a5a303b93e915fa9bac/zfpc-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-08-05 15:23:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "seung-lab",
"github_project": "zfpc",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "zfpc"
}