[![Crates.io](https://img.shields.io/crates/v/hidefix.svg)](https://crates.io/crates/hidefix)
[![PyPI](https://img.shields.io/pypi/v/hidefix.svg)](https://pypi.org/project/hidefix/)
[![Documentation](https://docs.rs/hidefix/badge.svg)](https://docs.rs/hidefix/)
[![Build (rust)](https://github.com/gauteh/hidefix/workflows/Rust/badge.svg)](https://github.com/gauteh/hidefix/actions?query=branch%3Amain)
[![Build (python)](https://github.com/gauteh/hidefix/workflows/Python/badge.svg)](https://github.com/gauteh/hidefix/actions?query=branch%3Amain)
[![codecov](https://codecov.io/gh/gauteh/hidefix/branch/main/graph/badge.svg)](https://codecov.io/gh/gauteh/hidefix)
[![Rust nightly](https://img.shields.io/badge/rustc-nightly-orange)](https://rust-lang.github.io/rustup/installation/other.html)
<img src="https://raw.githubusercontent.com/gauteh/hidefix/main/idefix.png">
# HIDEFIX
This Rust and Python library provides an alternative reader for
[HDF5](https://support.hdfgroup.org/HDF5/doc/H5.format.html) files and [NetCDF4
files](https://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html)
(which are built on HDF5) that supports concurrent access to data. This is achieved by
building an index of the chunks, which allows a thread to use many file handles to
read the file. The original (native) HDF5 library is used to build the index,
but once the index has been created the library is no longer needed. The index can be
serialized to disk so that the file does not have to be indexed again.
In Rust:
```rust
use hidefix::prelude::*;
let idx = Index::index("tests/data/coads_climatology.nc4").unwrap();
let mut r = idx.reader("SST").unwrap();
let values = r.values::<f32>(None, None).unwrap();
println!("SST: {:?}", values);
```
or with Python using Xarray:
```python
import xarray as xr
import hidefix
ds = xr.open_dataset('file.nc', engine='hidefix')
print(ds)
```
See the [examples](examples/) for how to use hidefix for
[regular](examples/read_hfx_cache.rs), [parallel](examples/read_hfx_parallel.rs) or
[concurrent](examples/read_hfx_concurrent.rs) reads.
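
The chunk-index idea described above can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation used by `hidefix`: the `ChunkEntry` type, the helper functions and the flat file layout are all hypothetical, and the chunks are uncompressed, fixed-size byte runs rather than real HDF5 chunks. It only shows why an index of offsets and sizes lets every reader open its own file handle and read without sharing any state.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical index entry: where one chunk lives in the file.
#[derive(Clone, Copy, Debug)]
struct ChunkEntry {
    offset: u64,
    size: usize,
}

/// Write `n_chunks` fixed-size chunks and return the index describing them.
fn write_chunked(path: &Path, n_chunks: usize, chunk_size: usize) -> io::Result<Vec<ChunkEntry>> {
    let mut file = File::create(path)?;
    let mut index = Vec::with_capacity(n_chunks);
    for i in 0..n_chunks {
        // Fill chunk i with the byte value i so reads are easy to check.
        file.write_all(&vec![i as u8; chunk_size])?;
        index.push(ChunkEntry {
            offset: (i * chunk_size) as u64,
            size: chunk_size,
        });
    }
    Ok(index)
}

/// Read one chunk through a file handle that belongs to the caller alone.
fn read_chunk(path: &Path, entry: ChunkEntry) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(entry.offset))?;
    let mut buf = vec![0u8; entry.size];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// Read all chunks concurrently: one thread and one file handle per chunk.
fn read_all_concurrently(path: &Path, index: &[ChunkEntry]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|s| {
        // Spawn every reader first, then join them in index order.
        let handles: Vec<_> = index
            .iter()
            .map(|&entry| s.spawn(move || read_chunk(path, entry)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}

fn main() -> io::Result<()> {
    let path: PathBuf = std::env::temp_dir().join("chunk_index_sketch.bin");
    let index = write_chunked(&path, 4, 8)?;
    let chunks = read_all_concurrently(&path, &index)?;
    for (i, chunk) in chunks.iter().enumerate() {
        println!("chunk {i}: {chunk:?}");
    }
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Because the index is plain data, it is needed only once per file; the readers never touch a shared buffer or lock, which is the property the native library cannot offer.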
## Motivation
The HDF5 library relies on internal locks to be _thread-safe_, since its
internal buffers cannot be safely read or written from multiple
threads. This effectively forces multi-threaded applications into sequential
reads while the threads compete for the locks. The threads also appear to cause
each other trouble, perhaps by dropping cached chunks which other threads still need.
The library can be used safely from different processes, but that potentially incurs
much more overhead than multi-threaded or asynchronous code.
## Some basic benchmarks
`hidefix` is intended to perform better when concurrent reads are made to
the same dataset, the same file or different files from a single process. In
basic benchmarks of standard *sequential* reads, its performance is on par with
or slightly better than that of the native HDF5 library (through its
[rust-bindings](https://github.com/aldanor/hdf5-rust)). Where `hidefix` shines
is when _multiple threads_ in the _same process_ try to read in _any way_
from an HDF5 file simultaneously.
This simple benchmark reads a small dataset sequentially or
concurrently, using the `cached` reader from `hidefix` and the native reader
from HDF5. The dataset is chunked, shuffled and compressed (using gzip):
```sh
$ cargo bench --bench concurrency -- --ignored
test shuffled_compressed::cache_concurrent_reads ... bench: 15,903,406 ns/iter (+/- 220,824)
test shuffled_compressed::cache_sequential ... bench: 59,778,761 ns/iter (+/- 602,316)
test shuffled_compressed::native_concurrent_reads ... bench: 411,605,868 ns/iter (+/- 35,346,233)
test shuffled_compressed::native_sequential ... bench: 103,457,237 ns/iter (+/- 7,703,936)
```
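
To put the numbers above in relative terms, the ratios can be worked out directly. The snippet below is only arithmetic on the four timings printed by the benchmark; the constant and function names are made up for this illustration, and the figures come from a single run, so the ratios should be read as rough indications.

```rust
// Timings in ns/iter, copied from the benchmark output above.
const CACHE_CONCURRENT: f64 = 15_903_406.0;
const CACHE_SEQUENTIAL: f64 = 59_778_761.0;
const NATIVE_CONCURRENT: f64 = 411_605_868.0;
const NATIVE_SEQUENTIAL: f64 = 103_457_237.0;

/// How many times longer `slower` takes than `faster`.
fn ratio(slower: f64, faster: f64) -> f64 {
    slower / faster
}

fn main() {
    // hidefix against the native reader, for each access pattern.
    println!(
        "sequential reads: native takes {:.2}x as long as hidefix",
        ratio(NATIVE_SEQUENTIAL, CACHE_SEQUENTIAL)
    );
    println!(
        "concurrent reads: native takes {:.2}x as long as hidefix",
        ratio(NATIVE_CONCURRENT, CACHE_CONCURRENT)
    );

    // Effect of going concurrent within each reader.
    println!(
        "hidefix: sequential takes {:.2}x as long as concurrent",
        ratio(CACHE_SEQUENTIAL, CACHE_CONCURRENT)
    );
    println!(
        "native: concurrent takes {:.2}x as long as sequential",
        ratio(NATIVE_CONCURRENT, NATIVE_SEQUENTIAL)
    );
}
```

In this run the native reader gets slower when reads are issued concurrently, while the `hidefix` reader gets faster, which matches the lock contention described under Motivation.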
## Inspiration and other projects
This work is based in part on the [DMR++
module](https://github.com/OPENDAP/bes/tree/master/modules/dmrpp_module) of the
[OPeNDAP](https://www.opendap.org/) [Hyrax
server](https://www.opendap.org/software/hyrax-data-server). The
[zarr](https://zarr.readthedocs.io/en/stable/) format does something similar,
and the same approach has been [tested out on
HDF5](https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314)
as well.