:Name: b2h5py
:Version: 0.4.0
:Summary: Transparent optimized reading of n-dimensional Blosc2 slices for h5py
:Upload time: 2024-01-10 08:47:37
:Requires Python: >=3.3
:Keywords: h5py, hdf5, blosc2
:License: Copyright (c) 2023 The Blosc Development Team <blosc@blosc.org>

    Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
    3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


b2h5py
======

``b2h5py`` provides h5py_ with transparent, automatic optimized reading of n-dimensional slices of Blosc2_-compressed datasets. This optimized slicing leverages direct chunk access (skipping the slow HDF5 filter pipeline) and 2-level partitioning into chunks and then smaller blocks (so that less data is actually decompressed).

.. _h5py: https://www.h5py.org/
.. _Blosc2: https://www.blosc.org/
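
To see why the second partitioning level reduces the amount of decompressed data, here is a minimal one-dimensional sketch. It is not the b2h5py implementation: the function name, the chunk length of 100 items and the block length of 25 items are all assumptions chosen for illustration::

    def blocks_touched(start, stop, chunk_len, block_len):
        """Return (chunk_index, block_index) pairs overlapping [start, stop)."""
        touched = []
        blocks_per_chunk = -(-chunk_len // block_len)  # ceiling division
        first_chunk = start // chunk_len
        last_chunk = (stop - 1) // chunk_len
        for c in range(first_chunk, last_chunk + 1):
            chunk_start = c * chunk_len
            for b in range(blocks_per_chunk):
                block_start = chunk_start + b * block_len
                block_stop = min(block_start + block_len, chunk_start + chunk_len)
                # Keep the block only if it intersects the requested range.
                if block_start < stop and block_stop > start:
                    touched.append((c, b))
        return touched


    # Hypothetical layout: chunks of 100 items, each split into blocks of 25.
    touched = blocks_touched(start=90, stop=130, chunk_len=100, block_len=25)
    items_by_block = len(touched) * 25                  # block-level decompression
    items_by_chunk = len({c for c, _ in touched}) * 100  # whole-chunk decompression

In this made-up layout the slice ``[90:130]`` overlaps three blocks spread over two chunks, so block-level access handles 75 items where whole-chunk access would handle 200.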

Benchmarks of this technique show 2x-5x speed-ups compared with normal filter-based access. Comparable results are obtained with a similar technique in PyTables; see `Optimized Hyper-slicing in PyTables with Blosc2 NDim`_.

.. image:: doc/benchmark.png

.. _Optimized Hyper-slicing in PyTables with Blosc2 NDim: https://www.blosc.org/posts/pytables-b2nd-slicing/

Usage
-----

This optimized access works for slices with step 1 on Blosc2-compressed datasets using the native byte order. It is enabled by monkey-patching the ``h5py.Dataset`` class to extend the slicing operation. The easiest way to do this is::

    import b2h5py.auto

After that, optimization will be attempted for any slicing of a dataset (of the form ``dataset[...]`` or ``dataset.__getitem__(...)``). If the optimization is not possible in a particular case, normal h5py slicing code will be used (which performs HDF5 filter-based access, backed by hdf5plugin_ to support Blosc2).

.. _hdf5plugin: https://github.com/silx-kit/hdf5plugin
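
The patch-with-fallback pattern described above can be sketched with a toy class. Everything here is hypothetical: ``Dataset`` is a stand-in rather than ``h5py.Dataset``, and ``fast_getitem`` only imitates the "step 1" condition, so this shows the general idea and not how b2h5py is written::

    class Dataset:
        """Stand-in for ``h5py.Dataset`` (hypothetical, illustration only)."""

        def __init__(self, values):
            self.values = list(values)

        def __getitem__(self, key):
            return ("plain", self.values[key])


    def fast_getitem(dataset, key):
        """Hypothetical fast path that only handles slices with step 1."""
        if isinstance(key, slice) and key.step in (None, 1):
            return ("fast", dataset.values[key])
        return None  # signal that the optimization does not apply


    _original_getitem = Dataset.__getitem__


    def patched_getitem(self, key):
        # Try the fast path first, then fall back to the original method.
        result = fast_getitem(self, key)
        if result is None:
            result = _original_getitem(self, key)
        return result


    # Monkey-patch the class so every instance gets the extended slicing.
    Dataset.__getitem__ = patched_getitem

    dset = Dataset(range(10))
    contiguous = dset[2:5]   # served by the fast path
    stepped = dset[0:6:2]    # step 2, so the original method is used

Callers keep writing ``dset[...]`` as before; only the class attribute changes, which is why no application code needs to be modified.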

You may instead just ``import b2h5py`` and explicitly enable the optimization globally by calling ``b2h5py.enable_fast_slicing()``, and disable it again with ``b2h5py.disable_fast_slicing()``. You may also enable it temporarily by using a context manager::

    with b2h5py.fast_slicing():
        # ... code that will use Blosc2 optimized slicing ...
        data = dset[10:20]  # ``dset`` stands for any h5py dataset opened earlier
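
For readers unfamiliar with this kind of switch, the enable, disable and temporary-enable trio can be modelled in a few lines. This is a generic sketch with made-up names (``_state``, ``enable``, ``disable``, ``temporarily_enabled``); whether b2h5py restores the previous state in the same way is an assumption::

    import contextlib

    _state = {"enabled": False}  # hypothetical module-level switch


    def enable():
        _state["enabled"] = True


    def disable():
        _state["enabled"] = False


    def is_enabled():
        return _state["enabled"]


    @contextlib.contextmanager
    def temporarily_enabled():
        """Turn the switch on inside a ``with`` block, then restore it."""
        previous = _state["enabled"]
        _state["enabled"] = True
        try:
            yield
        finally:
            # Restore the earlier value even if the block raises.
            _state["enabled"] = previous


    states = [is_enabled()]
    with temporarily_enabled():
        states.append(is_enabled())
    states.append(is_enabled())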

Finally, you may explicitly enable optimizations for a given h5py dataset by wrapping it in a ``B2Dataset`` instance::

    b2dset = b2h5py.B2Dataset(dset)
    # ... slicing ``b2dset`` will use Blosc2 optimization ...
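
A fuller sketch of the wrapper approach follows. It assumes that NumPy, h5py, hdf5plugin and b2h5py are all installed; the file name, dataset name, shapes and chunk sizes are made up, and ``hdf5plugin.Blosc2()`` is used here only as one way to write a Blosc2-compressed dataset::

    import os
    import tempfile


    def compare_wrapped_and_plain_reads():
        """Write a small Blosc2 dataset, then read one slice both ways."""
        import numpy as np
        import h5py
        import hdf5plugin
        import b2h5py

        data = np.arange(100 * 100, dtype="int64").reshape(100, 100)
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.h5")  # made-up file name
            with h5py.File(path, "w") as f:
                f.create_dataset("data", data=data, chunks=(50, 50),
                                 **hdf5plugin.Blosc2())
            with h5py.File(path, "r") as f:
                dset = f["data"]
                b2dset = b2h5py.B2Dataset(dset)
                plain = dset[10:20, 30:60]       # normal h5py slicing
                wrapped = b2dset[10:20, 30:60]   # slicing through the wrapper
        # Both reads are expected to return the same values.
        return bool(np.array_equal(plain, wrapped))

Comparing the two results on your own data is a cheap sanity check before relying on the wrapper in longer pipelines.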

Please note that, for the moment, plain iteration over ``B2Dataset`` instances is not optimized (as it falls back to plain ``Dataset`` slicing). This does not affect the other approaches described above. Instead of ``for row in b2dset:`` loops, you may prefer to use slicing like::

    for i in range(len(b2dset)):
        row = b2dset[i]  # or ``b2dset[i, ...]``
        # ... operate with ``row`` ...

We recommend that you test which approach works better for your datasets. This limitation may be fixed in the future.
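
If you iterate in several places, the indexed loop can be wrapped in a small generator. ``iter_rows`` is a hypothetical helper, not part of b2h5py, and ``RecordingDataset`` is a toy stand-in used only to show that every row is requested through ``__getitem__``::

    def iter_rows(dataset):
        """Yield rows by explicit indexing instead of ``for row in dataset``."""
        for i in range(len(dataset)):
            yield dataset[i]


    class RecordingDataset:
        """Toy stand-in that records which keys were requested."""

        def __init__(self, rows):
            self._rows = list(rows)
            self.requested = []

        def __len__(self):
            return len(self._rows)

        def __getitem__(self, key):
            self.requested.append(key)
            return self._rows[key]


    toy = RecordingDataset(["a", "b", "c"])
    rows = list(iter_rows(toy))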

Building
--------

Just install PyPA build (e.g. ``pip install build``), enter the source code directory and run ``pyproject-build`` to get a source tarball and a wheel under the ``dist`` directory.

Installing
----------

To install as a wheel from PyPI, run ``pip install b2h5py``.

You may also install the wheel that you built in the previous section, or enter the source code directory and run ``pip install .`` from there.

Running tests
-------------

If you have installed ``b2h5py``, just run ``python -m unittest discover b2h5py.tests``.

Otherwise, just enter its source code directory and run ``python -m unittest``.

You can also run the h5py tests with the patched ``Dataset`` class to check that patching does not break anything. You may install the ``h5py-test`` extra (e.g. ``pip install b2h5py[h5py-test]``) and run ``python -m b2h5py.tests.test_patched_h5py``.

            

Raw data::

    {
    "_id": null,
    "home_page": "",
    "name": "b2h5py",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.3",
    "maintainer_email": "",
    "keywords": "h5py,HDF5,Blosc2",
    "author": "",
    "author_email": "Ivan Vilata-i-Balaguer <ivan@selidor.net>",
    "download_url": "https://files.pythonhosted.org/packages/86/5e/5ed90857cd91735c16a53359ecb03f49611404909327068be7547bf8c59f/b2h5py-0.4.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "summary": "Transparent optimized reading of n-dimensional Blosc2 slices for h5py",
    "version": "0.4.0",
    "project_urls": {
        "Homepage": "https://github.com/Blosc/b2h5py",
        "Issues": "https://github.com/Blosc/b2h5py/issues"
    },
    "split_keywords": [
        "h5py",
        "hdf5",
        "blosc2"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f1b278a82a52dee405061d03922632d201b47a1e379696a4b7afe1d3a6d54155",
                "md5": "0938c81f63110f228b569448ddcb7e1d",
                "sha256": "32714e490b0a576e38a8ad0296bde942c0b8a84a26ed6682400d300d3cf08c55"
            },
            "downloads": -1,
            "filename": "b2h5py-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0938c81f63110f228b569448ddcb7e1d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.3",
            "size": 15902,
            "upload_time": "2024-01-10T08:47:35",
            "upload_time_iso_8601": "2024-01-10T08:47:35.101896Z",
            "url": "https://files.pythonhosted.org/packages/f1/b2/78a82a52dee405061d03922632d201b47a1e379696a4b7afe1d3a6d54155/b2h5py-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "865e5ed90857cd91735c16a53359ecb03f49611404909327068be7547bf8c59f",
                "md5": "b1c5d13baf4064e9faf61610a83afdc5",
                "sha256": "c0943b22e8132f680b3fb682186473ce502779635ce3dd73e9cd617d84f68c2a"
            },
            "downloads": -1,
            "filename": "b2h5py-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "b1c5d13baf4064e9faf61610a83afdc5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.3",
            "size": 14885,
            "upload_time": "2024-01-10T08:47:37",
            "upload_time_iso_8601": "2024-01-10T08:47:37.103639Z",
            "url": "https://files.pythonhosted.org/packages/86/5e/5ed90857cd91735c16a53359ecb03f49611404909327068be7547bf8c59f/b2h5py-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-10 08:47:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Blosc",
    "github_project": "b2h5py",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "b2h5py"
    }
        