h5yaml

* Version: 0.1.1
* Summary: Use YAML configuration files to generate HDF5/netCDF4 formatted files.
* Requires Python: >=3.10
* Keywords: hdf5, yaml, netcdf4
* Uploaded to PyPI: 2025-10-27 10:22:16

# H5YAML
[![image](https://img.shields.io/pypi/v/h5yaml.svg?label=release)](https://github.com/rmvanhees/h5yaml/)
[![image](https://img.shields.io/pypi/l/h5yaml.svg)](https://github.com/rmvanhees/h5yaml/LICENSE)
[![image](https://img.shields.io/pypi/dm/h5yaml.svg)](https://pypi.org/project/h5yaml/)
[![image](https://img.shields.io/pypi/status/h5yaml.svg?label=status)](https://pypi.org/project/h5yaml/)

## Description
This package lets you generate [HDF5](https://docs.h5py.org/en/stable/)/[netCDF4](https://unidata.github.io/netcdf4-python/)
formatted files as defined in a [YAML](https://yaml.org/) configuration file. This has several advantages:

 * You define the layout of your HDF5/netCDF4 file in YAML, which is human-readable and has an intuitive syntax.
 * You can reuse the YAML configuration file so that all your products have a consistent layout.
 * You can update the layout by changing only the YAML configuration file.
 * You can access the layout of your HDF5/netCDF4 file as a Python dictionary, without opening any HDF5/netCDF4 file.
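
The last point means the layout can be inspected with ordinary Python. A minimal sketch, assuming the configuration has been loaded into the nested dictionary that a YAML loader (for example PyYAML's `yaml.safe_load`) would return; the helper `list_variables` is hypothetical and not part of the h5yaml API, and no HDF5/netCDF4 file is opened:

```python
# Hypothetical configuration, written as the nested dictionary that a YAML
# loader such as PyYAML's yaml.safe_load() would return for an equivalent file.
config = {
    "groups": ["image_attributes"],
    "dimensions": {
        "number_of_images": {"_dtype": "u2", "_size": 0},
    },
    "variables": {
        "/image_attributes/nr_coadditions": {
            "_dtype": "u2",
            "_dims": ["number_of_images"],
            "long_name": "Number of coadditions",
        },
        "/image_attributes/exposure_time": {
            "_dtype": "f8",
            "_dims": ["number_of_images"],
            "units": "seconds",
        },
    },
}


def list_variables(cfg: dict) -> list[str]:
    """Summarize each variable as 'name: dtype (dim, ...)' (hypothetical helper)."""
    summary = []
    for name, var in cfg.get("variables", {}).items():
        dims = ", ".join(var.get("_dims", []))
        summary.append(f"{name}: {var['_dtype']} ({dims})")
    return summary


for line in list_variables(config):
    print(line)
# /image_attributes/nr_coadditions: u2 (number_of_images)
# /image_attributes/exposure_time: f8 (number_of_images)
```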

The `H5YAML` package provides two classes to generate an HDF5/netCDF4 formatted file.

 1. The class `H5Yaml` uses the [h5py](https://pypi.org/project/h5py/) package, which is a Pythonic interface to
    the HDF5 binary data format.
    If 'h5_def.yaml' is your YAML configuration file, then `H5Yaml("h5_def.yaml").create("foo.h5")` will create
    the HDF5 file 'foo.h5'. This file can be read by netCDF4 software, because dimension scales are attached to each dataset.
 2. The class `NcYaml` uses the [netCDF4](https://pypi.org/project/netCDF4/) package, which provides an object-oriented
    Python interface to the netCDF version 4 library.
    If 'nc_def.yaml' is your YAML configuration file, then `NcYaml("nc_def.yaml").create("foo.nc")` will create
    the netCDF4/HDF5 file 'foo.nc'.

The class `NcYaml` must be used when strict conformance to the netCDF4 format is required.
However, the `netCDF4` package has some limitations that `h5py` does not have; for example, it does
not allow variable-length variables to have a compound data type.
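
Because this choice depends on the content of the configuration, it can be made before any file is written. The sketch below encodes one possible policy under stated assumptions: the configuration is already a Python dictionary, and the hypothetical helpers `vlen_compound_variables` and `pick_class` (not part of h5yaml) only detect the limitation named above, variable-length variables with a compound data type. Other netCDF4 restrictions are not checked.

```python
def vlen_compound_variables(cfg: dict) -> list[str]:
    """Names of variables that combine '_vlen: True' with a compound '_dtype'."""
    compounds = cfg.get("compounds", {})
    # If 'compounds' is a list of YAML file names, the compound names are only
    # known after reading those files; this sketch then treats it as empty.
    compound_names = set(compounds) if isinstance(compounds, dict) else set()
    return [
        name
        for name, var in cfg.get("variables", {}).items()
        if var.get("_vlen", False) and var.get("_dtype") in compound_names
    ]


def pick_class(cfg: dict, strict_netcdf4: bool) -> str:
    """Return the name of the h5yaml class to use (one possible policy, hypothetical)."""
    offending = vlen_compound_variables(cfg)
    if strict_netcdf4:
        if offending:
            raise ValueError(f"netCDF4 cannot store variable-length compounds: {offending}")
        return "NcYaml"
    # Without the strict requirement, H5Yaml is chosen because h5py can also
    # write variable-length compounds.
    return "H5Yaml"


config = {
    "compounds": {
        "stats_dtype": {"time": ["u8", "seconds since 1970-01-01T00:00:00", "timestamp"]},
    },
    "variables": {
        "/image_attributes/exposure_time": {"_dtype": "f8", "_dims": ["number_of_images"]},
        "stats_163": {"_dtype": "stats_dtype", "_vlen": True, "_dims": ["days"]},
    },
}

print(vlen_compound_variables(config))  # ['stats_163']
print(pick_class(config, strict_netcdf4=False))  # H5Yaml
```

With `strict_netcdf4=True`, this configuration raises a `ValueError`, because `stats_163` cannot be stored by the `netCDF4` package.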

## Installation
Releases of the code, starting from version 0.1, are available via PyPI (`pip install h5yaml`).

## Usage

The YAML file should be structured as follows:

 * The top-level sections are 'groups', 'dimensions', 'compounds' and 'variables'.
 * The section 'groups' is optional, but it should list every group you want to use
   in your file. The 'groups' section in the YAML file may look like this:

   ```
   groups:
     - engineering_data
     - image_attributes
     - navigation_data
     - processing_control
     - science_data
   ```

 * The section 'dimensions' is mandatory; it should define the dimensions of every
   variable in your file. A dimension with '_size: 0' is unlimited. The 'dimensions' section may look like this:

   ```
   dimensions:
     days:
       _dtype: u4
       _size: 0
       long_name: days since 2024-01-01 00:00:00Z
     number_of_images:             # an unlimited dimension
       _dtype: u2
       _size: 0
     samples_per_image:            # a fixed dimension
       _dtype: u4
       _size: 307200
     /navigation_data/att_time:    # an unlimited dimension in a group with attributes
       _dtype: f8
       _size: 0
       _FillValue: -32767
       long_name: Attitude sample time (seconds of day)
       calendar: proleptic_gregorian
       units: seconds since %Y-%m-%d %H:%M:%S
       valid_min: 0
       valid_max: 92400
     n_viewport:                   # a fixed dimension with fixed values and attributes
       _dtype: i2
       _size: 5
       _values: [-50, -20, 0, 20, 50]
       long_name: along-track view angles at sensor
       units: degrees
   ```

 * The section 'compounds' is optional, but it should define every compound data type
   you want to use in your file. For each member of a compound, provide a list with its
   data type, units and long_name, in that order. The 'compounds' section may look like
   this:

   ```
   compounds:
     stats_dtype:
       time: [u8, seconds since 1970-01-01T00:00:00, timestamp]
       index: [u2, '1', index]
       tbl_id: [u1, '1', binning id]
       saa: [u1, '1', saa-flag]
       coad: [u1, '1', co-addings]
       texp: [f4, ms, exposure time]
       lat: [f4, degree, latitude]
       lon: [f4, degree, longitude]
       avg: [f4, '1', '$S - S_{ref}$']
       unc: [f4, '1', '\u03c3($S - S_{ref}$)']
       dark_offs: [f4, '1', dark-offset]
   ```

   Alternatively, provide a list of names of YAML files that contain the compound
   definitions:

   ```
   compounds:
     - h5_nomhk_tm.yaml
     - h5_science_hk.yaml
   ```

 * Each variable is defined by its data type ('_dtype') and dimensions ('_dims') and,
   optionally, by its chunk sizes ('_chunks'), compression ('_compression') and variable length
   ('_vlen'). In addition, each variable can have any number of attributes, each
   defined by its name and value. The 'variables' section may look like this:

   ```
   variables:
     /image_attributes/nr_coadditions:
       _dtype: u2
       _dims: [number_of_images]
       _FillValue: 0
       long_name: Number of coadditions
       units: '1'
       valid_min: 1
     /image_attributes/exposure_time:
       _dtype: f8
       _dims: [number_of_images]
       _FillValue: -32767
       long_name: Exposure time
       units: seconds
     stats_163:
       _dtype: stats_dtype
       _vlen: True
       _dims: [days]
       comment: detector map statistics (MPS=163)
   ```
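
The layout rules listed above can be checked in plain Python before any file is written. This is a sketch under stated assumptions, not h5yaml's implementation: the helpers `split_path`, `undeclared_groups`, `compound_itemsize` and `variable_shape`, the `STRUCT_CODES` table and the variable `/science_data/detector_images` are hypothetical; a leading path such as `/navigation_data/att_time` is taken to place a name in that group; a dimension with `_size: 0` is treated as unlimited, following the comments in the 'dimensions' example; and compound members are assumed to be packed without padding (the NumPy default for structured data types).

```python
import struct

# Assumed mapping from the NumPy-style type codes used in the YAML examples to
# struct format characters (only the codes needed here are listed).
STRUCT_CODES = {
    "u1": "B", "u2": "H", "u4": "I", "u8": "Q",
    "i1": "b", "i2": "h", "i4": "i", "i8": "q",
    "f4": "f", "f8": "d",
}


def split_path(name: str) -> tuple[str, str]:
    """Split '/group/name' into ('/group', 'name'); names without a path live in '/'."""
    group, _, base = name.rpartition("/")
    return group or "/", base


def undeclared_groups(cfg: dict) -> list[str]:
    """Groups used by dimensions or variables but missing from the 'groups' section."""
    declared = {"/" + group.strip("/") for group in cfg.get("groups", [])}
    names = list(cfg.get("dimensions", {})) + list(cfg.get("variables", {}))
    used = {split_path(name)[0] for name in names}
    return sorted(used - declared - {"/"})


def compound_itemsize(members: dict) -> int:
    """Bytes per element of a packed compound; members map to [dtype, units, long_name]."""
    # '<' selects little-endian byte order with no alignment padding.
    fmt = "<" + "".join(STRUCT_CODES[dtype] for dtype, _units, _long_name in members.values())
    return struct.calcsize(fmt)


def variable_shape(var: dict, dims: dict) -> tuple[tuple, tuple]:
    """Initial shape and maximum shape; None in the maximum shape marks an unlimited dimension."""
    shape = tuple(dims[dim]["_size"] for dim in var["_dims"])
    maxshape = tuple(None if size == 0 else size for size in shape)
    return shape, maxshape


# Hypothetical configuration, loosely based on the YAML examples above.
config = {
    "groups": ["image_attributes", "science_data"],
    "dimensions": {
        "number_of_images": {"_dtype": "u2", "_size": 0},
        "samples_per_image": {"_dtype": "u4", "_size": 307200},
        "/navigation_data/att_time": {"_dtype": "f8", "_size": 0},
    },
    "compounds": {
        "stats_dtype": {  # a subset of the members of stats_dtype
            "time": ["u8", "seconds since 1970-01-01T00:00:00", "timestamp"],
            "index": ["u2", "1", "index"],
            "tbl_id": ["u1", "1", "binning id"],
            "texp": ["f4", "ms", "exposure time"],
        },
    },
    "variables": {
        "/science_data/detector_images": {
            "_dtype": "u2",
            "_dims": ["number_of_images", "samples_per_image"],
        },
    },
}

print(undeclared_groups(config))  # ['/navigation_data']
print(compound_itemsize(config["compounds"]["stats_dtype"]))  # 15
print(variable_shape(config["variables"]["/science_data/detector_images"], config["dimensions"]))
# ((0, 307200), (None, 307200))
```

The `None` entries in the maximum shape mirror the `maxshape` convention that h5py uses for resizable datasets.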

### Notes and ToDo

 * Older versions of h5py may produce broken netCDF4 files.
 * Explain the usage of the parameter '_chunks', which is currently not implemented correctly.
 * Explain that variable-length datasets may break netCDF4 compatibility.

## Support [TBW]

## Roadmap

 * Release v0.1: a stable API to read your YAML files and generate the HDF5/netCDF4 file.


## Authors and acknowledgment
The code is developed by R.M. van Hees (SRON).

## License

* Copyright: SRON (https://www.sron.nl).
* License: BSD-3-clause

            

## Project links

* Homepage and source code: https://github.com/rmvanhees/h5_yaml
* Issues: https://github.com/rmvanhees/h5_yaml/issues
* Author: Richard van Hees <r.m.van.hees@sron.nl>
* Distribution files of version 0.1.1: `h5yaml-0.1.1-py3-none-any.whl` (wheel, 13367 bytes) and `h5yaml-0.1.1.tar.gz` (source distribution, 9328 bytes)