Name | pyhdf5-handler
Version | 0.3
Summary | A Python library developed by Hydris-hydrologie (https://www.hydris-hydrologie.fr/) to simply read and write data in HDF5 format.
upload_time | 2025-09-10 15:50:14
requires_python | >=3.9
keywords | h5, h5py, hdf5
# pyhdf5_handler
# Description
Pyhdf5_handler is a simple Python library to quickly read and write HDF5 file storage. This library has been developed by Hydris hydrologie (https://www.hydris-hydrologie.fr/).
Reading and writing to HDF5 supports the main Python types:
- dictionary
- list
- tuple
- numeric values (int, float)
- string
- timestamps (datetime, pandas and numpy)
- numpy array
- structured numpy array
Internally, data are stored in the HDF5 file as datasets backed by numpy arrays: all input data are first converted to a numpy array. If the HDF5 format does not support the data type, the data are automatically converted to a supported type (bytes for strings). An attribute containing the type of the original data is also created. When reading the HDF5 database, data stored in a dataset are converted back to their original type. If the attribute is not found (for an HDF5 file written by another library), the data are returned as stored in the HDF5 file: strings and timestamps come back as byte sequences but can be decoded using str.decode().
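As a minimal sketch of that fallback, the following uses plain h5py (not pyhdf5_handler) to mimic a file written by another library without the type attribute, then decodes the byte sequence back to a string; the file path and dataset name are illustrative:

```python
# Sketch, using plain h5py: a string written by another library (no type
# attribute) comes back as bytes and must be decoded manually.
# The file path and dataset name are illustrative.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "other.hdf5")

# Another library stores the string as raw bytes, without a type attribute.
with h5py.File(path, "w") as f:
    f.create_dataset("label", data=np.bytes_("hello"))

# Reading back yields a byte sequence; decode it to recover the string.
with h5py.File(path, "r") as f:
    raw = f["label"][()]          # a bytes object
    text = raw.decode("utf-8")    # back to str
```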
This library also provides a way to access an HDF5 file "simultaneously" (from different programs or threads) for reading or writing.
The documentation is hosted at https://maximejay.codeberg.page/pyhdf5_handler.html.
# Installation
Pyhdf5_handler can be installed using pip:
```bash
pip install pyhdf5_handler
```
You can also download the source from https://codeberg.org/maximejay/pyhdf5_handler.
```bash
git clone https://codeberg.org/maximejay/pyhdf5_handler.git
pip install ./pyhdf5_handler
```
# API documentation
The API documentation is hosted at https://maximejay.codeberg.page/pyhdf5_handler.html or can be downloaded at https://codeberg.org/maximejay/pyhdf5_handler/archive/main:html/pyhdf5_handler.zip. This documentation is auto-generated using pdoc (https://pdoc.dev/docs/pdoc.html).
```bash
pdoc pyhdf5_handler/ -o ./html
```
# Quick start
## Create or open an hdf5 database:
```python
import pyhdf5_handler
hdf5 = pyhdf5_handler.open_hdf5("./test.hdf5", read_only=False, replace=False)
```
## Create a new group (like a folder) in this database:
```python
hdf5 = pyhdf5_handler.add_hdf5_sub_group(hdf5, subgroup="my_group")
hdf5["my_group"]
# <HDF5 group "/my_group" (0 members)>
```
## Storing any data in the hdf5 database:
### Storing basic types such as integer, float, string or None
```python
pyhdf5_handler.hdf5_dataset_creator(hdf5,"str","str")
pyhdf5_handler.hdf5_dataset_creator(hdf5,"numbers",1.0)
pyhdf5_handler.hdf5_dataset_creator(hdf5,"none",None)
```
### Storing Timestamp
Timestamp objects are stored as strings formatted with ts.strftime("%Y-%m-%d %H:%M") and encoded as UTF-8.
```python
import datetime
import numpy as np
import pandas as pd
pyhdf5_handler.hdf5_dataset_creator(hdf5,"timestamp_numpy",np.datetime64('2019-09-22T17:38:30'))
pyhdf5_handler.hdf5_dataset_creator(hdf5,"timestamp_datetime",datetime.datetime.fromisoformat('2019-09-22T17:38:30'))
pyhdf5_handler.hdf5_dataset_creator(hdf5,"timestamp_pandas",pd.Timestamp('2019-09-22T17:38:30'))
```
### Storing list or tuple
```python
import datetime
import numpy as np
import pandas as pd
pyhdf5_handler.hdf5_dataset_creator(hdf5,"list_num",[1.0,2.0])
pyhdf5_handler.hdf5_dataset_creator(hdf5,"list_str",["a","b"])
pyhdf5_handler.hdf5_dataset_creator(hdf5,"list_mixte",[1.0,"a"])
pyhdf5_handler.hdf5_dataset_creator(hdf5,"list_date_numpy",[np.datetime64('2019-09-22 17:38:30'),np.datetime64('2019-09-22 18:38:30')])
pyhdf5_handler.hdf5_dataset_creator(hdf5,"list_date_datetime",[datetime.datetime.fromisoformat('2019-09-22 17:38:30'),datetime.datetime.fromisoformat('2019-09-22T18:38:30')])
pyhdf5_handler.hdf5_dataset_creator(hdf5,"list_date_pandas",[pd.Timestamp('2019-09-22 17:38:30'),pd.Timestamp('2019-09-22 17:38:30')])
pyhdf5_handler.hdf5_dataset_creator(hdf5,"list_date_range_pandas",pd.date_range(start='1/1/2018', end='1/08/2018'))
```
Remark: a list of timestamps is first stored in a numpy array. When you read the data back, you retrieve the numpy array, not the original list, so the values will be strings, not timestamps. You will need to convert them yourself.
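A minimal sketch of doing that conversion yourself, assuming the "%Y-%m-%d %H:%M" storage format described above (the variable names are illustrative):

```python
# The values read back are byte strings in "%Y-%m-%d %H:%M" format;
# convert them back to timestamps manually.
import numpy as np
import pandas as pd

raw = np.array([b"2019-09-22 17:38", b"2019-09-22 18:38"])  # as read back
timestamps = [pd.Timestamp(s.decode("utf-8")) for s in raw]
# timestamps[0] is pd.Timestamp('2019-09-22 17:38:00')
```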
### Storing a dictionary
```python
dictionary = {
    "dict": {
        "int": 1,
        "float": 2.0,
        "none": None,
        "timestamp": pd.Timestamp('2019-09-22 17:38:30'),
        "list": [1, 2, 3, 4],
        "array": np.array([1, 2, 3, 4]),
        "date_range": pd.date_range(start='1/1/2018', end='1/08/2018'),
        "list_mixte": [1.0, np.datetime64('2019-09-22 17:38:30')],
    }
}
pyhdf5_handler.src.hdf5_handler.dump_dict_to_hdf5(hdf5, dictionary)
```
### Handling structured ndarrays
Structured ndarrays are numpy arrays which store different types of data. Pyhdf5_handler treats these arrays in a specific way:
```python
import numpy as np
data = [('Alice', 25, 55.0), ('Bob', 32, 60.5)]
dtypes = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
people = np.array(data, dtype=dtypes)
pyhdf5_handler.hdf5_dataset_creator(hdf5,"structured_array",people)
```
## Viewing the content of the hdf5 database
### Using the function hdf5_view
This function provides many options to list groups, attributes and datasets in the HDF5 file, with recursive search (refer to the API documentation).
```python
pyhdf5_handler.hdf5_view(hdf5)
```
### Using hdf5_ls
This function lists only the datasets in the current group (like h5ls in bash).
```python
pyhdf5_handler.hdf5_ls(hdf5)
```
## Reading the content of the hdf5
The content of an hdf5 object can be imported as a dictionary.
```python
data=pyhdf5_handler.read_hdf5_as_dict(hdf5)
```
If you want to read a specific item, you can use hdf5_read_dataset and specify the output dtype:
```python
pyhdf5_handler.hdf5_read_dataset(hdf5["list_mixte"])
pyhdf5_handler.hdf5_read_dataset(hdf5["str"],str(type("str")))
pyhdf5_handler.hdf5_read_dataset(hdf5["str"],hdf5.attrs["str"])
```
If you don't care about the output dtype and prefer to read the content as it is stored, use:
```python
hdf5["list_mixte"][:]
```
## Closing the hdf5 file
Do not forget to close the hdf5 file!
```python
hdf5.close()
```
If you run into trouble with your HDF5 file because you forgot to close it, you can try closing all open HDF5 files:
```python
pyhdf5_handler.close_all_hdf5_file()
```
## Quickly viewing or reading an hdf5 file
Most functions above have an equivalent function that works directly on the file. There is no need to open and close it manually; pyhdf5_handler does it for you.
```python
pyhdf5_handler.hdf5file_ls("./test.hdf5")
pyhdf5_handler.hdf5file_ls("./test.hdf5",location="structured_array")
data=pyhdf5_handler.read_hdf5file_as_dict("./test.hdf5")
```
## Getting attributes and datasets
The following functions read attributes and datasets in the HDF5 database.
```python
pyhdf5_handler.get_hdf5file_item(path_to_hdf5="./test.hdf5", location="./", item="structured_array", search_attrs=False)
pyhdf5_handler.get_hdf5file_item(path_to_hdf5="./test.hdf5", location="./", item="list_mixte", search_attrs=False)
pyhdf5_handler.get_hdf5file_attribute(path_to_hdf5="./test.hdf5", location="./", attribute="list_num", wait_time=0)
pyhdf5_handler.get_hdf5file_attribute(path_to_hdf5="./test.hdf5", location="./structured_array/ndarray_ds", attribute="name", wait_time=0)
```
## Searching attributes and datasets
You can also recursively search attributes and datasets in an hdf5 file:
```python
res=pyhdf5_handler.search_in_hdf5file("./test.hdf5", key="date_range", location="./")
res=pyhdf5_handler.search_in_hdf5file("./test.hdf5", key="structured_array", location="./")
```
## Parallel file access
HDF5 does not allow parallel file access, i.e. a program cannot read data from an HDF5 file while another program is writing to it. To work around this limitation, we provide the function parameter wait_time, which is used by most functions in this library. Wait_time is the delay, in seconds, during which pyhdf5_handler will keep trying to access the file (default: 0). Once this time has elapsed, the function gives up opening the HDF5 file and nothing is read or written.
Suppose an external program, noted external_prog, is writing data to the HDF5 file test. This write lasts a few seconds, let's say around 10 s. You can use the following option to read the data:
```python
data=pyhdf5_handler.read_hdf5file_as_dict("./test.hdf5", wait_time=60)
```
In that case pyhdf5_handler will try to access the HDF5 file for at most 60 seconds. After about 10 s, external_prog will have finished its job and your script will proceed normally.
# Release on PyPI
To release on PyPI follow these steps (https://packaging.python.org/en/latest/tutorials/packaging-projects/):
First remove /dist if it exists, then
```bash
python3 -m pip install --upgrade build
python3 -m build
python3 -m pip install --upgrade twine
python3 -m twine upload dist/*
```
Raw data
{
"_id": null,
"home_page": null,
"name": "pyhdf5-handler",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "h5, h5py, hdf5",
"author": null,
"author_email": "Maxime Jay-Allemand <maxime.jay.allemand@hydris-hydrologie.fr>",
"download_url": "https://files.pythonhosted.org/packages/d9/a4/59d65d680eb9c0081c0dfac62fd6a4d186bdf1a01c466e38d218f0b7af70/pyhdf5_handler-0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A python librairy developped by Hydris-hydrologie (https://www.hydris-hydrologie.fr/) to simply read and write data to hdf5 format.",
"version": "0.3",
"project_urls": {
"Homepage": "https://codeberg.org/maximejay/pyhdf5_handler",
"Issues": "https://codeberg.org/maximejay/pyhdf5_handler/issues"
},
"split_keywords": [
"h5",
" h5py",
" hdf5"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "62313b8f7c25c2de3201a2094ea6021b647f31c7f8bc52c0763e97ab8c04a019",
"md5": "59deba05b162df67ce8bc267c989a9b7",
"sha256": "9cb172405a93113ee9bd3ffaeb3c4bad48a416e666d7f3c4b9b5b394d277aaa2"
},
"downloads": -1,
"filename": "pyhdf5_handler-0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "59deba05b162df67ce8bc267c989a9b7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 31532,
"upload_time": "2025-09-10T15:50:12",
"upload_time_iso_8601": "2025-09-10T15:50:12.063229Z",
"url": "https://files.pythonhosted.org/packages/62/31/3b8f7c25c2de3201a2094ea6021b647f31c7f8bc52c0763e97ab8c04a019/pyhdf5_handler-0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d9a459d65d680eb9c0081c0dfac62fd6a4d186bdf1a01c466e38d218f0b7af70",
"md5": "707a6f7c537ec86a87be58f1339a2376",
"sha256": "1049056b6c5af518061dcabfde3e6e89325b9a4d019bcfb9500cea09f2bc5085"
},
"downloads": -1,
"filename": "pyhdf5_handler-0.3.tar.gz",
"has_sig": false,
"md5_digest": "707a6f7c537ec86a87be58f1339a2376",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 183692,
"upload_time": "2025-09-10T15:50:14",
"upload_time_iso_8601": "2025-09-10T15:50:14.039984Z",
"url": "https://files.pythonhosted.org/packages/d9/a4/59d65d680eb9c0081c0dfac62fd6a4d186bdf1a01c466e38d218f0b7af70/pyhdf5_handler-0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-10 15:50:14",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": true,
"codeberg_user": "maximejay",
"codeberg_project": "pyhdf5_handler",
"lcname": "pyhdf5-handler"
}