Name | h5dataframe |
Version | 0.2.2 |
home_page | None |
Summary | Drop-in replacement for pandas DataFrames that stores data in an hdf5 file and lets you manipulate it directly from that file without loading it into memory. |
upload_time | 2024-09-02 15:10:37 |
maintainer | None |
docs_url | None |
author | Matteo Bouvier |
requires_python | <4.0,>=3.10 |
license | CECILL-B |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# h5dataframe
Drop-in replacement for pandas DataFrames that stores data in an hdf5 file and lets you manipulate it directly from that file without loading it into memory.
# Warning
This is very much a **work in progress**; some features might not work yet or might cause bugs.
**Save** a copy of your data elsewhere before converting it to an `H5DataFrame`.
If you miss a feature from pandas DataFrames, please file an issue or feel free to contribute.
# Overview
This library provides the `H5DataFrame` object, replacing the regular `pandas.DataFrame`.
An `H5DataFrame` can be created from a `pandas.DataFrame` or from a dictionary mapping column names to column values.
```python
>>> import pandas as pd
>>> from h5dataframe import H5DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
...                   index=['r1', 'r2', 'r3'])
>>> hdf = H5DataFrame(df)
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[RAM]
[3 rows x 2 columns]
```
At this point, all the data is still loaded in RAM, as indicated by the `[RAM]` marker on the second-to-last line of the output. To write the data to an hdf5 file, use the `H5DataFrame.write()` method.
```python
>>> hdf.write('path/to/file.h5')
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
```
The `H5DataFrame` is now backed by an hdf5 file, and data is loaded into RAM only when requested.
Alternatively, an `H5DataFrame` can be read directly from a previously created hdf5 file with the `H5DataFrame.read()` method.
```python
>>> from h5dataframe import H5Mode
>>> H5DataFrame.read('path/to/file.h5', mode=H5Mode.READ)
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
```
The default mode is `READ` (`'r'`), which creates a **read-only** `H5DataFrame`. To modify the data, open the file with `mode=H5Mode.READ_WRITE` (`'r+'`) instead.
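To give a rough idea of what file-backed access means in general, here is a minimal sketch that uses `h5py` and NumPy directly. It is not taken from h5dataframe and does not claim to show how `H5DataFrame` is implemented; the helper names `write_column` and `read_rows` and the file name `columns.h5` are hypothetical. It only illustrates that slicing an hdf5 dataset reads the requested range from disk rather than the whole column.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np


def write_column(path, name, values):
    """Store one column of values as an hdf5 dataset (hypothetical helper)."""
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=np.asarray(values))


def read_rows(path, name, start, stop):
    """Read only rows [start, stop) of a column from the file."""
    with h5py.File(path, "r") as f:   # 'r' opens the file read-only
        dataset = f[name]             # a handle; no values are read yet
        return dataset[start:stop]    # slicing reads just this range into a NumPy array


with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "columns.h5")  # hypothetical file name
    write_column(path, "a", [1, 2, 3])
    first_two = read_rows(path, "a", 0, 2)
```

Here `first_two` holds only the first two values of column `a`; the third value is never read from the file.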
# Installation
From pip:
```shell
pip install h5dataframe
```
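The package metadata declares `requires_python` as `<4.0,>=3.10`, so installation is expected to fail on older interpreters. As a small sketch (the function name `python_is_supported` is hypothetical), the running interpreter can be checked against those bounds before installing:

```python
import sys

# Bounds taken from the package metadata: requires_python "<4.0,>=3.10".
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (4, 0)


def python_is_supported(version=None):
    """Return True if a (major, minor) version satisfies >=3.10,<4.0.

    With no argument, the version of the running interpreter is checked.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION_EXCLUSIVE


if not python_is_supported():
    print("h5dataframe 0.2.2 requires Python >=3.10,<4.0")
```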
From source:
```shell
git clone git@github.com:Vidium/h5dataframe.git
```
Raw data
{
"_id": null,
"home_page": null,
"name": "h5dataframe",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Matteo Bouvier",
"author_email": "m.bouvier@vidium-solutions.com",
"download_url": "https://files.pythonhosted.org/packages/43/4b/703ee3c845747e931b4e715dc45cb781424f8efa8b6b9babe458d2dec3ab/h5dataframe-0.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "CECILL-B",
"summary": "Drop-in replacement for pandas DataFrames that allows to store data on an hdf5 file and manipulate data directly from that hdf5 file without loading it in memory.",
"version": "0.2.2",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ee21283dd53353efcdbc8da2748862e13cd94d9073da209ca432210e1b6f680c",
"md5": "4d526d339ae3ef1673cced35723679bc",
"sha256": "29b97c7cf9895aaadbc00099d8ddb155d9e6b044fe3dfe35f3fc1362da5b1a47"
},
"downloads": -1,
"filename": "h5dataframe-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4d526d339ae3ef1673cced35723679bc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 22012,
"upload_time": "2024-09-02T15:10:36",
"upload_time_iso_8601": "2024-09-02T15:10:36.184101Z",
"url": "https://files.pythonhosted.org/packages/ee/21/283dd53353efcdbc8da2748862e13cd94d9073da209ca432210e1b6f680c/h5dataframe-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "434b703ee3c845747e931b4e715dc45cb781424f8efa8b6b9babe458d2dec3ab",
"md5": "78ccb949c3915224292a23ec5fc056e6",
"sha256": "ab9b5e0ec04a4807f452bf16fce8c73cf76812a2b69ca237558a071fab3802d2"
},
"downloads": -1,
"filename": "h5dataframe-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "78ccb949c3915224292a23ec5fc056e6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 19925,
"upload_time": "2024-09-02T15:10:37",
"upload_time_iso_8601": "2024-09-02T15:10:37.875127Z",
"url": "https://files.pythonhosted.org/packages/43/4b/703ee3c845747e931b4e715dc45cb781424f8efa8b6b9babe458d2dec3ab/h5dataframe-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-02 15:10:37",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "h5dataframe"
}
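The `digests` entries in the record above can be used to check a downloaded file. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `matches_digest` are hypothetical and not part of h5dataframe or any packaging tool.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel listed above, the expected value would be the `sha256` string from its `digests` entry, passed as `expected_hex` together with the local path of the downloaded `h5dataframe-0.2.2-py3-none-any.whl`.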