h5dataframe


Name: h5dataframe
Version: 0.2.2
Upload time: 2024-09-02 15:10:37
Author: Matteo Bouvier
Requires Python: <4.0,>=3.10
License: CECILL-B
# h5dataframe

Drop-in replacement for pandas DataFrames that stores data in an hdf5 file and manipulates it directly from that file, without loading it into memory.

# Warning

This is very much a **work in progress**: some features might not work yet or might cause bugs.
**Save** your data elsewhere before converting it to an `H5DataFrame`.

If you miss a feature from pandas DataFrames, please file an issue or feel free to contribute.

# Overview

This library provides the `H5DataFrame` object, replacing the regular `pandas.DataFrame`.

An `H5DataFrame` can be created from a `pandas.DataFrame` or from a dictionary of (column_name -> column_values).

```python
>>> import pandas as pd
>>> from h5dataframe import H5DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, 
                      index=['r1', 'r2', 'r3'])
>>> hdf = H5DataFrame(df)
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[RAM]
[3 rows x 2 columns]
```

At this point, all the data is still held in RAM, as indicated by the `[RAM]` marker on the second-to-last line. To write the data to an hdf5 file, use the `H5DataFrame.write()` method.

```python
>>> hdf.write('path/to/file.h5')
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
```

The `H5DataFrame` is now backed by an hdf5 file, as indicated by the `[FILE]` marker, and data is loaded into RAM only when requested.
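
To make "loaded only when requested" concrete, here is a minimal sketch of on-demand reading from an hdf5 file. It uses `h5py` and NumPy directly and is only an illustration of the general idea, not h5dataframe's actual implementation; the file name `example.h5` and the dataset name `a` are made up for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Store one column of a million integers on disk.
with h5py.File(path, "w") as f:
    f.create_dataset("a", data=np.arange(1_000_000))

with h5py.File(path, "r") as f:
    column = f["a"]           # a handle to the on-disk dataset; no values are read here
    n_rows = column.shape[0]  # the shape is metadata, available without reading values
    first_rows = column[:3]   # slicing reads only the requested rows into a NumPy array

print(n_rows, first_rows.tolist())
```

Only the three requested values are copied into memory; the rest of the column stays on disk.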

Alternatively, an `H5DataFrame` can be read directly from a previously created hdf5 file with the `H5DataFrame.read()` method.

```python
>>> from h5dataframe import H5Mode
>>> H5DataFrame.read('path/to/file.h5', mode=H5Mode.READ)
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
```

The default mode is `READ` (`'r'`), which creates a **read-only** `H5DataFrame`. To modify the data, use `mode=H5Mode.READ_WRITE` (`'r+'`).
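
The strings `'r'` and `'r+'` are also the mode strings `h5py` uses when opening hdf5 files. Assuming `H5Mode` follows the same convention (this is an assumption, not something stated above), the sketch below shows what the two modes mean at the file level, using `h5py` directly. The file name `modes.h5` and the dataset name `a` are made up for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("a", data=np.array([1, 2, 3]))

# 'r' opens an existing file read-only.
with h5py.File(path, "r") as f:
    read_only_mode = f.mode
    values_before = f["a"][:].tolist()

# 'r+' opens an existing file for reading and writing.
with h5py.File(path, "r+") as f:
    read_write_mode = f.mode
    f["a"][0] = 10  # the change is written to the file, not to an in-memory copy

# Reopen read-only to confirm that the change was persisted.
with h5py.File(path, "r") as f:
    values_after = f["a"][:].tolist()

print(read_only_mode, read_write_mode, values_before, values_after)
```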

# Installation

From pip:
```shell
pip install h5dataframe
```

From source:
```shell
git clone git@github.com:Vidium/h5dataframe.git
```
            
