hdf5storage


:Name: hdf5storage
:Version: 0.1.19
:Home page: https://github.com/frejanordsiek/hdf5storage
:Summary: Utilities to read/write Python types to/from HDF5 files, including MATLAB v7.3 MAT files.
:Upload time: 2023-01-20 11:55:45
:Docs URL: https://pythonhosted.org/hdf5storage/
:Author: Freja Nordsiek
:License: BSD
:Keywords: hdf5 matlab

Overview
========

This Python package provides high-level utilities to read/write a
variety of Python types to/from HDF5 (Hierarchical Data Format) formatted
files. This package also provides support for MATLAB MAT v7.3 formatted
files, which are just HDF5 files with a different extension and some
extra meta-data.

All of this is done without pickling data. Unpickling is a security risk
because it can execute arbitrary code in the interpreter. Because one
may want to read HDF5 and MAT files from possibly untrusted sources,
pickling is avoided in this package.
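
To see why unpickling untrusted data is risky, the short sketch below
(standard library only, not part of hdf5storage) builds a pickle whose
bytes name a callable to run at load time. The ``Payload`` class is
hypothetical, and a harmless builtin stands in for what could be any
function::

    import pickle


    class Payload:
        """Hypothetical object that tells pickle what to call on load."""

        def __reduce__(self):
            # pickle stores "call sorted([3, 1, 2])" instead of object state.
            # A hostile file could name os.system or any other callable here.
            return (sorted, ([3, 1, 2],))


    blob = pickle.dumps(Payload())

    # Loading never creates a Payload; it runs the callable stored in the bytes.
    result = pickle.loads(blob)
    print(result)

A format that stores only typed arrays and metadata has no comparable
hook for running code while a file is being read.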

The package's documentation is found at
http://pythonhosted.org/hdf5storage/

The package's source code is found at
https://github.com/frejanordsiek/hdf5storage

The package is licensed under a 2-clause BSD license
(https://github.com/frejanordsiek/hdf5storage/blob/master/COPYING.txt).

Installation
============

Dependencies
------------

This package only supports Python >= 2.6.

This package requires the numpy and h5py (>= 2.1) packages to run. Note
that full functionality requires h5py >= 2.3. An optional dependency is
the scipy package.

Installing by pip
-----------------

This package is on `PyPI <https://pypi.python.org/pypi/hdf5storage>`_.
To install hdf5storage using pip, run the command::

    pip install hdf5storage

Installing from Source
----------------------

To install hdf5storage from source, download the package and then
install the dependencies ::

    pip install -r requirements.txt

Then to install the package, run the command with Python ::

    python setup.py install

Running Tests
-------------

For testing, the package nose (>= 1.0) is required as well as unittest2
on Python 2.6. There are some tests that require Matlab and scipy to be
installed and be in the executable path. Not having them means that
those tests cannot be run (they will be skipped) but all the other
tests will run. To install all testing dependencies, other than scipy,
run ::

    pip install -r requirements_tests.txt

To run the tests ::

    python setup.py nosetests


Building Documentation
----------------------

The documentation additionally requires sphinx (>= 1.7). The
documentation dependencies can be installed by ::

    pip install -r requirements_doc.txt

To build the documentation ::

    python setup.py build_sphinx

Python 2
========

This package was designed and written for Python 3, with Python 2.7 and
2.6 support added later. This does mean that a few things are a little
clunky in Python 2. Examples include requiring ``unicode`` keys for
dictionaries, the ``int`` and ``long`` types both being mapped to the
Python 3 ``int`` type, etc. The storage format's metadata looks more
familiar from a Python 3 standpoint as well.

The documentation is written in terms of Python 3 syntax and types
primarily. Important Python 2 information beyond direct translations of
syntax and types will be pointed out.

Hierarchical Data Format 5 (HDF5)
=================================

HDF5 (see http://www.hdfgroup.org/HDF5/) is a commonly used file
format for the exchange of numerical data. It has built-in support for a
large variety of data types (un/signed integers, floating point
numbers, strings, etc.) as scalars and arrays, as well as enums and
compound types. It also handles differences in data representation on
different hardware platforms (endianness, different floating point
formats, etc.). As the name suggests, data is represented in an HDF5
file in a hierarchical form modelled on a Unix filesystem (Datasets are
equivalent to files, Groups are equivalent to directories, and links
are supported).

This package interfaces with HDF5 files using the h5py package
(http://www.h5py.org/) as opposed to the PyTables package
(http://www.pytables.org/).
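
The filesystem analogy can be sketched directly with h5py. The file
name, group names, and values below are made up for illustration, and
the ``core`` driver with ``backing_store=False`` keeps the file in
memory so that nothing is written to disk::

    import h5py

    # In-memory HDF5 file; "demo.h5" is only a label and is never created.
    with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
        # Groups behave like directories; intermediate groups are
        # created as needed.
        run = f.create_group("experiment/run1")

        # Datasets behave like files holding array data.
        run.create_dataset("temperature", data=[20.5, 21.0, 21.7])

        # Soft links behave like symbolic links to another path.
        f["latest"] = h5py.SoftLink("/experiment/run1")

        names = list(f["experiment"].keys())
        values = f["/latest/temperature"][()].tolist()

    print(names, values)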

MATLAB MAT v7.3 file support
============================

MATLAB (http://www.mathworks.com/) MAT files version 7.3 and later are
HDF5 files with a different file extension (``.mat``) and a very
specific set of meta-data and storage conventions. This package provides
read and write support for a limited set of Python and MATLAB types.

SciPy (http://scipy.org/) has functions to read and write the older MAT
file formats. This package has functions modeled after the
``scipy.io.savemat`` and ``scipy.io.loadmat`` functions, with the
same names and similar arguments. They dispatch to the SciPy versions if
the MAT file format is not an HDF5-based one.
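
One way to tell the two families of MAT files apart is to look for the
HDF5 format signature, which the HDF5 file format allows at byte offset
0, 512, 1024, 2048, and so on; v7.3 MAT files typically place it after a
512-byte text header. The helpers below are a hypothetical sketch using
only the standard library and are not necessarily how this package or
SciPy makes the decision::

    import os
    import tempfile

    HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


    def looks_like_hdf5(path):
        """Return True if an HDF5 signature sits at a valid offset."""
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            offset = 0
            while offset + len(HDF5_SIGNATURE) <= size:
                f.seek(offset)
                if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                    return True
                # Valid offsets are 0, 512, 1024, 2048, ... (doubling).
                offset = 512 if offset == 0 else offset * 2
        return False


    def write_temp(data):
        """Write bytes to a temporary file and return its path."""
        fd, path = tempfile.mkstemp(suffix=".mat")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return path


    # Fake files: only the leading bytes matter for this check.
    v73_path = write_temp(b"MATLAB 7.3 MAT-file".ljust(512) + HDF5_SIGNATURE)
    v5_path = write_temp(b"MATLAB 5.0 MAT-file".ljust(512) + b"\x00" * 16)

    is_v73_hdf5 = looks_like_hdf5(v73_path)
    is_v5_hdf5 = looks_like_hdf5(v5_path)

    os.remove(v73_path)
    os.remove(v5_path)
    print(is_v73_hdf5, is_v5_hdf5)

A caller could use such a check to choose between an HDF5-based reader
and ``scipy.io.loadmat`` before opening the file.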

Supported Types
===============

The supported Python and MATLAB types are given in the tables below.
The tables assume that one has imported collections and numpy as::

    import collections as cl
    import numpy as np

The table gives which Python types can be read and written, the first
version of this package to support each type, the numpy type it gets
converted to for storage (if type information is not written, that
will be what it is read back as), the MATLAB class it becomes if
targeting a MAT file, and the first version of this package to
support writing it so MATLAB can read it.

===============  =======  ==========================  ===========  ==============
Python                                                MATLAB
----------------------------------------------------  ---------------------------
Type             Version  Converted to                Class        Version
===============  =======  ==========================  ===========  ==============
bool             0.1      np.bool\_ or np.uint8       logical      0.1 [1]_
None             0.1      ``np.float64([])``          ``[]``       0.1
int [2]_ [3]_    0.1      np.int64 [2]_               int64        0.1
long [3]_ [4]_   0.1      np.int64                    int64        0.1
float            0.1      np.float64                  double       0.1
complex          0.1      np.complex128               double       0.1
str              0.1      np.uint32/16                char         0.1 [5]_
bytes            0.1      np.bytes\_ or np.uint16     char         0.1 [6]_
bytearray        0.1      np.bytes\_ or np.uint16     char         0.1 [6]_
list             0.1      np.object\_                 cell         0.1
tuple            0.1      np.object\_                 cell         0.1
set              0.1      np.object\_                 cell         0.1
frozenset        0.1      np.object\_                 cell         0.1
cl.deque         0.1      np.object\_                 cell         0.1
dict             0.1                                  struct       0.1 [7]_
np.bool\_        0.1                                  logical      0.1
np.void          0.1
np.uint8         0.1                                  uint8        0.1
np.uint16        0.1                                  uint16       0.1
np.uint32        0.1                                  uint32       0.1
np.uint64        0.1                                  uint64       0.1
np.int8          0.1                                  int8         0.1
np.int16         0.1                                  int16        0.1
np.int32         0.1                                  int32        0.1
np.int64         0.1                                  int64        0.1
np.float16 [8]_  0.1
np.float32       0.1                                  single       0.1
np.float64       0.1                                  double       0.1
np.complex64     0.1                                  single       0.1
np.complex128    0.1                                  double       0.1
np.str\_         0.1      np.uint32/16                char/uint32  0.1 [5]_
np.bytes\_       0.1      np.bytes\_ or np.uint16     char         0.1 [6]_
np.object\_      0.1                                  cell         0.1
np.ndarray       0.1      [9]_ [10]_                  [9]_ [10]_   0.1 [9]_ [11]_
np.matrix        0.1      [9]_                        [9]_         0.1 [9]_
np.chararray     0.1      [9]_                        [9]_         0.1 [9]_
np.recarray      0.1      structured np.ndarray       [9]_ [10]_   0.1 [9]_
===============  =======  ==========================  ===========  ==============

.. [1] Depends on the selected options. Always ``np.uint8`` when doing
       MATLAB compatibility, or if the option is explicitly set.
.. [2] In Python 2.x, it may be read back as a ``long`` if it can't fit
       in the size of an ``int``.
.. [3] Must be small enough to fit into an ``np.int64``.
.. [4] Type found only in Python 2.x. Python 2.x's ``long`` and ``int``
       are unified into a single ``int`` type in Python 3.x. Read as an
       ``int`` in Python 3.x.
.. [5] Depends on the selected options and whether it can be converted
       to UTF-16 without using doublets. If the option is explicitly set
       (or implicitly when doing MATLAB compatibility) and it can be
       converted to UTF-16 without losing any characters that can't be
       represented in UTF-16 or using UTF-16 doublets (MATLAB doesn't
       support them), then it is written as ``np.uint16`` in UTF-16
       encoding. Otherwise, it is stored as ``np.uint32`` in UTF-32
       encoding.
.. [6] Depends on the selected options. If the option is explicitly set
       (or implicitly when doing MATLAB compatibility), it will be
       stored as ``np.uint16`` in UTF-16 encoding unless it has
       non-ASCII characters, in which case a ``NotImplementedError`` is
       raised. Otherwise, it is just written as ``np.bytes_``.
.. [7] All keys must be ``str`` in Python 3 or ``unicode`` in Python 2.
       They cannot have null characters (``'\x00'``) or forward slashes
       (``'/'``) in them.
.. [8] ``np.float16`` is not supported for h5py versions before
       ``2.2``.
.. [9] Container types are only supported if their underlying dtype is
       supported. Data conversions are done based on its dtype.
.. [10] Structured ``np.ndarray`` s (have fields in their dtypes) can be
        written as an HDF5 COMPOUND type or as an HDF5 Group with
        Datasets holding its fields (either the values directly, or as
        an HDF5 Reference array to the values for the different elements
        of the data). Can only be written as an HDF5 COMPOUND type if
        none of its fields are of dtype ``'object'``. Field names cannot
        contain null characters (``'\x00'``) and, when writing as an
        HDF5 Group, cannot contain forward slashes (``'/'``).
.. [11] Structured ``np.ndarray`` s with no elements, when written like a
        structure, will not be read back with the right dtypes for their
        fields (will all become 'object').
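
The rule for ``str`` storage described in the footnotes hinges on
whether every character fits in a single UTF-16 code unit. A rough
standard-library sketch of that check follows; both functions and the
``convert_to_utf16`` parameter are hypothetical illustrations, not part
of this package's API::

    def fits_single_utf16_units(text):
        """Return True if no character needs a UTF-16 doublet."""
        # Code points above U+FFFF take two 16-bit units in UTF-16.
        return all(ord(ch) <= 0xFFFF for ch in text)


    def storage_width_bits(text, convert_to_utf16=True):
        """Pick 16-bit or 32-bit storage following the rule above."""
        if convert_to_utf16 and fits_single_utf16_units(text):
            return 16  # would be stored as np.uint16 in UTF-16
        return 32  # would be stored as np.uint32 in UTF-32


    plain = storage_width_bits("na\u00efve caf\u00e9")
    emoji = storage_width_bits("data \U0001F4C8")
    disabled = storage_width_bits("na\u00efve caf\u00e9", convert_to_utf16=False)
    print(plain, emoji, disabled)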

This table gives the MATLAB classes that can be read from a MAT file,
the first version of this package that can read them, and the Python
type they are read as.

===============  =======  =================================
MATLAB Class     Version  Python Type
===============  =======  =================================
logical          0.1      np.bool\_
single           0.1      np.float32 or np.complex64 [12]_
double           0.1      np.float64 or np.complex128 [12]_
uint8            0.1      np.uint8
uint16           0.1      np.uint16
uint32           0.1      np.uint32
uint64           0.1      np.uint64
int8             0.1      np.int8
int16            0.1      np.int16
int32            0.1      np.int32
int64            0.1      np.int64
char             0.1      np.str\_
struct           0.1      structured np.ndarray
cell             0.1      np.object\_
canonical empty  0.1      ``np.float64([])``
===============  =======  =================================

.. [12] Depends on whether there is a complex part or not.


File Incompatibilities
======================

The storage of empty ``numpy.ndarray`` (or objects that would be stored like
one) when the ``Options.store_shape_for_empty`` option is set (implicitly set
when Matlab compatibility is enabled) is incompatible with both Matlab and the
main branch of this package after 2021-07-11. This is due to a bug
(Issue #114) that cannot be fixed without breaking compatibility within the
0.1.x series. It will therefore not be fixed in 0.1.x, since a release with
the fix could not have a version of the form 0.1.x. It is, however, fixed in
the main branch after 2021-07-11.

The incompatibility is caused by storing the array shape in the Dataset after
reversing the dimension order instead of before, meaning that the array is
read with its dimensions reversed from what is expected if read by Matlab
or the main branch after 2021-07-11.
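
The difference can be illustrated with plain tuples. The shape below is
an arbitrary example and ``recover_shape_from_0_1_x`` is a hypothetical
helper, not part of this package; it only shows that reversing the
stored value once more yields the shape the other readers expect::

    # Shape of an empty array as seen from Python (arbitrary example).
    shape = (2, 0, 3)

    # 0.1.x behavior described above: the shape is recorded after the
    # dimension order has been reversed.
    stored_by_0_1_x = tuple(reversed(shape))

    # What Matlab and the main branch expect: the shape recorded before
    # the reversal.
    stored_as_expected = shape


    def recover_shape_from_0_1_x(stored):
        """Undo the extra reversal for a shape written by 0.1.x."""
        return tuple(reversed(stored))


    print(stored_by_0_1_x, stored_as_expected)
    print(recover_shape_from_0_1_x(stored_by_0_1_x) == stored_as_expected)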


Versions
========

0.1.19. Bugfix release.
        * Issues #122 and #124. Replaced use of the deprecated
          ``numpy.asscalar`` function with the ``numpy.ndarray.item`` method.
        * Issue #123. Forced the use of English month and day of the week names
          in the HDF5 header for MATLAB compatibility.
        * Issue #125. Fixed accidental collection of
          ``pkg_resources.parse_version`` from setuptools as a Marshaller now
          that it is a class.
0.1.18. Performance improving release.
        * Pull Request #111 from Daniel Hrisca. Many repeated calls to the
          ``__getitem__`` methods of objects were turned into single calls.
        * Further reductions in ``__getitem__`` calls in the spirit of PR #111.

0.1.17. Bugfix and deprecation workaround release that fixed the following.
        * Issue #109. Fixed the fix for Issue #102 on 32-bit platforms
          (previous fix was segfaulting).
        * Moved to using ``pkg_resources.parse_version`` from ``setuptools``
          with ``distutils.version`` classes as a fallback instead of just the
          latter to prepare for the removal of ``distutils`` (PEP 632) and
          prevent warnings on Python versions where it is marked as deprecated.
        * Issue #110. Changed all uses of the ``tostring`` method on numpy types
          to using ``tobytes`` if available, with ``tostring`` as the fallback
          for old versions of numpy where it is not.

0.1.16. Bugfix release that fixed the following bugs.
        * Issue #81 and #82. ``h5py.File`` will require the mode to be
          passed explicitly in the future. All calls without passing it were
          fixed to pass it.
        * Issue #102. Added support for h5py 3.0 and 3.1.
        * Issue #73. Fixed bug where a missing variable in ``loadmat`` would
          cause the function to think that the file is a pre v7.3 format MAT
          file and fall back to ``scipy.io.loadmat``, which won't work since
          the file is a v7.3 format MAT file.
        * Fixed formatting issues in the docstrings and the documentation that
          prevented the documentation from building.

0.1.15. Bugfix release that fixed the following bugs.
        * Issue #68. Fixed bug where ``str`` and ``numpy.unicode_``
          strings (but not ndarrays of them) were saved in
          ``uint32`` format regardless of the value of
          ``Options.convert_numpy_bytes_to_utf16``.
        * Issue #70. Updated ``setup.py`` and ``requirements.txt`` to specify
          the maximum versions of numpy and h5py that can be used for specific
          python versions (avoiding versions with dropped support).
        * Issue #71. Fixed bug where the ``'python_fields'`` attribute wouldn't
          always be written when doing python metadata for data written in
          a struct-like fashion. The bug caused the field order to not be
          preserved when writing and reading.
        * Fixed an assertion in the tests to handle field re-ordering when
          no metadata is used for structured dtypes that only worked on
          older versions of numpy.
        * Issue #72. Fixed bug where python collections filled with ndarrays
          that all have the same shape were converted to multi-dimensional
          object ndarrays instead of a 1D object ndarray of the elements.

0.1.14. Bugfix release that also added a couple features.
        * Issue #45. Fixed syntax errors in unicode strings for Python
          3.0 to 3.2.
        * Issues #44 and #47. Fixed bugs in testing of conversion and
          storage of string types.
        * Issue #46. Fixed raising of ``RuntimeWarnings`` in tests due
          to signalling NaNs.
        * Added requirements files for building documentation and
          running tests.
        * Made it so that Matlab compatibility tests are skipped if
          Matlab is not found, instead of raising errors.

0.1.13. Bugfix release fixing the following bug.
        * Issue #36. Fixed bugs in writing ``int`` and ``long`` to HDF5
          and their tests on 32 bit systems.

0.1.12. Bugfix release fixing the following bugs. In addition, copyright years were also updated and notices put in the Matlab files used for testing.
        * Issue #32. Fixed transposing before reshaping ``np.ndarray``
          when reading from HDF5 files where python metadata was stored
          but not Matlab metadata.
        * Issue #33. Fixed the loss of the number of characters when
          reading empty numpy string arrays.
        * Issue #34. Fixed a conversion error when ``np.chararray`` are
          written with Matlab metadata.

0.1.11. Bugfix release fixing the following.
        * Issue #30. Fixed ``loadmat`` not opening files in read mode.

0.1.10. Minor feature/performance fix release doing the following.
        * Issue #29. Added ``writes`` and ``reads`` functions to write
          and read more than one piece of data at a time and made
          ``savemat`` and ``loadmat`` use them to increase performance.
          Previously, the HDF5 file was being opened and closed for
          each piece of data, which impacted performance, especially
          for large files.

0.1.9. Bugfix and minor feature release doing the following.
       * Issue #23. Fixed bug where a structured ``np.ndarray`` with
         a field name of ``'O'`` could never be written as an
         HDF5 COMPOUND Dataset (falsely thought a field's dtype was
         object).
       * Issue #6. Added optional data compression and the storage of
         data checksums. Controlled by several new options.

0.1.8. Bugfix release fixing the following two bugs.
       * Issue #21. Fixed bug where the ``'MATLAB_class'`` Attribute is
         not set when writing ``dict`` types when writing MATLAB
         metadata.
       * Issue #22. Fixed bug where null characters (``'\x00'``) and
         forward slashes (``'/'``) were allowed in ``dict`` keys and the
         field names of structured ``np.ndarray`` (except that forward
         slashes are allowed when the
         ``structured_numpy_ndarray_as_struct`` is not set as is the
         case when the ``matlab_compatible`` option is set). These
         cause problems for the ``h5py`` package and the HDF5 library.
         ``NotImplementedError`` is now thrown in these cases.

0.1.7. Bugfix release with an added compatibility option and some added test code. Did the following.
       * Fixed an issue reading variables larger than 2 GB in MATLAB
         MAT v7.3 files when no explicit variable names to read are
         given to ``hdf5storage.loadmat``. Fix also reduces memory
         consumption and processing time a little bit by removing an
         unneeded memory copy.
       * ``Options`` now will accept any additional keyword arguments it
         doesn't support, ignoring them, to be API compatible with future
         package versions with added options.
       * Added tests for reading data that has been compressed or had
         other HDF5 filters applied.

0.1.6. Bugfix release fixing a bug with determining the maximum size of a Python 2.x ``int`` on a 32-bit system.

0.1.5. Bugfix release fixing the following bugs.
       * Fixed bug where an ``int`` could be stored that is too big to
         fit into an ``int`` when read back in Python 2.x. When it is
         too big, it is converted to a ``long``.
       * Fixed a bug where an ``int`` or ``long`` that is too
         big to fit into an ``np.int64`` raised the wrong exception.
       * Fixed bug where field names for structured ``np.ndarray`` with
         non-ASCII characters (assumed to be UTF-8 encoded in
         Python 2.x) can't be read or written properly.
       * Fixed bug where ``np.bytes_`` with non-ASCII characters
         were converted incorrectly to UTF-16 when that option is set
         (set implicitly when doing MATLAB compatibility). Now, it throws
         a ``NotImplementedError``.

0.1.4. Bugfix release fixing the following bugs. Thanks go to `mrdomino <https://github.com/mrdomino>`_ for writing the bug fixes.
       * Fixed bug where ``dtype`` is used as a keyword parameter of
         ``np.ndarray.astype`` when it is a positional argument.
       * Fixed error caused by ``h5py.__version__`` being absent on
         Ubuntu 12.04.

0.1.3. Bugfix release fixing the following bug.
       * Fixed broken ability to correctly read and write empty
         structured ``np.ndarray`` (has fields).

0.1.2. Bugfix release fixing the following bugs.
       * Removed mistaken support for ``np.float16`` for h5py versions
         before ``2.2`` since that was when support for it was
         introduced.
       * Structured ``np.ndarray`` where one or more fields is of the
         ``'object'`` dtype can now be written without an error when
         the ``structured_numpy_ndarray_as_struct`` option is not set.
         They are written as an HDF5 Group, as if the option was set.
       * Support for the ``'MATLAB_fields'`` Attribute for data types
         that are structures in MATLAB has been added for when the
         version of the h5py package being used is ``2.3`` or greater.
         Support is still missing for earlier versions (this package
         requires a minimum version of ``2.1``).
       * The check for non-unicode string keys (``str`` in Python 3 and
         ``unicode`` in Python 2) in the type ``dict`` is done right
         before any changes are made to the HDF5 file instead of in the
         middle so that no changes are applied if an invalid key is
         present.
       * The HDF5 userblock is now set with the proper metadata for
         MATLAB support right at the beginning of when data is being
         written to an HDF5 file instead of at the end, meaning the
         writing can crash and the file will still be a valid MATLAB file.

0.1.1. Bugfix release fixing the following bugs.
       * ``str`` is now written like ``numpy.str_`` instead of
         ``numpy.bytes_``.
       * Complex numbers where the real or imaginary part is ``nan``
         but the other part is not are now read correctly as opposed
         to setting both parts to ``nan``.
       * Fixed bugs in string conversions on Python 2 resulting from
         ``str.decode()`` and ``unicode.encode()`` not taking the same
         keyword arguments as in Python 3.
       * MATLAB structure arrays can now be read without producing an
         error on Python 2.
       * ``numpy.str_`` is now written as ``numpy.uint16`` on Python 2 if
         the ``convert_numpy_str_to_utf16`` option is set and the
         conversion can be done without using UTF-16 doublets, instead
         of always writing them as ``numpy.uint32``.

0.1. Initial version.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/frejanordsiek/hdf5storage",
    "name": "hdf5storage",
    "maintainer": "",
    "docs_url": "https://pythonhosted.org/hdf5storage/",
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "hdf5 matlab",
    "author": "Freja Nordsiek",
    "author_email": "fnordsie@posteo.net",
    "download_url": "https://files.pythonhosted.org/packages/c6/9e/ce420a7bd4c5092abea7cdc939f401dffa3936be586b316a771f8a1591fe/hdf5storage-0.1.19.tar.gz",
    "platform": null,
    "description": "Overview\n========\n\nThis Python package provides high level utilities to read/write a\nvariety of Python types to/from HDF5 (Heirarchal Data Format) formatted\nfiles. This package also provides support for MATLAB MAT v7.3 formatted\nfiles, which are just HDF5 files with a different extension and some\nextra meta-data.\n\nAll of this is done without pickling data. Pickling is bad for security\nbecause it allows arbitrary code to be executed in the interpreter. One\nwants to be able to read possibly HDF5 and MAT files from untrusted\nsources, so pickling is avoided in this package.\n\nThe package's documetation is found at\nhttp://pythonhosted.org/hdf5storage/\n\nThe package's source code is found at\nhttps://github.com/frejanordsiek/hdf5storage\n\nThe package is licensed under a 2-clause BSD license\n(https://github.com/frejanordsiek/hdf5storage/blob/master/COPYING.txt).\n\nInstallation\n============\n\nDependencies\n------------\n\nThis package only supports Python >= 2.6.\n\nThis package requires the numpy and h5py (>= 2.1) packages to run. Note\nthat full functionality requires h5py >= 2.3. An optional dependency is\nthe scipy package.\n\nInstalling by pip\n-----------------\n\nThis package is on `PyPI <https://pypi.python.org/pypi/hdf5storage>`_.\nTo install hdf5storage using pip, run the command::\n\n    pip install hdf5storage\n\nInstalling from Source\n----------------------\n\nTo install hdf5storage from source, download the package and then\ninstall the dependencies ::\n\n    pip install -r requirements.txt\n\nThen to install the package, run the command with Python ::\n\n    python setup.py install\n\nRunning Tests\n-------------\n\nFor testing, the package nose (>= 1.0) is required as well as unittest2\non Python 2.6. There are some tests that require Matlab and scipy to be\ninstalled and be in the executable path. Not having them means that\nthose tests cannot be run (they will be skipped) but all the other\ntests will run. 
To install all testing dependencies, other than scipy,\nrun ::\n\n    pip install -r requirements_tests.txt.\n\nTo run the tests ::\n\n    python setup.py nosetests\n\n\nBuilding Documentation\n----------------------\n\nThe documentation additionally requires sphinx (>= 1.7). The\ndocumentation dependencies can be installed by ::\n\n    pip install -r requirements_doc.txt\n\nTo build the documentation ::\n\n    python setup.py build_sphinx\n\nPython 2\n========\n\nThis package was designed and written for Python 3, with Python 2.7 and\n2.6 support added later. This does mean that a few things are a little\nclunky in Python 2. Examples include requiring ``unicode`` keys for\ndictionaries, the ``int`` and ``long`` types both being mapped to the\nPython 3 ``int`` type, etc. The storage format's metadata looks more\nfamiliar from a Python 3 standpoint as well.\n\nThe documentation is written in terms of Python 3 syntax and types\nprimarily. Important Python 2 information beyond direct translations of\nsyntax and types will be pointed out.\n\nHierarchal Data Format 5 (HDF5)\n===============================\n\nHDF5 files (see http://www.hdfgroup.org/HDF5/) are a commonly used file\nformat for exchange of numerical data. It has built in support for a\nlarge variety of number formats (un/signed integers, floating point\nnumbers, strings, etc.) as scalars and arrays, enums and compound types.\nIt also handles differences in data representation on different hardware\nplatforms (endianness, different floating point formats, etc.). 
As can\nbe imagined from the name, data is represented in an HDF5 file in a\nhierarchal form modelling a Unix filesystem (Datasets are equivalent to\nfiles, Groups are equivalent to directories, and links are supported).\n\nThis package interfaces HDF5 files using the h5py package\n(http://www.h5py.org/) as opposed to the PyTables package\n(http://www.pytables.org/).\n\nMATLAB MAT v7.3 file support\n============================\n\nMATLAB (http://www.mathworks.com/) MAT files version 7.3 and later are\nHDF5 files with a different file extension (``.mat``) and a very\nspecific set of meta-data and storage conventions. This package provides\nread and write support for a limited set of Python and MATLAB types.\n\nSciPy (http://scipy.org/) has functions to read and write the older MAT\nfile formats. This package has functions modeled after the\n``scipy.io.savemat`` and ``scipy.io.loadmat`` functions, that have the\nsame names and similar arguments. The dispatch to the SciPy versions if\nthe MAT file format is not an HDF5 based one.\n\nSupported Types\n===============\n\nThe supported Python and MATLAB types are given in the tables below.\nThe tables assume that one has imported collections and numpy as::\n\n    import collections as cl\n    import numpy as np\n\nThe table gives which Python types can be read and written, the first\nversion of this package to support it, the numpy type it gets\nconverted to for storage (if type information is not written, that\nwill be what it is read back as) the MATLAB class it becomes if\ntargetting a MAT file, and the first version of this package to\nsupport writing it so MATlAB can read it.\n\n===============  =======  ==========================  ===========  ==============\nPython                                                MATLAB\n----------------------------------------------------  ---------------------------\nType             Version  Converted to                Class        Version\n===============  =======  
==========================  ===========  ==============\nbool             0.1      np.bool\\_ or np.uint8       logical      0.1 [1]_\nNone             0.1      ``np.float64([])``          ``[]``       0.1\nint [2]_ [3]_    0.1      np.int64 [2]_               int64        0.1\nlong [3]_ [4]_   0.1      np.int64                    int64        0.1\nfloat            0.1      np.float64                  double       0.1\ncomplex          0.1      np.complex128               double       0.1\nstr              0.1      np.uint32/16                char         0.1 [5]_\nbytes            0.1      np.bytes\\_ or np.uint16     char         0.1 [6]_\nbytearray        0.1      np.bytes\\_ or np.uint16     char         0.1 [6]_\nlist             0.1      np.object\\_                 cell         0.1\ntuple            0.1      np.object\\_                 cell         0.1\nset              0.1      np.object\\_                 cell         0.1\nfrozenset        0.1      np.object\\_                 cell         0.1\ncl.deque         0.1      np.object\\_                 cell         0.1\ndict             0.1                                  struct       0.1 [7]_\nnp.bool\\_        0.1                                  logical      0.1\nnp.void          0.1\nnp.uint8         0.1                                  uint8        0.1\nnp.uint16        0.1                                  uint16       0.1\nnp.uint32        0.1                                  uint32       0.1\nnp.uint64        0.1                                  uint64       0.1\nnp.uint8         0.1                                  int8         0.1\nnp.int16         0.1                                  int16        0.1\nnp.int32         0.1                                  int32        0.1\nnp.int64         0.1                                  int64        0.1\nnp.float16 [8]_  0.1\nnp.float32       0.1                                  single       0.1\nnp.float64       0.1                                  double     
  0.1\nnp.complex64     0.1                                  single       0.1\nnp.complex128    0.1                                  double       0.1\nnp.str\\_         0.1      np.uint32/16                char/uint32  0.1 [5]_\nnp.bytes\\_       0.1      np.bytes\\_ or np.uint16     char         0.1 [6]_\nnp.object\\_      0.1                                  cell         0.1\nnp.ndarray       0.1      [9]_ [10]_                  [9]_ [10]_   0.1 [9]_ [11]_\nnp.matrix        0.1      [9]_                        [9]_         0.1 [9]_\nnp.chararray     0.1      [9]_                        [9]_         0.1 [9]_\nnp.recarray      0.1      structured np.ndarray       [9]_ [10]_   0.1 [9]_\n===============  =======  ==========================  ===========  ==============\n\n.. [1] Depends on the selected options. Always ``np.uint8`` when doing\n       MATLAB compatiblity, or if the option is explicitly set.\n.. [2] In Python 2.x, it may be read back as a ``long`` if it can't fit\n       in the size of an ``int``.\n.. [3] Must be small enough to fit into an ``np.int64``.\n.. [4] Type found only in Python 2.x. Python 2.x's ``long`` and ``int``\n       are unified into a single ``int`` type in Python 3.x. Read as an\n       ``int`` in Python 3.x.\n.. [5] Depends on the selected options and whether it can be converted\n       to UTF-16 without using doublets. If the option is explicity set\n       (or implicitly when doing MATLAB compatibility) and it can be\n       converted to UTF-16 without losing any characters that can't be\n       represented in UTF-16 or using UTF-16 doublets (MATLAB doesn't\n       support them), then it is written as ``np.uint16`` in UTF-16\n       encoding. Otherwise, it is stored at ``np.uint32`` in UTF-32\n       encoding.\n.. [6] Depends on the selected options. 
If the option is explicitly set\n       (or implicitly when doing MATLAB compatibility), it will be\n       stored as ``np.uint16`` in UTF-16 encoding unless it has\n       non-ASCII characters (in which case a ``NotImplementedError`` is\n       thrown). Otherwise, it is just written as ``np.bytes_``.\n.. [7] All keys must be ``str`` in Python 3 or ``unicode`` in Python 2.\n       They cannot have null characters (``'\\x00'``) or forward slashes\n       (``'/'``) in them.\n.. [8] ``np.float16`` is not supported for h5py versions before\n       ``2.2``.\n.. [9] Container types are only supported if their underlying dtype is\n       supported. Data conversions are done based on its dtype.\n.. [10] Structured ``np.ndarray`` s (have fields in their dtypes) can be\n        written as an HDF5 COMPOUND type or as an HDF5 Group with\n        Datasets holding its fields (either the values directly, or as\n        an HDF5 Reference array to the values for the different elements\n        of the data). Can only be written as an HDF5 COMPOUND type if\n        none of its fields are of dtype ``'object'``. Field names cannot\n        have null characters (``'\\x00'``) and, when writing as an HDF5\n        GROUP, forward slashes (``'/'``) in them.\n.. 
[11] Structured ``np.ndarray`` s with no elements, when written like a\n        structure, will not be read back with the right dtypes for their\n        fields (will all become 'object').\n\nThis table gives the MATLAB classes that can be read from a MAT file,\nthe first version of this package that can read them, and the Python\ntype they are read as.\n\n===============  =======  =================================\nMATLAB Class     Version  Python Type\n===============  =======  =================================\nlogical          0.1      np.bool\\_\nsingle           0.1      np.float32 or np.complex64 [12]_\ndouble           0.1      np.float64 or np.complex128 [12]_\nuint8            0.1      np.uint8\nuint16           0.1      np.uint16\nuint32           0.1      np.uint32\nuint64           0.1      np.uint64\nint8             0.1      np.int8\nint16            0.1      np.int16\nint32            0.1      np.int32\nint64            0.1      np.int64\nchar             0.1      np.str\\_\nstruct           0.1      structured np.ndarray\ncell             0.1      np.object\\_\ncanonical empty  0.1      ``np.float64([])``\n===============  =======  =================================\n\n.. 
[12] Depends on whether there is a complex part or not.\n\n\nFile Incompatibilities\n======================\n\nThe storage of empty ``numpy.ndarray`` (or objects that would be stored like\none) when the ``Options.store_shape_for_empty`` option is set (implicitly set when Matlab\ncompatibility is enabled) is incompatible with both Matlab and the main\nbranch of this package after 2021-07-11. This is due to a bug (Issue #114) that cannot\nbe fixed without breaking compatibility in the 0.1.x series and thus\nwill not be fixed there (it is, however, fixed in the main branch after 2021-07-11),\nsince such a fix would mean the version could not be of the form 0.1.x.\n\nThe incompatibility is caused by storing the array shape in the Dataset after\nreversing the dimension order instead of before, meaning that the array is\nread with its dimensions reversed from what is expected if read by Matlab\nor the main branch after 2021-07-11.\n\n\nVersions\n========\n\n0.1.19. Bugfix release.\n        * Issue #122 and #124. Replaced use of the deprecated ``numpy.asscalar``\n          function with the ``numpy.ndarray.item`` method.\n        * Issue #123. Forced the use of English month and day of the week names\n          in the HDF5 header for MATLAB compatibility.\n        * Issue #125. Fixed accidental collection of\n          ``pkg_resources.parse_version`` from setuptools as a Marshaller now\n          that it is a class.\n\n0.1.18. Performance improving release.\n        * Pull Request #111 from Daniel Hrisca. Many repeated calls to the\n          ``__getitem__`` methods of objects were turned into single calls.\n        * Further reductions in ``__getitem__`` calls in the spirit of PR #111.\n\n0.1.17. Bugfix and deprecation workaround release that fixed the following.\n        * Issue #109. 
Fixed the fix for Issue #102 for 32-bit platforms (previous\n          fix was segfaulting).\n        * Moved to using ``pkg_resources.parse_version`` from ``setuptools``\n          with ``distutils.version`` classes as a fallback instead of just the\n          latter to prepare for the removal of ``distutils`` (PEP 632) and\n          prevent warnings on Python versions where it is marked as deprecated.\n        * Issue #110. Changed all uses of the ``tostring`` method on numpy types\n          to using ``tobytes`` if available, with ``tostring`` as the fallback\n          for old versions of numpy where it is not.\n\n0.1.16. Bugfix release that fixed the following bugs.\n        * Issue #81 and #82. ``h5py.File`` will require the mode to be\n          passed explicitly in the future. All calls without passing it were\n          fixed to pass it.\n        * Issue #102. Added support for h5py 3.0 and 3.1.\n        * Issue #73. Fixed bug where a missing variable in ``loadmat`` would\n          cause the function to think that the file is a pre v7.3 format MAT\n          file and fall back to ``scipy.io.loadmat``, which won't work since the file\n          is a v7.3 format MAT file.\n        * Fixed formatting issues in the docstrings and the documentation that\n          prevented the documentation from building.\n\n0.1.15. Bugfix release that fixed the following bugs.\n        * Issue #68. Fixed bug where ``str`` and ``numpy.unicode_``\n          strings (but not ndarrays of them) were saved in\n          ``uint32`` format regardless of the value of\n          ``Options.convert_numpy_bytes_to_utf16``.\n        * Issue #70. Updated ``setup.py`` and ``requirements.txt`` to specify\n          the maximum versions of numpy and h5py that can be used for specific\n          python versions (avoid versions with dropped support).\n        * Issue #71. 
Fixed bug where the ``'python_fields'`` attribute wouldn't\n          always be written when doing python metadata for data written in\n          a struct-like fashion. The bug caused the field order to not be\n          preserved when writing and reading.\n        * Fixed an assertion in the tests to handle field re-ordering when\n          no metadata is used for structured dtypes that only worked on\n          older versions of numpy.\n        * Issue #72. Fixed bug where python collections filled with ndarrays\n          that all have the same shape were converted to multi-dimensional\n          object ndarrays instead of a 1D object ndarray of the elements.\n\n0.1.14. Bugfix release that also added a couple of features.\n        * Issue #45. Fixed syntax errors in unicode strings for Python\n          3.0 to 3.2.\n        * Issues #44 and #47. Fixed bugs in testing of conversion and\n          storage of string types.\n        * Issue #46. Fixed raising of ``RuntimeWarnings`` in tests due\n          to signalling NaNs.\n        * Added requirements files for building documentation and\n          running tests.\n        * Made it so that Matlab compatibility tests are skipped if\n          Matlab is not found, instead of raising errors.\n\n0.1.13. Bugfix release fixing the following bug.\n        * Issue #36. Fixed bugs in writing ``int`` and ``long`` to HDF5\n          and their tests on 32 bit systems.\n\n0.1.12. Bugfix release fixing the following bugs. In addition, copyright years were updated and notices put in the Matlab files used for testing.\n        * Issue #32. Fixed transposing before reshaping ``np.ndarray``\n          when reading from HDF5 files where python metadata was stored\n          but not Matlab metadata.\n        * Issue #33. Fixed the loss of the number of characters when\n          reading empty numpy string arrays.\n        * Issue #34. 
Fixed a conversion error when ``np.chararray`` are\n          written with Matlab metadata.\n\n0.1.11. Bugfix release fixing the following.\n        * Issue #30. Fixed ``loadmat`` not opening files in read mode.\n\n0.1.10. Minor feature/performance fix release doing the following.\n        * Issue #29. Added ``writes`` and ``reads`` functions to write\n          and read more than one piece of data at a time and made\n          ``savemat`` and ``loadmat`` use them to increase performance.\n          Previously, the HDF5 file was being opened and closed for\n          each piece of data, which impacted performance, especially\n          for large files.\n\n0.1.9. Bugfix and minor feature release doing the following.\n       * Issue #23. Fixed bug where a structured ``np.ndarray`` with\n         a field name of ``'O'`` could never be written as an\n         HDF5 COMPOUND Dataset (falsely thought a field's dtype was\n         object).\n       * Issue #6. Added optional data compression and the storage of\n         data checksums. Controlled by several new options.\n\n0.1.8. Bugfix release fixing the following two bugs.\n       * Issue #21. Fixed bug where the ``'MATLAB_class'`` Attribute is\n         not set when writing ``dict`` types when writing MATLAB\n         metadata.\n       * Issue #22. Fixed bug where null characters (``'\\x00'``) and\n         forward slashes (``'/'``) were allowed in ``dict`` keys and the\n         field names of structured ``np.ndarray`` (except that forward\n         slashes are allowed when the\n         ``structured_numpy_ndarray_as_struct`` is not set as is the\n         case when the ``matlab_compatible`` option is set). These\n         cause problems for the ``h5py`` package and the HDF5 library.\n         ``NotImplementedError`` is now thrown in these cases.\n\n0.1.7. Bugfix release with an added compatibility option and some added test code. 
Did the following.\n       * Fixed an issue reading variables larger than 2 GB in MATLAB\n         MAT v7.3 files when no explicit variable names to read are\n         given to ``hdf5storage.loadmat``. Fix also reduces memory\n         consumption and processing time a little bit by removing an\n         unneeded memory copy.\n       * ``Options`` now will accept any additional keyword arguments it\n         doesn't support, ignoring them, to be API compatible with future\n         package versions with added options.\n       * Added tests for reading data that has been compressed or had\n         other HDF5 filters applied.\n\n0.1.6. Bugfix release fixing a bug with determining the maximum size of a Python 2.x ``int`` on a 32-bit system.\n\n0.1.5. Bugfix release fixing the following bugs.\n       * Fixed bug where an ``int`` could be stored that is too big to\n         fit into an ``int`` when read back in Python 2.x. When it is\n         too big, it is converted to a ``long``.\n       * Fixed a bug where an ``int`` or ``long`` that is too\n         big to fit into an ``np.int64`` raised the wrong exception.\n       * Fixed bug where field names for structured ``np.ndarray`` with\n         non-ASCII characters (assumed to be UTF-8 encoded in\n         Python 2.x) can't be read or written properly.\n       * Fixed bug where ``np.bytes_`` with non-ASCII characters\n         were converted incorrectly to UTF-16 when that option is set\n         (set implicitly when doing MATLAB compatibility). Now, it throws\n         a ``NotImplementedError``.\n\n0.1.4. Bugfix release fixing the following bugs. Thanks go to `mrdomino <https://github.com/mrdomino>`_ for writing the bug fixes.\n       * Fixed bug where ``dtype`` is used as a keyword parameter of\n         ``np.ndarray.astype`` when it is a positional argument.\n       * Fixed error caused by ``h5py.__version__`` being absent on\n         Ubuntu 12.04.\n\n0.1.3. 
Bugfix release fixing the following bug.\n       * Fixed broken ability to correctly read and write empty\n         structured ``np.ndarray`` (has fields).\n\n0.1.2. Bugfix release fixing the following bugs.\n       * Removed mistaken support for ``np.float16`` for h5py versions\n         before ``2.2`` since that was when support for it was\n         introduced.\n       * Structured ``np.ndarray`` where one or more fields is of the\n         ``'object'`` dtype can now be written without an error when\n         the ``structured_numpy_ndarray_as_struct`` option is not set.\n         They are written as an HDF5 Group, as if the option was set.\n       * Support for the ``'MATLAB_fields'`` Attribute for data types\n         that are structures in MATLAB has been added for when the\n         version of the h5py package being used is ``2.3`` or greater.\n         Support is still missing for earlier versions (this package\n         requires a minimum version of ``2.1``).\n       * The check for non-unicode string keys (``str`` in Python 3 and\n         ``unicode`` in Python 2) in the type ``dict`` is done right\n         before any changes are made to the HDF5 file instead of in the\n         middle so that no changes are applied if an invalid key is\n         present.\n       * HDF5 userblock set with the proper metadata for MATLAB support\n         right at the beginning of when data is being written to an HDF5\n         file instead of at the end, meaning the writing can crash and\n         the file will still be a valid MATLAB file.\n\n0.1.1. 
Bugfix release fixing the following bugs.\n       * ``str`` is now written like ``numpy.str_`` instead of\n         ``numpy.bytes_``.\n       * Complex numbers where the real or imaginary part is ``nan``\n         but the other part is not are now read correctly, as opposed\n         to setting both parts to ``nan``.\n       * Fixed bugs in string conversions on Python 2 resulting from\n         ``str.decode()`` and ``unicode.encode()`` not taking the same\n         keyword arguments as in Python 3.\n       * MATLAB structure arrays can now be read without producing an\n         error on Python 2.\n       * ``numpy.str_`` is now written as ``numpy.uint16`` on Python 2 if\n         the ``convert_numpy_str_to_utf16`` option is set and the\n         conversion can be done without using UTF-16 doublets, instead\n         of always writing them as ``numpy.uint32``.\n\n0.1. Initial version.\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "Utilities to read/write Python types to/from HDF5 files, including MATLAB v7.3 MAT files.",
    "version": "0.1.19",
    "split_keywords": [
        "hdf5",
        "matlab"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ec29ed9f2df3e77400b5312787b4ade31791e8eca91a39a7ccd80677490f4ea5",
                "md5": "a82f10db9add1f9dd449baf407b85435",
                "sha256": "5d49b4a1c6e3047d2709045b0b918a943903117b8015e959f4a51d95b49ea204"
            },
            "downloads": -1,
            "filename": "hdf5storage-0.1.19-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a82f10db9add1f9dd449baf407b85435",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 53631,
            "upload_time": "2023-01-20T11:55:42",
            "upload_time_iso_8601": "2023-01-20T11:55:42.190767Z",
            "url": "https://files.pythonhosted.org/packages/ec/29/ed9f2df3e77400b5312787b4ade31791e8eca91a39a7ccd80677490f4ea5/hdf5storage-0.1.19-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c69ece420a7bd4c5092abea7cdc939f401dffa3936be586b316a771f8a1591fe",
                "md5": "50b75e5e24b6e1bcb3cd3af2739b7f6a",
                "sha256": "7a1a6badf546e8942f4d22d598aee14021796bc28918519c9687a6abb0eeef86"
            },
            "downloads": -1,
            "filename": "hdf5storage-0.1.19.tar.gz",
            "has_sig": false,
            "md5_digest": "50b75e5e24b6e1bcb3cd3af2739b7f6a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 100839,
            "upload_time": "2023-01-20T11:55:45",
            "upload_time_iso_8601": "2023-01-20T11:55:45.043465Z",
            "url": "https://files.pythonhosted.org/packages/c6/9e/ce420a7bd4c5092abea7cdc939f401dffa3936be586b316a771f8a1591fe/hdf5storage-0.1.19.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-20 11:55:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "frejanordsiek",
    "github_project": "hdf5storage",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "hdf5storage"
}
        