darr
====

:Name: darr
:Version: 0.5.5
:Home page: https://github.com/gbeckers/darr
:Summary: Stores NumPy arrays in a way that is self-documented and tool-independent.
:Upload time: 2023-09-20 12:36:07
:Author: Gabriel J.L. Beckers
:Requires Python: >=3.9
:License: BSD-3
:Requirements: No requirements were recorded.

|Github CI Status| |Appveyor Status| |PyPi version| |Conda Forge|
|Codecov Badge| |Docs Status| |Zenodo Badge|

Darr is a Python science library that allows you to work efficiently with
potentially very large, disk-based NumPy arrays that are self-documented.
Documentation includes copy-paste ready code to read arrays in many popular
data science languages such as R, Julia, Scilab, IDL, Matlab, Maple, and
Mathematica, or in Python/NumPy without Darr, all without exporting
them and with minimal effort.

Universal readability of data is a pillar of good scientific practice. It is
also generally a good idea for anyone who wants to flexibly move between
analysis environments, who wants to save data for the longer term, or who
wants to share data with others without spending much time on figuring out
and/or explaining how the receiver can read it. No idea how to read 
your 7-dimensional uint32 NumPy array in Matlab to quickly try out an
algorithm your colleague wrote? No worries, a quick copy-paste of code from
the array documentation is all that is needed to read your data in, e.g., R or
Matlab (see `example
<https://github.com/gbeckers/Darr/tree/master/examplearrays/arrays/array_int32_2D.darr>`__).
As you work with your array, its documentation is automatically kept 
up to date. No need to export anything, make notes, or to provide elaborate 
explanation. No looking up things. No dependence on complicated formats or 
specialized libraries for reading your data elsewhere later.

In essence, Darr makes it trivially easy to share your numerical arrays with
others or with yourself when working in different computing environments,
and stores them in a future-proof way.

More rationale for a tool-independent approach to numeric array storage is
provided `here <https://darr.readthedocs.io/en/latest/rationale.html>`__.

Under the hood, Darr uses NumPy memory-mapped arrays, which is a widely
established and trusted way of working with disk-based numerical data, and
which makes Darr fully NumPy compatible. This enables efficient out-of-core
read/write access to potentially very large arrays. In addition to
automatic documentation, Darr adds other functionality to NumPy's memmap,
such as easy appending and truncating of data, support for ragged arrays,
the ability to create arrays from iterators, and easy use of metadata. Flat
binary files and (JSON) text files are accompanied by a README text file
that explains how the array and metadata are stored (`see example arrays
<https://github.com/gbeckers/Darr/tree/master/examplearrays/>`__).
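
The storage idea described above can be illustrated with plain NumPy. The
sketch below is not Darr's implementation: the file names ``values.bin`` and
``description.json``, the JSON keys, and both helper functions are
hypothetical and chosen only for illustration. It writes an array as a flat
binary file with a small JSON description, then reads it back through
``numpy.memmap`` using nothing but that description.

.. code-block:: python

    import json
    import tempfile
    from pathlib import Path

    import numpy as np


    def write_flat_array(dirpath, array):
        """Store `array` as raw bytes plus a small JSON description.

        File names and JSON keys are hypothetical, for illustration only.
        """
        dirpath = Path(dirpath)
        dirpath.mkdir(parents=True, exist_ok=True)
        array = np.ascontiguousarray(array)  # row-major (C order) layout
        array.tofile(dirpath / "values.bin")
        description = {
            "dtype": array.dtype.str,  # type string that includes byte order
            "shape": list(array.shape),
            "order": "C",
        }
        (dirpath / "description.json").write_text(json.dumps(description))


    def read_flat_array(dirpath):
        """Memory-map the raw bytes using only the JSON description."""
        dirpath = Path(dirpath)
        description = json.loads((dirpath / "description.json").read_text())
        return np.memmap(
            dirpath / "values.bin",
            dtype=np.dtype(description["dtype"]),
            mode="r",
            shape=tuple(description["shape"]),
            order=description["order"],
        )


    original = np.arange(12, dtype="int32").reshape(3, 4)
    workdir = Path(tempfile.mkdtemp()) / "example_array"
    write_flat_array(workdir, original)
    restored = read_flat_array(workdir)

Because the binary file has no header, any environment that can read raw
numbers of a known type, shape, and byte order can load it, which is what
makes a plain-text description sufficient.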

See this `tutorial <https://darr.readthedocs.io/en/latest/tutorialarray.html>`__
for a brief introduction, or the
`documentation <http://darr.readthedocs.io/>`__ for more info.

Darr is currently pre-1.0 and still under development. It is open source and
freely available under the `New BSD License
<https://opensource.org/licenses/BSD-3-Clause>`__ terms.

Features
--------
-  Data is stored purely based on flat binary and text files, maximizing
   universal readability.
-  Automatic self-documentation, including copy-paste ready code snippets for
   reading the array in a number of popular data analysis environments, such as
   Python (without Darr), R, Julia, Scilab, Octave/Matlab, GDL/IDL, and
   Mathematica
   (see `example array
   <https://github.com/gbeckers/Darr/tree/master/examplearrays/arrays/array_int32_2D.darr>`__).
-  Disk-persistent array data is directly accessible through `NumPy
   indexing <https://numpy.org/doc/stable/reference/arrays.indexing.html>`__
   and may be larger than RAM.
-  Easy and efficient appending of data (`see example <https://darr.readthedocs.io/en/latest/tutorialarray.html#appending-data>`__).
-  Supports ragged arrays.
-  Easy use of metadata, stored in a widely readable separate
   JSON text file (`see example
   <https://darr.readthedocs.io/en/latest/tutorialarray.html#metadata>`__).
-  Many numeric types are supported: (u)int8-(u)int64, float16-float64,
   complex64, complex128.
-  Integrates easily with the `Dask <https://dask.pydata.org/en/latest/>`__
   library for out-of-core computation on very large arrays.
-  Minimal dependencies, only `NumPy <http://www.numpy.org/>`__.
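
One reason appending to flat binary storage can be cheap: in a row-major
layout, new rows along the first axis are simply extra bytes at the end of the
file. The following is a minimal NumPy-only sketch of that idea, not Darr's
own code; the helper ``append_rows`` and the file name ``values.bin`` are
hypothetical.

.. code-block:: python

    import tempfile
    from pathlib import Path

    import numpy as np


    def append_rows(filepath, rows, dtype):
        """Append `rows` to a row-major flat binary file."""
        rows = np.ascontiguousarray(rows, dtype=dtype)
        with open(filepath, "ab") as f:  # append mode leaves existing bytes untouched
            f.write(rows.tobytes())


    filepath = Path(tempfile.mkdtemp()) / "values.bin"
    dtype = np.dtype("float64")
    ncolumns = 2

    append_rows(filepath, [[1.0, 2.0], [3.0, 4.0]], dtype)
    append_rows(filepath, [[5.0, 6.0]], dtype)

    # The first-axis length follows from the file size; no header is rewritten.
    nrows = filepath.stat().st_size // (dtype.itemsize * ncolumns)
    data = np.memmap(filepath, dtype=dtype, mode="r", shape=(nrows, ncolumns))

In this sketch only the recorded shape would need updating after an append,
so the cost is proportional to the data added rather than to the size of the
existing array.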

Drawbacks
---------
- No compression, although compression for archiving purposes is supported.
- Array storage uses multiple files, as binary data is separated from text
  documentation and metadata.

See the `documentation <http://darr.readthedocs.io/>`__ for more information.

.. |Github CI Status| image:: https://github.com/gbeckers/Darr/actions/workflows/python_package.yml/badge.svg
   :target: https://github.com/gbeckers/Darr/actions/workflows/python_package.yml
.. |Appveyor Status| image:: https://ci.appveyor.com/api/projects/status/github/gbeckers/darr?svg=true
   :target: https://ci.appveyor.com/project/gbeckers/darr
.. |PyPi version| image:: https://img.shields.io/badge/pypi-0.5.5-orange.svg
   :target: https://pypi.org/project/darr/
.. |Conda Forge| image:: https://anaconda.org/conda-forge/darr/badges/version.svg
   :target: https://anaconda.org/conda-forge/darr
.. |Docs Status| image:: https://readthedocs.org/projects/darr/badge/?version=stable
   :target: https://darr.readthedocs.io/en/stable/
.. |Repo Status| image:: https://www.repostatus.org/badges/latest/active.svg
   :alt: Project Status: Active – The project has reached a stable, usable state and is being actively developed.
   :target: https://www.repostatus.org/#active
.. |Codacy Badge| image:: https://api.codacy.com/project/badge/Grade/c0157592ce7a4ecca5f7d8527874ce54
   :alt: Codacy Badge
   :target: https://app.codacy.com/app/gbeckers/Darr?utm_source=github.com&utm_medium=referral&utm_content=gbeckers/Darr&utm_campaign=Badge_Grade_Dashboard
.. |Zenodo Badge| image:: https://zenodo.org/badge/151593293.svg
   :target: https://zenodo.org/badge/latestdoi/151593293
.. |Codecov Badge| image:: https://codecov.io/gh/gbeckers/Darr/branch/master/graph/badge.svg?token=BBV0WDIUSJ
   :target: https://codecov.io/gh/gbeckers/Darr


            
