kerchunk


Name: kerchunk
Version: 0.2.9
home_page: None
Summary: Functions to make reference descriptions for ReferenceFileSystem
upload_time: 2025-07-31 14:36:17
maintainer: None
docs_url: None
author: None
requires_python: >=3.11
license: MIT
requirements: No requirements were recorded.
Travis-CI: No Travis.

# kerchunk

Cloud-friendly access to archival data

[![Docs](https://github.com/fsspec/kerchunk/actions/workflows/default.yml/badge.svg)](https://fsspec.github.io/kerchunk/)
[![Tests](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml/badge.svg)](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml)
[![Pypi](https://img.shields.io/pypi/v/kerchunk.svg)](https://pypi.python.org/pypi/kerchunk/)
[![Conda-forge](https://img.shields.io/conda/vn/conda-forge/kerchunk.svg)](https://anaconda.org/conda-forge/kerchunk)

Kerchunk is a library that provides a unified way to represent a variety of chunked, compressed
data formats (e.g. NetCDF, HDF5, GRIB), allowing efficient access to the data from traditional
file systems or cloud object storage. It also provides a flexible way to create virtual datasets
from multiple files. It does this by extracting the byte ranges, compression details and other
metadata describing the data, and storing them in a new, separate object. This means that you can
create a virtual aggregate dataset over potentially many source files, for efficient, parallel and
cloud-friendly *in-situ* access without having to copy or translate the originals. It is a gateway
to massive in-the-cloud data processing while data providers still insist on using legacy formats
for archival storage.
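
To make the idea of "byte ranges stored in a separate object" concrete, here is a minimal sketch using only the Python standard library. It is not kerchunk's implementation: the helper names `build_refs` and `read_chunk` are hypothetical, the fixed-size chunking is an assumption made for brevity (real chunk offsets come from the source file's own format metadata), and the `{"version": 1, "refs": {key: [url, offset, length]}}` layout is only loosely modelled on the reference sets that ReferenceFileSystem consumes.

```python
import json
import os
import tempfile


def build_refs(path, chunk_size):
    """Index a file as chunks without copying it: key -> [url, offset, length]."""
    size = os.path.getsize(path)
    refs = {}
    for i, offset in enumerate(range(0, size, chunk_size)):
        # The last chunk may be shorter than chunk_size.
        refs[f"data/{i}"] = [path, offset, min(chunk_size, size - offset)]
    return {"version": 1, "refs": refs}


def read_chunk(refs, key):
    """Fetch one chunk by seeking straight to its byte range in the original file."""
    url, offset, length = refs["refs"][key]
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A stand-in "archival" file: ten bytes with values 0..9.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(bytes(range(10)))
    source_path = tmp.name

refs = build_refs(source_path, chunk_size=4)

# The reference set is plain JSON, so it can live anywhere, apart from the data.
sidecar = json.dumps(refs)
second_chunk = read_chunk(json.loads(sidecar), "data/1")

os.remove(source_path)
```

The original file is never rewritten; only the small JSON sidecar is new, and a reader needs nothing but the sidecar and range reads against the source.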

*Why Kerchunk*:

Kerchunk provides:

- completely serverless architecture
- metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read
- read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http,
  cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)
- loading of various file types (currently netcdf4/HDF, grib2, tiff, fits, zarr), potentially heterogeneous within a
  single dataset, without a need to go via the specific driver (e.g., no need for h5py)
- asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency
- parallel access with a library like zarr without any locks
- logical datasets spanning many (potentially millions of) data files, with direct access to and subselection of
  them via coordinate indexing across an arbitrary number of dimensions


<img alt="logo" src="./kerchunk.png" width="200"/>


For further information, please see [the documentation](https://fsspec.github.io/kerchunk/).

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "kerchunk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Martin Durant <martin.durant@alumni.utoronto.ca>",
    "download_url": "https://files.pythonhosted.org/packages/01/4d/4e8a1d7780eebe3d869887835eaa63ccc0b5bd316a4c0ba793e0e73c848d/kerchunk-0.2.9.tar.gz",
    "platform": null,
    "description": "# kerchunk\n\nCloud-friendly access to archival data\n\n[![Docs](https://github.com/fsspec/kerchunk/actions/workflows/default.yml/badge.svg)](https://fsspec.github.io/kerchunk/)\n[![Tests](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml/badge.svg)](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml)\n[![Pypi](https://img.shields.io/pypi/v/kerchunk.svg)](https://pypi.python.org/pypi/kerchunk/)\n[![Conda-forge](https://img.shields.io/conda/vn/conda-forge/kerchunk.svg)](https://anaconda.org/conda-forge/kerchunk)\n\nKerchunk is a library that provides a unified way to represent a variety of chunked, compressed\ndata formats (e.g. NetCDF, HDF5, GRIB),\nallowing efficient access to the data from traditional file systems or cloud object storage.\nIt also provides a flexible way to create\nvirtual datasets from multiple files.  It does this by extracting the byte ranges,\ncompression information and other information about the\ndata and storing this metadata in a new, separate object.  This means that you can\ncreate a virtual aggregate dataset over potentially many source\nfiles, for efficient, parallel and cloud-friendly *in-situ* access without having to copy or\ntranslate the originals. 
It is a gateway to in-the-cloud massive data processing while\nthe data providers still insist on using legacy formats for archival storage.\n\n*Why Kerchunk*:\n\nWe provide the following things:\n\n- completely serverless architecture\n- metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read\n- read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http,\n  cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)\n- loading of various file types (currently netcdf4/HDF, grib2, tiff, fits, zarr), potentially heterogeneous within a\n  single dataset, without a need to go via the specific driver (e.g., no need for h5py)\n- asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency\n- parallel access with a library like zarr without any locks\n- logical datasets viewing many (>~millions) data files, and direct access/subselection to them via coordinate\n  indexing across an arbitrary number of dimensions\n\n\n<img alt=\"logo\" src=\"./kerchunk.png\" width=\"200\"/>\n\n\nFor further information, please see [the documentation](https://fsspec.github.io/kerchunk/).\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Functions to make reference descriptions for ReferenceFileSystem",
    "version": "0.2.9",
    "project_urls": {
        "Documentation": "https://fsspec.github.io/kerchunk",
        "issue-tracker": "https://github.com/fsspec/kerchunk/issues",
        "source-code": "https://github.com/fsspec/kerchunk"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e096355144a51c7dd7444e7550b26addfce45f31bc59c4a258d6efc84f585af4",
                "md5": "fb5eeead73558a5f30696796fcd6deee",
                "sha256": "5f7ff385ace07c62bd0ae5db639cac8b441169fea2389bb006fd209549f699c2"
            },
            "downloads": -1,
            "filename": "kerchunk-0.2.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fb5eeead73558a5f30696796fcd6deee",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 66440,
            "upload_time": "2025-07-31T14:36:16",
            "upload_time_iso_8601": "2025-07-31T14:36:16.628747Z",
            "url": "https://files.pythonhosted.org/packages/e0/96/355144a51c7dd7444e7550b26addfce45f31bc59c4a258d6efc84f585af4/kerchunk-0.2.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "014d4e8a1d7780eebe3d869887835eaa63ccc0b5bd316a4c0ba793e0e73c848d",
                "md5": "fb468e74f07db0c140de9de906a25a57",
                "sha256": "86a54da9a57a94fd6fb97be786e2d83182d3d8e4fd7c0ea2b67cde3d0641df7d"
            },
            "downloads": -1,
            "filename": "kerchunk-0.2.9.tar.gz",
            "has_sig": false,
            "md5_digest": "fb468e74f07db0c140de9de906a25a57",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 712534,
            "upload_time": "2025-07-31T14:36:17",
            "upload_time_iso_8601": "2025-07-31T14:36:17.699898Z",
            "url": "https://files.pythonhosted.org/packages/01/4d/4e8a1d7780eebe3d869887835eaa63ccc0b5bd316a4c0ba793e0e73c848d/kerchunk-0.2.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-31 14:36:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fsspec",
    "github_project": "kerchunk",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "kerchunk"
}
        