Name | kerchunk |
Version | 0.2.9 |
Summary | Functions to make reference descriptions for ReferenceFileSystem |
home_page | None |
upload_time | 2025-07-31 14:36:17 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.11 |
license | MIT |
keywords | None |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
# kerchunk
Cloud-friendly access to archival data
[Docs](https://fsspec.github.io/kerchunk/)
[Tests](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml)
[PyPI](https://pypi.python.org/pypi/kerchunk/)
[conda-forge](https://anaconda.org/conda-forge/kerchunk)
Kerchunk is a library that provides a unified way to represent a variety of chunked, compressed
data formats (e.g. NetCDF, HDF5, GRIB),
allowing efficient access to the data from traditional file systems or cloud object storage.
It also provides a flexible way to create
virtual datasets from multiple files. It does this by extracting the byte ranges,
compression details and other metadata from the
data and storing them in a new, separate object. This means that you can
create a virtual aggregate dataset over potentially many source
files, for efficient, parallel and cloud-friendly *in-situ* access, without having to copy or
translate the originals. It is a gateway to massive in-the-cloud data processing even while
data providers continue to rely on legacy formats for archival storage.
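Concretely, the separate metadata object is a JSON "reference set": each key maps either to inline data or to a `[url, offset, length]` triple pointing into the untouched source file. A minimal hand-built sketch of such a document follows; the bucket URL, offsets, and lengths here are invented for illustration, not taken from any real file:

```python
import json

# A minimal kerchunk-style reference set (version 1 of the reference spec).
# Keys ending in .zgroup/.zarray hold inline zarr metadata; chunk keys map to
# [url, offset, length] triples into the original, uncopied source file.
# The URL and byte offsets below are made up for illustration.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "data/.zarray": json.dumps({
            "zarr_format": 2,
            "shape": [10],
            "chunks": [5],
            "dtype": "<f8",
            "compressor": None,
            "fill_value": None,
            "filters": None,
            "order": "C",
        }),
        # chunk 0: 40 bytes starting at offset 1024 of the source file
        "data/0": ["s3://example-bucket/original.nc", 1024, 40],
        # chunk 1: the next 40 bytes
        "data/1": ["s3://example-bucket/original.nc", 1064, 40],
    },
}

# Serialize; this JSON document is the entire "virtual dataset".
reference_json = json.dumps(refs, indent=2)
```

fsspec's `ReferenceFileSystem` can consume such a document (e.g. `fsspec.filesystem("reference", fo=refs)`), presenting the references as a file system that zarr can read directly, with no copy of `original.nc` ever made.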
*Why Kerchunk*:

Kerchunk provides:
- completely serverless architecture
- metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read
- read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http,
cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)
- loading of various file types (currently netcdf4/HDF, grib2, tiff, fits, zarr), potentially heterogeneous within a
single dataset, without a need to go via the specific driver (e.g., no need for h5py)
- asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency
- parallel access with a library like zarr without any locks
- logical datasets viewing many (>~millions) data files, and direct access/subselection to them via coordinate
indexing across an arbitrary number of dimensions
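The concurrent-fetch point can be illustrated with a toy sketch (pure asyncio with a simulated round trip, not kerchunk's actual implementation): issuing many range requests at once costs roughly one network latency in wall time, rather than one latency per chunk as a sequential loop would.

```python
import asyncio
import time

LATENCY = 0.05  # simulated per-request round-trip time, in seconds

async def fetch_range(url: str, offset: int, length: int) -> bytes:
    """Pretend to fetch `length` bytes at `offset`; sleep stands in for the network."""
    await asyncio.sleep(LATENCY)
    return bytes(length)

async def fetch_all(ranges):
    # Launch every range request concurrently; total wall time is roughly one
    # LATENCY, not len(ranges) * LATENCY as sequential awaits would cost.
    return await asyncio.gather(
        *(fetch_range("s3://example-bucket/original.nc", off, n) for off, n in ranges)
    )

ranges = [(i * 40, 40) for i in range(20)]
start = time.perf_counter()
chunks = asyncio.run(fetch_all(ranges))
elapsed = time.perf_counter() - start
```

Sequentially, 20 requests would take about one second at this simulated latency; gathered concurrently they complete in roughly 0.05 s.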
<img alt="logo" src="./kerchunk.png" width="200"/>
For further information, please see [the documentation](https://fsspec.github.io/kerchunk/).
## Raw data
{
  "_id": null,
  "home_page": null,
  "name": "kerchunk",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.11",
  "maintainer_email": null,
  "keywords": null,
  "author": null,
  "author_email": "Martin Durant <martin.durant@alumni.utoronto.ca>",
  "download_url": "https://files.pythonhosted.org/packages/01/4d/4e8a1d7780eebe3d869887835eaa63ccc0b5bd316a4c0ba793e0e73c848d/kerchunk-0.2.9.tar.gz",
  "platform": null,
  "description": "# kerchunk\n\nCloud-friendly access to archival data\n\n[](https://fsspec.github.io/kerchunk/)\n[](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml)\n[](https://pypi.python.org/pypi/kerchunk/)\n[](https://anaconda.org/conda-forge/kerchunk)\n\nKerchunk is a library that provides a unified way to represent a variety of chunked, compressed\ndata formats (e.g. NetCDF, HDF5, GRIB),\nallowing efficient access to the data from traditional file systems or cloud object storage.\nIt also provides a flexible way to create\nvirtual datasets from multiple files. It does this by extracting the byte ranges,\ncompression information and other information about the\ndata and storing this metadata in a new, separate object. This means that you can\ncreate a virtual aggregate dataset over potentially many source\nfiles, for efficient, parallel and cloud-friendly *in-situ* access without having to copy or\ntranslate the originals. It is a gateway to in-the-cloud massive data processing while\nthe data providers still insist on using legacy formats for archival storage.\n\n*Why Kerchunk*:\n\nWe provide the following things:\n\n- completely serverless architecture\n- metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read\n- read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http,\n cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)\n- loading of various file types (currently netcdf4/HDF, grib2, tiff, fits, zarr), potentially heterogeneous within a\n single dataset, without a need to go via the specific driver (e.g., no need for h5py)\n- asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency\n- parallel access with a library like zarr without any locks\n- logical datasets viewing many (>~millions) data files, and direct access/subselection to them via coordinate\n indexing across an arbitrary number of dimensions\n\n\n<img alt=\"logo\" src=\"./kerchunk.png\" width=\"200\"/>\n\n\nFor further information, please see [the documentation](https://fsspec.github.io/kerchunk/).\n",
  "bugtrack_url": null,
  "license": "MIT",
  "summary": "Functions to make reference descriptions for ReferenceFileSystem",
  "version": "0.2.9",
  "project_urls": {
    "Documentation": "https://fsspec.github.io/kerchunk",
    "issue-tracker": "https://github.com/fsspec/kerchunk/issues",
    "source-code": "https://github.com/fsspec/kerchunk"
  },
  "split_keywords": [],
  "urls": [
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "e096355144a51c7dd7444e7550b26addfce45f31bc59c4a258d6efc84f585af4",
        "md5": "fb5eeead73558a5f30696796fcd6deee",
        "sha256": "5f7ff385ace07c62bd0ae5db639cac8b441169fea2389bb006fd209549f699c2"
      },
      "downloads": -1,
      "filename": "kerchunk-0.2.9-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "fb5eeead73558a5f30696796fcd6deee",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.11",
      "size": 66440,
      "upload_time": "2025-07-31T14:36:16",
      "upload_time_iso_8601": "2025-07-31T14:36:16.628747Z",
      "url": "https://files.pythonhosted.org/packages/e0/96/355144a51c7dd7444e7550b26addfce45f31bc59c4a258d6efc84f585af4/kerchunk-0.2.9-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "014d4e8a1d7780eebe3d869887835eaa63ccc0b5bd316a4c0ba793e0e73c848d",
        "md5": "fb468e74f07db0c140de9de906a25a57",
        "sha256": "86a54da9a57a94fd6fb97be786e2d83182d3d8e4fd7c0ea2b67cde3d0641df7d"
      },
      "downloads": -1,
      "filename": "kerchunk-0.2.9.tar.gz",
      "has_sig": false,
      "md5_digest": "fb468e74f07db0c140de9de906a25a57",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.11",
      "size": 712534,
      "upload_time": "2025-07-31T14:36:17",
      "upload_time_iso_8601": "2025-07-31T14:36:17.699898Z",
      "url": "https://files.pythonhosted.org/packages/01/4d/4e8a1d7780eebe3d869887835eaa63ccc0b5bd316a4c0ba793e0e73c848d/kerchunk-0.2.9.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2025-07-31 14:36:17",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "fsspec",
  "github_project": "kerchunk",
  "travis_ci": false,
  "coveralls": true,
  "github_actions": true,
  "lcname": "kerchunk"
}