Lakeshack
=========
.. image:: images/lakeshack_128.png
   :align: center
   :alt: A small rustic shack on the shores of a big lake
====================
A simplified data lakehouse, more of a data lakeshack, optimized for retrieving
filtered records from Parquet files. Similar to the various lakehouse solutions
(Iceberg, Hudi, Delta Lake), Lakeshack gathers up the min/max values for specified
columns from each Parquet file and stores them in a database (Metastore). When you
want to query for a set of records, it first checks the Metastore to get the list of
Parquet files that **might** have the desired records, and then only queries those
Parquet files. The files may be stored locally or in S3. You may query using either
native pyarrow or S3 Select.
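The min/max lookup described above can be illustrated with a small, self-contained
sketch. It does not use Lakeshack's own API: the table name ``metastore``, the column
``id``, the file names, and the helper functions are all hypothetical, and an in-memory
SQLite database stands in for the Metastore.

```python
import sqlite3

# Hypothetical per-file statistics: (filepath, min of "id", max of "id").
FILE_STATS = [
    ("part-0.parquet", 0, 499),
    ("part-1.parquet", 450, 999),
    ("part-2.parquet", 1000, 1499),
]


def build_metastore(file_stats):
    """Store one row of min/max values per Parquet file in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metastore ("
        "filepath TEXT PRIMARY KEY, id_min INTEGER, id_max INTEGER)"
    )
    conn.executemany("INSERT INTO metastore VALUES (?, ?, ?)", file_stats)
    conn.commit()
    return conn


def candidate_files(conn, value):
    """Return the files whose [min, max] range could contain ``value``."""
    rows = conn.execute(
        "SELECT filepath FROM metastore "
        "WHERE id_min <= ? AND id_max >= ? ORDER BY filepath",
        (value, value),
    )
    return [row[0] for row in rows]


conn = build_metastore(FILE_STATS)
# Only the files returned here would need to be opened and scanned.
print(candidate_files(conn, 475))
```

A file appearing in the result only means the value lies inside that file's range, so
it **might** hold matching records. The files themselves still have to be read to find
out.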
To achieve optimal performance, the partitioning and clustering strategy (which
specifies how the records are written to the Parquet files) should align with the main
query pattern expected on the data. See the documentation for more information.
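A toy illustration of why the write layout matters, under the assumption that queries
filter on a single integer key. The two layouts and the helper names are hypothetical
and are not Lakeshack functions. The sketch only counts how many files' min/max ranges
cover a queried key.

```python
def file_ranges(files):
    """Compute the (min, max) key range of each file."""
    return [(min(keys), max(keys)) for keys in files]


def count_candidates(ranges, key):
    """Count the files whose range could contain ``key``."""
    return sum(1 for lo, hi in ranges if lo <= key <= hi)


KEYS = list(range(100))
N_FILES = 10
ROWS_PER_FILE = len(KEYS) // N_FILES

# Layout 1: keys spread round-robin, so every file spans nearly the full key range.
round_robin = [KEYS[i::N_FILES] for i in range(N_FILES)]

# Layout 2: keys sorted before writing, so each file covers a narrow, disjoint range.
clustered = [
    KEYS[i * ROWS_PER_FILE:(i + 1) * ROWS_PER_FILE] for i in range(N_FILES)
]

unclustered_hits = count_candidates(file_ranges(round_robin), 45)
clustered_hits = count_candidates(file_ranges(clustered), 45)
print(unclustered_hits, clustered_hits)
```

In the round-robin layout every file is a candidate for key 45. In the clustered layout
only one file is, so the min/max check can skip the rest.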
Installation
============
Lakeshack may be installed using pip::

    pip install lakeshack

Documentation
=============
Documentation can be found at https://mhendrey.github.io/lakeshack
Raw data
{
"_id": null,
"home_page": "https://github.com/mhendrey/lakeshack",
"name": "lakeshack",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "pyarrow,s3,parquet",
"author": "Matthew Hendrey",
"author_email": "matthew.hendrey@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/75/35/2b9b341df5282f8d821a60f3db1ea0ed8fe64cef83e2af3c1816dead13cf/lakeshack-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GNU GPLv3",
"summary": "Query parquet files using pyarrow or S3 Select by first gathering file metadata into a database",
"version": "0.2.3",
"project_urls": {
"Documentation": "https://mhendrey.github.io/lakeshack",
"Homepage": "https://github.com/mhendrey/lakeshack",
"Source": "https://github.com/mhendrey/lakeshack"
},
"split_keywords": [
"pyarrow",
"s3",
"parquet"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bf7d657b45890d8a57cacf244f458cdc37e1407851298fcf6929591baf3c9d2c",
"md5": "9901ed7503c7a6f8539553bd7dab9f17",
"sha256": "a373af67761f41f6a18f549913b8939267d294e9ff9ba8a62af58d453a6ecd76"
},
"downloads": -1,
"filename": "lakeshack-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9901ed7503c7a6f8539553bd7dab9f17",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 32927,
"upload_time": "2023-11-18T12:19:38",
"upload_time_iso_8601": "2023-11-18T12:19:38.948236Z",
"url": "https://files.pythonhosted.org/packages/bf/7d/657b45890d8a57cacf244f458cdc37e1407851298fcf6929591baf3c9d2c/lakeshack-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "75352b9b341df5282f8d821a60f3db1ea0ed8fe64cef83e2af3c1816dead13cf",
"md5": "cfd48ba7039db591de2c12655f40b106",
"sha256": "2adfc4838e5e691534e8a73e072c4c84f83d72f30a282a70a931463dc6abb0ef"
},
"downloads": -1,
"filename": "lakeshack-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "cfd48ba7039db591de2c12655f40b106",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 29075,
"upload_time": "2023-11-18T12:19:40",
"upload_time_iso_8601": "2023-11-18T12:19:40.839586Z",
"url": "https://files.pythonhosted.org/packages/75/35/2b9b341df5282f8d821a60f3db1ea0ed8fe64cef83e2af3c1816dead13cf/lakeshack-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-18 12:19:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mhendrey",
"github_project": "lakeshack",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "lakeshack"
}