Name | lakefs-spec |
Version | 0.11.1 |
home_page | None |
Summary | An fsspec implementation for lakeFS. |
upload_time | 2024-12-20 11:24:08 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | Apache-2.0 |
keywords | lakefs, fsspec |
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/_images/lakefs-spec-logo-light.png">
<img alt="lakeFS-spec logo" src="docs/_images/lakefs-spec-logo-dark.png">
</picture>
# lakeFS-spec: An fsspec backend for lakeFS
[![](https://img.shields.io/pypi/v/lakefs-spec)](https://pypi.org/project/lakefs-spec) ![GitHub](https://img.shields.io/github/license/aai-institute/lakefs-spec) [![docs](https://img.shields.io/badge/docs-latest-blue)](https://lakefs-spec.org)
![GitHub](https://img.shields.io/github/stars/aai-institute/lakefs-spec)
Welcome to lakeFS-spec, a [filesystem-spec](https://github.com/fsspec/filesystem_spec) backend implementation for the [lakeFS](https://lakefs.io/) data lake.
Our primary goal is to streamline versioned data operations in lakeFS, enabling seamless integration with popular data science tools such as Pandas, Polars, and DuckDB directly from Python.
Highlights:
- Simple repository operations in lakeFS
- Easy access to underlying storage and versioning operations
- Seamless integration with the fsspec ecosystem
- Directly access lakeFS objects from popular data science libraries (including Pandas, Polars, DuckDB, Hugging Face Datasets, PyArrow) with minimal code
- Transaction support for reliable data version control
- Smart data transfers through client-side caching (up-/download)
- Auto-discovery configuration
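One common way to avoid redundant transfers is to compare a checksum of the local file with the checksum reported for the remote object, and to skip the transfer when they match. The sketch below illustrates that general idea with the standard library only. It is not lakeFS-spec's actual implementation; the helper names `md5_hexdigest` and `needs_upload`, and the choice of MD5, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def md5_hexdigest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(local_path: Path, remote_checksum: Optional[str]) -> bool:
    """Hypothetical helper: True if the local file differs from the remote object.

    `remote_checksum` is None when no remote object exists yet.
    """
    if remote_checksum is None:
        return True
    return md5_hexdigest(local_path) != remote_checksum
```

With a check like this, a repeated upload of an unchanged file can be turned into a no-op on the client side.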
> [!NOTE]
> We are seeking early adopters who would like to actively participate in our feedback process and shape the future of the library.
> If you are interested in using the library and want to get in touch with us, please reach out via [GitHub Discussions](https://github.com/aai-institute/lakefs-spec/discussions).
## Installation
lakeFS-spec is published on PyPI, so you can install it with your favorite package manager:
```shell
$ pip install lakefs-spec
# or
$ poetry add lakefs-spec
```
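To confirm that the installation succeeded, you can query the installed distribution's version from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not something lakeFS-spec provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if lakefs-spec is installed, otherwise None
print(installed_version("lakefs-spec"))
```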
## Usage
The following usage examples showcase two major ways of using lakeFS-spec: as a low-level filesystem abstraction, and through third-party (data science) libraries.
For a more thorough overview of the features and use cases for lakeFS-spec, see the [user guide](https://lakefs-spec.org/latest/guides/) and [tutorials](https://lakefs-spec.org/latest/tutorials/) sections in the documentation.
### Low-level: As an fsspec filesystem
The following example shows how to upload a file, create a commit, and read back the committed data using the bare lakeFS filesystem implementation.
It assumes you have already created a repository named `repo` and have `lakectl` credentials set up on your machine in `~/.lakectl.yaml` (see the [lakeFS quickstart guide](https://docs.lakefs.io/quickstart/) if you are new to lakeFS and need guidance).
```python
from pathlib import Path

from lakefs_spec import LakeFSFileSystem

REPO, BRANCH = "repo", "main"

# Prepare example local data
local_path = Path("demo.txt")
local_path.write_text("Hello, lakeFS!")

# Upload to lakeFS and create a commit
fs = LakeFSFileSystem()  # will auto-discover config from ~/.lakectl.yaml

# Upload a file on a temporary transaction branch
with fs.transaction(repository=REPO, base_branch=BRANCH) as tx:
    fs.put(local_path, f"{REPO}/{tx.branch.id}/{local_path.name}")
    tx.commit(message="Add demo data")

# Read back committed file
f = fs.open(f"{REPO}/{BRANCH}/demo.txt", "rt")
print(f.readline())  # "Hello, lakeFS!"
```
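The example above depends on `lakectl` credentials being present in `~/.lakectl.yaml`. As a small pre-flight sketch, you could check for that file before constructing the filesystem, so that a missing configuration fails with a clear message. It uses only the standard library, and `lakectl_config_path` is a hypothetical helper, not part of lakeFS-spec.

```python
from pathlib import Path
from typing import Optional


def lakectl_config_path(home: Optional[Path] = None) -> Path:
    """Return the path to .lakectl.yaml under `home`, or raise if it is missing.

    `home` defaults to the current user's home directory.
    """
    base = home if home is not None else Path.home()
    config = base / ".lakectl.yaml"
    if not config.is_file():
        raise FileNotFoundError(
            f"{config} not found; set up lakectl credentials before relying on auto-discovery"
        )
    return config
```

Calling `lakectl_config_path()` before `LakeFSFileSystem()` surfaces a missing configuration file early rather than at the first remote operation.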
### High-level: Via third-party libraries
Many widely used data science tools build on fsspec to access remote storage resources and can therefore work with lakeFS data lakes directly through lakeFS-spec (see the [fsspec docs](https://filesystem-spec.readthedocs.io/en/latest/#who-uses-fsspec) for details).
The examples assume you have access to a lakeFS instance with the [`quickstart` repository](https://docs.lakefs.io/quickstart/launch.html), which contains sample data.
```python
# Pandas -- see https://pandas.pydata.org/docs/user_guide/io.html#reading-writing-remote-files
import pandas as pd

data = pd.read_parquet("lakefs://quickstart/main/lakes.parquet")
print(data.head())


# Polars -- see https://pola-rs.github.io/polars/user-guide/io/cloud-storage/
import polars as pl

data = pl.read_parquet("lakefs://quickstart/main/lakes.parquet", use_pyarrow=True)
print(data.head())


# DuckDB -- see https://duckdb.org/docs/guides/python/filesystems.html
import duckdb
import fsspec

duckdb.register_filesystem(fsspec.filesystem("lakefs"))
res = duckdb.read_parquet("lakefs://quickstart/main/lakes.parquet")
res.show()
```
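The object addresses used in these examples follow the pattern `<repository>/<ref>/<path>`, optionally prefixed with the `lakefs://` scheme. The following sketch shows how such an address breaks down into its parts. `LakeFSPath` and `split_lakefs_path` are hypothetical names introduced for illustration; they are not part of the lakeFS-spec API, which handles path parsing internally.

```python
from typing import NamedTuple


class LakeFSPath(NamedTuple):
    repository: str
    ref: str
    resource: str


def split_lakefs_path(path: str) -> LakeFSPath:
    """Split '<repository>/<ref>/<path>' (with optional 'lakefs://' scheme)."""
    stripped = path.removeprefix("lakefs://")
    parts = stripped.split("/", maxsplit=2)
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"expected '<repository>/<ref>/<path>', got {path!r}")
    # The resource part is empty when the address points at the ref itself
    resource = parts[2] if len(parts) == 3 else ""
    return LakeFSPath(parts[0], parts[1], resource)


print(split_lakefs_path("lakefs://quickstart/main/lakes.parquet"))
# LakeFSPath(repository='quickstart', ref='main', resource='lakes.parquet')
```

Here `ref` is the branch name `main`, as in the examples above.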
## Contributing
We encourage and welcome contributions from the community to enhance the project.
Please check [discussions](https://github.com/aai-institute/lakefs-spec/discussions) or raise an [issue](https://github.com/aai-institute/lakefs-spec/issues) on GitHub for any problems you encounter with the library.
For information on the general development workflow, see the [contribution guide](CONTRIBUTING.md).
## License
The lakeFS-spec library is distributed under the [Apache-2.0 license](https://github.com/aai-institute/lakefs-spec/blob/main/LICENSE).
Raw data
{
"_id": null,
"home_page": null,
"name": "lakefs-spec",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "Nicholas Junge <n.junge@appliedai-institute.de>, Max Mynter <m.mynter@appliedai-institute.de>, Adrian Rumpold <a.rumpold@appliedai-institute.de>",
"keywords": "lakeFS, fsspec",
"author": null,
"author_email": "appliedAI Institute for Europe <lakefs-spec@appliedai-institute.de>",
"download_url": "https://files.pythonhosted.org/packages/7a/12/cc3e661489c5a281eac062a44dcab30aa94e374f2c9b92f34050c6515d14/lakefs_spec-0.11.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "An fsspec implementation for lakeFS.",
"version": "0.11.1",
"project_urls": {
"Discussions": "https://github.com/aai-institute/lakefs-spec/discussions",
"Documentation": "https://lakefs-spec.org/latest",
"Homepage": "https://github.com/aai-institute/lakefs-spec",
"Issues": "https://github.com/aai-institute/lakefs-spec/issues",
"Repository": "https://github.com/aai-institute/lakefs-spec.git"
},
"split_keywords": [
"lakefs",
" fsspec"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "255e46ba843ca00b45d4f79e3f805a1dc92a546d31e6069bb316ec806890c6f3",
"md5": "ba02167daf2b036981b41656c1c7c625",
"sha256": "b43e986a527446c7aee48308f2bbcf375d2d7bbab816f0718755cd48e2acdb25"
},
"downloads": -1,
"filename": "lakefs_spec-0.11.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ba02167daf2b036981b41656c1c7c625",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 22551,
"upload_time": "2024-12-20T11:24:04",
"upload_time_iso_8601": "2024-12-20T11:24:04.915967Z",
"url": "https://files.pythonhosted.org/packages/25/5e/46ba843ca00b45d4f79e3f805a1dc92a546d31e6069bb316ec806890c6f3/lakefs_spec-0.11.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7a12cc3e661489c5a281eac062a44dcab30aa94e374f2c9b92f34050c6515d14",
"md5": "271288a03276765ed43e71928f15a2d5",
"sha256": "8e5b9e5c416630fd64f165b47fba06d496e00782c5c4dd5064e6f6f98adea62f"
},
"downloads": -1,
"filename": "lakefs_spec-0.11.1.tar.gz",
"has_sig": false,
"md5_digest": "271288a03276765ed43e71928f15a2d5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 476956,
"upload_time": "2024-12-20T11:24:08",
"upload_time_iso_8601": "2024-12-20T11:24:08.131899Z",
"url": "https://files.pythonhosted.org/packages/7a/12/cc3e661489c5a281eac062a44dcab30aa94e374f2c9b92f34050c6515d14/lakefs_spec-0.11.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-20 11:24:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aai-institute",
"github_project": "lakefs-spec",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "lakefs-spec"
}