airflow-provider-lakefs


- **Name:** airflow-provider-lakefs
- **Version:** 0.48.0
- **Home page:** https://lakefs.io
- **Summary:** A lakeFS provider package built by Treeverse.
- **Upload time:** 2023-10-23 08:29:07
- **Author:** Treeverse
- **Requires Python:** >=3.7
- **License:** Apache License 2.0

<p align="center">
  <img src="https://raw.githubusercontent.com/treeverse/lakeFS/master/docs/assets/img/logo_large.png"/>
</p>

[![Apache license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://raw.githubusercontent.com/treeverse/lakeFS/master/LICENSE)
[![Provider test status](https://github.com/treeverse/airflow-provider-lakeFS/actions/workflows/provider.yaml/badge.svg)](https://github.com/treeverse/airflow-provider-lakeFS/actions/workflows/provider.yaml)
[![PyPI version](https://badge.fury.io/py/airflow-provider-lakefs.svg)](https://badge.fury.io/py/airflow-provider-lakefs)
[![Code of Conduct](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](https://github.com/treeverse/lakeFS/blob/master/CODE_OF_CONDUCT.md)


## lakeFS airflow provider

The lakeFS Airflow provider enables smooth integration of lakeFS with Airflow DAGs.
Use the lakeFS provider to create branches, commit objects, wait for files to be written, and more.

For a usage example, check out the [example DAG](https://github.com/treeverse/airflow-provider-lakeFS/blob/main/lakefs_provider/example_dags/lakefs-dag.py).
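
Waiting for a file to be written usually comes down to polling until a condition holds. The sketch below illustrates that general idea only, using the Python standard library. The names `wait_until`, `file_exists`, and `written_paths` are hypothetical and invented for this example; they are not part of the provider's API, and the example DAG linked above remains the reference for real usage.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               poke_interval: float = 1.0,
               timeout: float = 60.0) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Hypothetical stand-in for an object store: the set of paths written so far.
written_paths = set()
attempts = {"count": 0}


def file_exists() -> bool:
    # Simulate a writer that finishes just before the third check.
    attempts["count"] += 1
    if attempts["count"] == 3:
        written_paths.add("data/output.parquet")
    return "data/output.parquet" in written_paths


found = wait_until(file_exists, poke_interval=0.01, timeout=1.0)
print(found, attempts["count"])
```

A short `poke_interval` reacts faster but checks the store more often; the `timeout` keeps a task from waiting forever on a file that never arrives.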


## What is lakeFS

lakeFS is an open source layer that delivers resilience and manageability to object-storage based data lakes.

With lakeFS you can build repeatable, atomic and versioned data lake operations - from complex ETL jobs to data science and analytics.

lakeFS supports AWS S3, Azure Blob Storage and Google Cloud Storage as its underlying storage service. It is API compatible with S3, and works seamlessly with all modern data frameworks such as Spark, Hive, AWS Athena, Presto, etc.


For more information see the [official lakeFS documentation](https://docs.lakefs.io).


## Capabilities

**Development Environment for Data**
* **Experimentation** - try tools, upgrade versions and evaluate code changes in isolation.
* **Reproducibility** - go back to any point in time to a consistent version of your data lake.

**Continuous Data Integration**
* **Ingest new data safely by enforcing best practices** - make sure new data sources adhere to your lake’s best practices, such as format and schema enforcement, naming conventions, etc.
* **Metadata validation** - prevent breaking changes from entering the production data environment.


**Continuous Data Deployment**
* **Instantly revert changes to data** - if low-quality data is exposed to your consumers, you can revert instantly to a former, consistent and correct snapshot of your data lake.
* **Enforce cross-collection consistency** - provide consumers with several collections of data that must be synchronized, in one atomic, revertible action.
* **Prevent data quality issues by enabling**
    - Testing of production data before exposing it to users / consumers.
    - Testing of intermediate results in your DAG to avoid cascading quality issues.
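
The branch, test, merge, and revert workflow described above can be illustrated with a toy in-memory model. This is a sketch for intuition only: `ToyRepo` and its methods are hypothetical names, the merge is a naive overlay that ignores deletions and conflicts, and none of this reflects the lakeFS API or its implementation.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # path -> content


class ToyRepo:
    """Toy model of branches over immutable snapshots (not the lakeFS API)."""

    def __init__(self) -> None:
        # Each branch is a list of committed snapshots; the last one is its head.
        self.branches: Dict[str, List[Snapshot]] = {"main": [{}]}

    def create_branch(self, name: str, source: str) -> None:
        # Snapshots are never mutated, so the new branch can share them safely.
        self.branches[name] = list(self.branches[source])

    def commit(self, branch: str, changes: Snapshot) -> None:
        # A commit is a new snapshot: the previous head plus the changes.
        self.branches[branch].append({**self.head(branch), **changes})

    def head(self, branch: str) -> Snapshot:
        return self.branches[branch][-1]

    def merge(self, source: str, dest: str) -> None:
        # Expose all of the source's data to the destination in one step.
        self.commit(dest, self.head(source))

    def revert(self, branch: str) -> None:
        # Drop the latest commit, restoring the previous snapshot.
        if len(self.branches[branch]) > 1:
            self.branches[branch].pop()


repo = ToyRepo()
repo.commit("main", {"users.csv": "v1"})

# Work in isolation: changes on "etl" are invisible to readers of "main".
repo.create_branch("etl", "main")
repo.commit("etl", {"users.csv": "v2", "orders.csv": "v1"})
main_before_merge = dict(repo.head("main"))

# Test the intermediate result before exposing it to consumers.
if {"users.csv", "orders.csv"} <= repo.head("etl").keys():
    repo.merge("etl", "main")
main_after_merge = dict(repo.head("main"))

# If a problem is found later, go back to the previous snapshot.
repo.revert("main")
main_after_revert = dict(repo.head("main"))
```

Both collections become visible on `main` in the same commit, which is what keeps them consistent with each other for consumers.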


## Publishing

The repository includes a GitHub workflow that is triggered on a publish event and builds and pushes the package to PyPI.

Use the following steps to release:

- Update `setup.py` with the new package version
- Update `CHANGELOG.md` with changes for the new release
- Create a GitHub release, using a semver version of the form vX.X.X
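
Before creating the release, it can help to check that the release name follows the vX.X.X pattern and agrees with the version in `setup.py`. The sketch below is one possible check, not part of the repository's workflow. It assumes that `setup.py` declares the version as a literal `version="..."` argument and that the release name should equal that version with a `v` prefix; the function name `tag_matches_setup` is hypothetical.

```python
import re

# Release names of the form vX.X.X, where each X is a non-negative integer.
SEMVER_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")

# A literal version="..." argument; \b avoids matching names like python_version.
VERSION_ARG = re.compile(r"""\bversion\s*=\s*["']([^"']+)["']""")


def tag_matches_setup(tag: str, setup_py_text: str) -> bool:
    """Return True if `tag` is vX.X.X and equals the version in setup.py."""
    tag_match = SEMVER_TAG.fullmatch(tag)
    version_match = VERSION_ARG.search(setup_py_text)
    if tag_match is None or version_match is None:
        return False
    return tag[1:] == version_match.group(1)


# Illustrative setup.py fragment (assumed shape, not copied from the repository).
example_setup = 'setup(name="airflow-provider-lakefs", version="0.48.0")'

print(tag_matches_setup("v0.48.0", example_setup))
print(tag_matches_setup("0.48.0", example_setup))
```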


## Community

Stay up to date and get lakeFS support via:

- [Slack](https://lakefs.io/slack) (to get help from our team and other users)
- [Twitter](https://twitter.com/lakeFS) (follow for updates and news)
- [YouTube](https://lakefs.io/youtube) (learn from video tutorials)
- [Contact us](https://lakefs.io/contact-us/) (for anything)

## More information

- [lakeFS documentation](https://docs.lakefs.io)
- If you would like to contribute, check out our [contributing guide](https://docs.lakefs.io/contributing).
- [Roadmap](https://docs.lakefs.io/roadmap.html)

## Licensing

lakeFS is completely free and open source and licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

            
