PySetl - A PySpark ETL Framework
============================================
|PyPI Badge|
|Build Status|
|Code Coverage|
|Documentation Status|
Overview
--------------------------------------------
PySetl is a framework that aims to improve the readability and structure of
PySpark ETL projects. It is also designed to take advantage of Python's typing
syntax to reduce runtime errors, both through linting tools and by verifying
types at runtime, which improves the stability of large ETL pipelines.
To accomplish this, PySetl provides the following modules:

- ``pysetl.config``: Type-safe configuration.
- ``pysetl.storage``: Source-agnostic, extensible data source connections.
- ``pysetl.workflow``: Pipeline management and dependency injection.

PySetl is designed with Python typing syntax at its core, so we strongly
recommend using `typedspark`_ and `pydantic`_ during development.
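
To illustrate what verifying types at runtime can look like, here is a minimal
sketch that uses only the standard library. It is not PySetl's API: the
``CsvSourceConfig`` class and its fields are hypothetical names chosen for this
example, and a library such as `pydantic`_ would normally handle this kind of
validation more thoroughly.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CsvSourceConfig:
    """Hypothetical settings for a CSV source (not a PySetl class)."""

    path: str
    header: bool = True
    sample_rows: int = 100

    def __post_init__(self) -> None:
        # Compare each field's runtime value against its annotation.
        # This works here because every annotation is a plain class.
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    f"{field.name} must be {field.type.__name__}, "
                    f"got {type(value).__name__}"
                )


# A well-typed configuration is accepted.
valid = CsvSourceConfig(path="/data/input.csv")

# A badly typed configuration fails at construction time,
# before any pipeline code would run.
try:
    CsvSourceConfig(path="/data/input.csv", sample_rows="100")
    error_message = None
except TypeError as error:
    error_message = str(error)
```

Failing when the configuration object is built, rather than deep inside a Spark
job, is the kind of early feedback that type-safe configuration is meant to
provide.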
Why use PySetl?
--------------------------------------------
- Model complex data pipelines.
- Reduce risk in production with type-safe development.
- Improve large project structure and readability.
Installation
--------------------------------------------
PySetl is available on PyPI:

.. code-block:: bash

    pip install pysetl

PySetl doesn't list ``pyspark`` as a dependency, since most environments already
provide their own Spark installation. If yours does not, you can install
``pyspark`` together with PySetl by running:

.. code-block:: bash

    pip install "pysetl[pyspark]"

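
If you are unsure whether your environment already provides ``pyspark``, a
short check like the following can help you pick between the two commands
above. This is only a sketch: ``pyspark_available`` and ``install_hint`` are
hypothetical helper names, not part of PySetl.

```python
from importlib.util import find_spec


def pyspark_available() -> bool:
    """Return True when the ``pyspark`` package can be found for import."""
    return find_spec("pyspark") is not None


def install_hint() -> str:
    """Suggest an install command based on what the environment provides."""
    if pyspark_available():
        # Spark is already provided, so the plain package is enough.
        return "pip install pysetl"
    # No pyspark found, so request the optional extra as well.
    return 'pip install "pysetl[pyspark]"'


print(install_hint())
```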
Acknowledgments
--------------------------------------------
PySetl is a port of `SETL`_. We want to fully acknowledge that this package is
heavily inspired by the work of the SETL team; we adapted its ideas to work in
Python.
.. _typedspark: https://typedspark.readthedocs.io/en/latest/
.. _pydantic: https://docs.pydantic.dev/latest/
.. _SETL: https://setl-framework.github.io/setl/
.. |PyPI Badge| image:: https://img.shields.io/pypi/v/pysetl
    :target: https://pypi.org/project/pysetl

.. |Build Status| image:: https://github.com/JhossePaul/pysetl/actions/workflows/build.yml/badge.svg
    :target: https://github.com/JhossePaul/pysetl/actions/workflows/build.yml

.. |Code Coverage| image:: https://codecov.io/gh/JhossePaul/pysetl/branch/main/graph/badge.svg
    :target: https://codecov.io/gh/JhossePaul/pysetl

.. |Documentation Status| image:: https://readthedocs.org/projects/pysetl/badge/?version=latest
    :target: https://pysetl.readthedocs.io/en/latest/?badge=latest

Raw data
{
"_id": null,
"home_page": "",
"name": "pysetl",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9,<3.11",
"maintainer_email": "",
"keywords": "spark,aws,etl",
"author": "Jhosse Paul Marquez Ruiz",
"author_email": "jpaul.marquez.ruiz@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/97/0a/78c8ba2027042c39715017b4871c47f833d1df870408933e1d76d1a33dfc/pysetl-0.1.7rc0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "A PySpark ETL Framework",
"version": "0.1.7rc0",
"project_urls": {
"Home": "https://github.com/JhossePaul/pysetl",
"Source": "https://github.com/JhossePaul/pysetl"
},
"split_keywords": [
"spark",
"aws",
"etl"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "04e38455af95c37e469ddfeecc9e2298a4ab4f660627d45a5ada7bc8cf0f9fc1",
"md5": "d95799c82dfa13fa05c93a5c24cd5f1e",
"sha256": "c0e2f1d64ba3cf79ae3a87208cfe8be9c569a2bfebf8c0ac988c8be1d5782949"
},
"downloads": -1,
"filename": "pysetl-0.1.7rc0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d95799c82dfa13fa05c93a5c24cd5f1e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<3.11",
"size": 51136,
"upload_time": "2023-11-04T22:09:22",
"upload_time_iso_8601": "2023-11-04T22:09:22.481174Z",
"url": "https://files.pythonhosted.org/packages/04/e3/8455af95c37e469ddfeecc9e2298a4ab4f660627d45a5ada7bc8cf0f9fc1/pysetl-0.1.7rc0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "970a78c8ba2027042c39715017b4871c47f833d1df870408933e1d76d1a33dfc",
"md5": "0cd9a7bb7767ddead7fdea49ef6193a7",
"sha256": "3c9e838a201e150d902e8494ad1f2fa5ed0d800c073130c3d354bc0f14a43e72"
},
"downloads": -1,
"filename": "pysetl-0.1.7rc0.tar.gz",
"has_sig": false,
"md5_digest": "0cd9a7bb7767ddead7fdea49ef6193a7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<3.11",
"size": 32502,
"upload_time": "2023-11-04T22:09:24",
"upload_time_iso_8601": "2023-11-04T22:09:24.108116Z",
"url": "https://files.pythonhosted.org/packages/97/0a/78c8ba2027042c39715017b4871c47f833d1df870408933e1d76d1a33dfc/pysetl-0.1.7rc0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-04 22:09:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "JhossePaul",
"github_project": "pysetl",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "pysetl"
}