:Name: pysetl
:Version: 0.1.7rc0
:Summary: A PySpark ETL Framework
:Upload time: 2023-11-04 22:09:24
:Author: Jhosse Paul Marquez Ruiz
:Requires Python: >=3.9,<3.11
:License: Apache-2.0
:Keywords: spark, aws, etl

PySetl - A PySpark ETL Framework
============================================

|PyPI Badge|
|Build Status|
|Code Coverage|
|Documentation Status|

Overview
--------------------------------------------
PySetl is a framework for improving the readability and structure of PySpark
ETL projects. It is also designed to take advantage of Python's typing syntax
to reduce runtime errors, both through linting tools and by verifying types at
runtime, which improves the stability of large ETL pipelines.

To accomplish this, PySetl provides the following tools:

- ``pysetl.config``: Type-safe configuration.
- ``pysetl.storage``: Agnostic and extensible data source connections.
- ``pysetl.workflow``: Pipeline management and dependency injection.
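
To make the idea of dependency injection between pipeline steps concrete, here
is a minimal plain-Python sketch. It does not use or reproduce the
``pysetl.workflow`` API: the names ``RawOrders``, ``DailyTotals``, ``extract``,
``aggregate`` and ``run`` are hypothetical and exist only for illustration. Each
step declares what it needs through type annotations, and a small runner
supplies the matching output of an earlier step.

.. code-block:: python

    from typing import Callable, get_type_hints


    class RawOrders(list):
        """Hypothetical dataset type produced by the first step."""


    class DailyTotals(dict):
        """Hypothetical dataset type produced by the second step."""


    def extract() -> RawOrders:
        return RawOrders([("2023-11-04", 10.0), ("2023-11-04", 5.5)])


    def aggregate(orders: RawOrders) -> DailyTotals:
        totals = DailyTotals()
        for day, amount in orders:
            totals[day] = totals.get(day, 0.0) + amount
        return totals


    def run(steps: list[Callable]) -> dict[type, object]:
        """Run steps in order, injecting each argument by its annotated type."""
        produced: dict[type, object] = {}
        for step in steps:
            hints = get_type_hints(step)
            output_type = hints.pop("return")
            # Look up every parameter in the outputs of earlier steps.
            kwargs = {name: produced[needed] for name, needed in hints.items()}
            produced[output_type] = step(**kwargs)
        return produced


    results = run([extract, aggregate])
    totals = results[DailyTotals]

In this sketch a step that asks for a type nobody has produced yet fails with a
``KeyError``, so a wiring mistake surfaces when the pipeline is assembled and
run rather than deep inside a transformation.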

PySetl is designed with Python typing syntax at its core. Hence, we strongly
suggest `typedspark`_ and `pydantic`_ for development.
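
As a rough illustration of what "verifying types at runtime" can mean, the
sketch below checks a configuration object against its own annotations using
only the standard library. It is an assumption-laden example, not the
``pysetl.config`` API and not pydantic: ``CsvSourceConfig`` and its fields are
hypothetical, and the check only handles simple, non-generic annotations.

.. code-block:: python

    from dataclasses import dataclass, fields
    from typing import get_type_hints


    @dataclass(frozen=True)
    class CsvSourceConfig:
        """Hypothetical settings for a CSV source (not a PySetl class)."""

        path: str
        header: bool = True
        partitions: int = 1

        def __post_init__(self) -> None:
            # Compare each field's value with its annotation at construction.
            hints = get_type_hints(type(self))
            for field in fields(self):
                value = getattr(self, field.name)
                expected = hints[field.name]
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{field.name} must be {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )


    config = CsvSourceConfig(path="/data/input.csv", partitions=4)

    try:
        CsvSourceConfig(path="/data/input.csv", partitions="4")
    except TypeError as error:
        message = str(error)

A static checker such as mypy would flag ``partitions="4"`` before the code
runs; the runtime check catches the same mistake when the value comes from a
file or an environment variable that the linter cannot see.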

Why use PySetl?
--------------------------------------------
- Model complex data pipelines.
- Reduce risk in production with type-safe development.
- Improve large project structure and readability.

Installation
--------------------------------------------
PySetl is available on PyPI:

.. code-block:: bash

    pip install pysetl

PySetl doesn't list ``pyspark`` as a dependency, since most environments ship
their own Spark installation. If yours does not, you can install pyspark as an
optional extra by running:

.. code-block:: bash

    pip install "pysetl[pyspark]"
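
If you are unsure which of the two commands applies to your environment, a
short standard-library check can help. The sketch below is only a convenience
and not part of PySetl: it tests the interpreter against the
``>=3.9,<3.11`` range declared by release 0.1.7rc0 and looks for an importable
``pyspark`` module.

.. code-block:: python

    import sys
    from importlib.util import find_spec

    # Range declared by this release: requires_python >=3.9,<3.11
    python_ok = (3, 9) <= sys.version_info[:2] < (3, 11)

    # find_spec returns None when a top-level module cannot be found.
    pyspark_available = find_spec("pyspark") is not None

    if not python_ok:
        print("Unsupported Python version for pysetl 0.1.7rc0")
    elif pyspark_available:
        print("pyspark found: pip install pysetl")
    else:
        print('pyspark missing: pip install "pysetl[pyspark]"')

Note that this only detects a ``pyspark`` package visible to the current Python
interpreter; a cluster-provided Spark that is not on the import path will not
be found.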

Acknowledgments
--------------------------------------------

PySetl is a port of `SETL`_. We want to fully acknowledge that this package is
heavily inspired by the work of the SETL team; we just adapted things to work
in Python.

.. _typedspark: https://typedspark.readthedocs.io/en/latest/
.. _pydantic: https://docs.pydantic.dev/latest/
.. _SETL: https://setl-framework.github.io/setl/ 

.. |PyPI Badge| image:: https://img.shields.io/pypi/v/pysetl
    :target: https://pypi.org/project/pysetl

.. |Build Status| image:: https://github.com/JhossePaul/pysetl/actions/workflows/build.yml/badge.svg
    :target: https://github.com/JhossePaul/pysetl/actions/workflows/build.yml

.. |Code Coverage| image:: https://codecov.io/gh/JhossePaul/pysetl/branch/main/graph/badge.svg
    :target: https://codecov.io/gh/JhossePaul/pysetl

.. |Documentation Status| image:: https://readthedocs.org/projects/pysetl/badge/?version=latest
    :target: https://pysetl.readthedocs.io/en/latest/?badge=latest

            
