sqlbucket


Namesqlbucket JSON
Version 0.4.4 PyPI version JSON
download
home_page
SummarySQLBucket - Write your SQL ETL flow and ETL integrity tool.
upload_time2022-12-01 16:13:03
maintainer
docs_urlNone
author
requires_python>=3.10
licenseMIT License Copyright (c) 2019 Social Point Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords sql etl data-integrity
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            SQLBucket
=========

.. image:: https://travis-ci.org/socialpoint-labs/sqlbucket.svg?branch=master
    :target: https://travis-ci.org/socialpoint-labs/sqlbucket


SQLBucket is a lightweight framework to help write, orchestrate and validate
SQL data pipelines. It gives the possibility to set variables and introduces
some control flow using the fantastic Jinja2 library. It also implements a
very simplistic unit and integration test framework where you can validate the
results of your ETL in the form of SQL checks. With SQLBucket, you can apply
TDD principles when writing data pipelines.

It can work as a stand alone service, or be part of your workflow
manager environment (Airflow, Luigi, ..).


Installing
----------

Install and update using `pip`_:

.. code-block:: text

    pip install -U sqlbucket

SQLBucket now works for python 3.10.


A Simple Example
----------------

To start working, you need to instantiate your SQLBucket core object with the
`project_folder` parameter. That folder will contain all your SQL ETL. The
python file where you create your SQLBucket object is also a good place to
instantiate your command line interface, as shown below.

.. code-block:: python

    # my_sqlbucket.py
    from sqlbucket import SQLBucket


    bucket = SQLBucket(projects_folder='projects')


    if __name__ == '__main__':
        bucket.cli()


The following command will create your first project in your `projects folder`.

.. code-block:: bash

    python my_sqlbucket.py create-project -n my_first_project

For more info on CLI, please refer to `its documentations`_.

.. _its documentations: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/cli.rst


Your `projects` should now look like the structure below:

.. code-block:: bash

    projects/
        |-- my_first_project/
            |-- config.yaml
            |-- queries/
                |-- query_one.sql
                |-- query_two.sql
            |-- integrity/
                |-- integrity_one.sql


SQLBucket project structure
---------------------------
An SQLBucket project is made of 3 core components: the configuration, the ETL
queries and the integrity check queries.

**Configuration**

The `config.yaml` is the core of your project. This is where you can define
variables at project level, and configure the order your sql queries must be
executed. For a better explanations on how to configure variables you can refer
to the `usage documentation`_, and also the `variables documentation`_ which
also describes environment and connections variables.

.. _usage documentation: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/usage.rst
.. _variables documentation: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/variables.rst


**ETL queries**

The `queries` folder simply contain your SQL queries. You can organize them in
the folder structure of your choice. As long as they are in the `queries`
folder, SQLBucket will find them and execute them when configured to do so.
See the documentation on `how to write SQL with SQLBucket`_.

.. _how to write SQL with SQLBucket: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/usage.rst


**Integrity queries**

The `integrity` folder simply contain SQL queries to help you validate your
ETL. You can organize them in the folder structure of your choice. The only
convention is to return the result of your integrity (True/False) in a field
named `passed`. The main idea is that integrity is done by SQL itself.
Check `documentation on integrity`_ for a more detailed explanation on testing
the integrity of your ETL. We also have a set of `common macros`_ that can be
helpful to start with.

.. _documentation on integrity: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/integrity.rst
.. _common macros: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/integrity_macros.rst


See below a full example that will actually first run your ETL, and then run
your integrity checks for a given database configuration.


.. code-block:: python

    from sqlbucket import SQLBucket

    connections = {
        'db_demo': 'postgresql://user:password@host:5439/database'
    }

    bucket = SQLBucket(connections=connections)
    project = bucket.load_project(
        project_name='my_first_project',
        connection_name='db_demo',
        variables={'foo': 1}
    )

    # to run ETL
    project.run()

    # to run integrity
    project.run_integrity()


We recommend setting your connection urls as environment variables for security
purposes.

Template project
----------------

To get you up to speed, you can create a fork of the `SQLBucket template project`_
and start building SQL data pipelines within minutes.

.. _SQLBucket template project: https://github.com/philippe2803/sqlbucket-template


Contributing
------------

For guidance on how to make a contribution to SQLBucket, see the `contributing guidelines`_.

.. _contributing guidelines: https://github.com/socialpoint-labs/sqlbucket/blob/master/CONTRIBUTING.rst


Links
-----

* License: `MIT <https://github.com/socialpoint-labs/sqlbucket/blob/master/LICENSE>`_
* Releases: https://pypi.org/project/sqlbucket/
* Code: https://github.com/socialpoint-labs/sqlbucket
* Issue tracker: https://github.com/socialpoint-labs/sqlbucket/issues


.. _pip: https://pip.pypa.io/en/stable/quickstart/

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "sqlbucket",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "sql,etl,data-integrity",
    "author": "",
    "author_email": "Philippe Oger <phil.oger@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/2b/af/2868e47db69bdf077b0ad9a9eb7901e525f07a5acda0fbe9629d1396802d/sqlbucket-0.4.4.tar.gz",
    "platform": null,
    "description": "SQLBucket\n=========\n\n.. image:: https://travis-ci.org/socialpoint-labs/sqlbucket.svg?branch=master\n    :target: https://travis-ci.org/socialpoint-labs/sqlbucket\n\n\nSQLBucket is a lightweight framework to help write, orchestrate and validate\nSQL data pipelines. It gives the possibility to set variables and introduces\nsome control flow using the fantastic Jinja2 library. It also implements a\nvery simplistic unit and integration test framework where you can validate the\nresults of your ETL in the form of SQL checks. With SQLBucket, you can apply\nTDD principles when writing data pipelines.\n\nIt can work as a stand alone service, or be part of your workflow\nmanager environment (Airflow, Luigi, ..).\n\n\nInstalling\n----------\n\nInstall and update using `pip`_:\n\n.. code-block:: text\n\n    pip install -U sqlbucket\n\nSQLBucket now works for python 3.10.\n\n\nA Simple Example\n----------------\n\nTo start working, you need to instantiate your SQLBucket core object with the\n`project_folder` parameter. That folder will contain all your SQL ETL. The\npython file where you create your SQLBucket object is also a good place to\ninstantiate your command line interface, as shown below.\n\n.. code-block:: python\n\n    # my_sqlbucket.py\n    from sqlbucket import SQLBucket\n\n\n    bucket = SQLBucket(projects_folder='projects')\n\n\n    if __name__ == '__main__':\n        bucket.cli()\n\n\nThe following command will create your first project in your `projects folder`.\n\n.. code-block:: bash\n\n    python my_sqlbucket.py create-project -n my_first_project\n\nFor more info on CLI, please refer to `its documentations`_.\n\n.. _its documentations: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/cli.rst\n\n\nYour `projects` should now look like the structure below:\n\n.. code-block:: bash\n\n    projects/\n        |-- my_first_project/\n            |-- config.yaml\n            |-- queries/\n                |-- query_one.sql\n                |-- query_two.sql\n            |-- integrity/\n                |-- integrity_one.sql\n\n\nSQLBucket project structure\n---------------------------\nAn SQLBucket project is made of 3 core components: the configuration, the ETL\nqueries and the integrity check queries.\n\n**Configuration**\n\nThe `config.yaml` is the core of your project. This is where you can define\nvariables at project level, and configure the order your sql queries must be\nexecuted. For a better explanations on how to configure variables you can refer\nto the `usage documentation`_, and also the `variables documentation`_ which\nalso describes environment and connections variables.\n\n.. _usage documentation: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/usage.rst\n.. _variables documentation: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/variables.rst\n\n\n**ETL queries**\n\nThe `queries` folder simply contain your SQL queries. You can organize them in\nthe folder structure of your choice. As long as they are in the `queries`\nfolder, SQLBucket will find them and execute them when configured to do so.\nSee the documentation on `how to write SQL with SQLBucket`_.\n\n.. _how to write SQL with SQLBucket: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/usage.rst\n\n\n**Integrity queries**\n\nThe `integrity` folder simply contain SQL queries to help you validate your\nETL. You can organize them in the folder structure of your choice. The only\nconvention is to return the result of your integrity (True/False) in a field\nnamed `passed`. The main idea is that integrity is done by SQL itself.\nCheck `documentation on integrity`_ for a more detailed explanation on testing\nthe integrity of your ETL. We also have a set of `common macros`_ that can be\nhelpful to start with.\n\n.. _documentation on integrity: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/integrity.rst\n.. _common macros: https://github.com/socialpoint-labs/sqlbucket/blob/master/documentation/integrity_macros.rst\n\n\nSee below a full example that will actually first run your ETL, and then run\nyour integrity checks for a given database configuration.\n\n\n.. code-block:: python\n\n    from sqlbucket import SQLBucket\n\n    connections = {\n        'db_demo': 'postgresql://user:password@host:5439/database'\n    }\n\n    bucket = SQLBucket(connections=connections)\n    project = bucket.load_project(\n        project_name='my_first_project',\n        connection_name='db_demo',\n        variables={'foo': 1}\n    )\n\n    # to run ETL\n    project.run()\n\n    # to run integrity\n    project.run_integrity()\n\n\nWe recommend setting your connection urls as environment variables for security\npurposes.\n\nTemplate project\n----------------\n\nTo get you up to speed, you can create a fork of the `SQLBucket template project`_\nand start building SQL data pipelines within minutes.\n\n.. _SQLBucket template project: https://github.com/philippe2803/sqlbucket-template\n\n\nContributing\n------------\n\nFor guidance on how to make a contribution to SQLBucket, see the `contributing guidelines`_.\n\n.. _contributing guidelines: https://github.com/socialpoint-labs/sqlbucket/blob/master/CONTRIBUTING.rst\n\n\nLinks\n-----\n\n* License: `MIT <https://github.com/socialpoint-labs/sqlbucket/blob/master/LICENSE>`_\n* Releases: https://pypi.org/project/sqlbucket/\n* Code: https://github.com/socialpoint-labs/sqlbucket\n* Issue tracker: https://github.com/socialpoint-labs/sqlbucket/issues\n\n\n.. _pip: https://pip.pypa.io/en/stable/quickstart/\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2019 Social Point  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "SQLBucket - Write your SQL ETL flow and ETL integrity tool.",
    "version": "0.4.4",
    "split_keywords": [
        "sql",
        "etl",
        "data-integrity"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "a945fd4401532eac97678a36e6ceb54d",
                "sha256": "bf3e33c8512f07f04603b38b734d81c2737bcd603d7e8bdaece7c413b775f3e1"
            },
            "downloads": -1,
            "filename": "sqlbucket-0.4.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a945fd4401532eac97678a36e6ceb54d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 16457,
            "upload_time": "2022-12-01T16:13:01",
            "upload_time_iso_8601": "2022-12-01T16:13:01.602959Z",
            "url": "https://files.pythonhosted.org/packages/b0/31/729f37c060cd08c0fb602a2994eecb0800021cc96c8dc0dc49474946e75a/sqlbucket-0.4.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "5c4f3bfc98811fbbfcc00d38619a182e",
                "sha256": "12863ce991d8331f523b78c2dc4d28818fc448a13a59bc39ceac4ec0a49b90ee"
            },
            "downloads": -1,
            "filename": "sqlbucket-0.4.4.tar.gz",
            "has_sig": false,
            "md5_digest": "5c4f3bfc98811fbbfcc00d38619a182e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 15886,
            "upload_time": "2022-12-01T16:13:03",
            "upload_time_iso_8601": "2022-12-01T16:13:03.219948Z",
            "url": "https://files.pythonhosted.org/packages/2b/af/2868e47db69bdf077b0ad9a9eb7901e525f07a5acda0fbe9629d1396802d/sqlbucket-0.4.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-01 16:13:03",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "sqlbucket"
}
        
Elapsed time: 0.01471s