bigquery-test-kit

- Version: 0.5.0
- Homepage: https://github.com/tiboun/python-bigquery-test-kit
- Summary: BigQuery test kit
- Author: Bounkong Khamphousone
- License: MIT
- Requires Python: >=3
- Keywords: bigquery, testing, test-kit, bqtk, dataset, table, isolation, dsl, immutability, sql, test
- Uploaded: 2024-02-05 09:28:40
# bigquery-test-kit, a testing framework to be more confident in your BigQuery SQL

[BigQuery](https://cloud.google.com/bigquery) doesn't provide any locally runnable server,
so tests need to be run against BigQuery itself.

bigquery-test-kit enables BigQuery testing by providing an almost immutable DSL that allows you to:

 - create and delete datasets
 - create and delete tables, partitioned or not
 - load CSV or JSON data into tables
 - run query templates
 - transform JSON or CSV data into a data literal or a temp table

You can therefore test your query with data as literals, or instantiate
datasets and tables in projects and load data into them.

Running queries with data as literals is faster, but materialized tables are mandatory for some use cases.
A data literal may add complexity to your query and cause it to be rejected by BigQuery.
In such a situation, temporary tables come to the rescue: they are populated from data literals rather than load jobs,
and query complexity is then almost the same as when querying a real table.

Immutability allows you to share dataset and table definitions as a fixture and use them across all tests,
adapting the definitions as necessary without worrying about mutations.
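
As a minimal sketch of such sharing (assuming `isolate()` derives a new, independent object each time, as the usage examples below suggest):

```python
from google.cloud.bigquery.client import Client
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig

client = Client(location="EU")
bqtk = BQTestKit(bq_client=client, bqtk_config=BQTestKitConfig().with_test_context("shared"))

# Shared, immutable definition: per-test derivations never mutate it.
base_dataset = bqtk.project().dataset("my_dataset")

def test_case_a():
    with base_dataset.isolate() as d:  # fresh isolated dataset for this test
        ...  # create tables, load data, run queries

def test_case_b():
    with base_dataset.isolate() as d:  # another derivation; test_case_a is unaffected
        ...
```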

In order to have reproducible tests, BQ-test-kit adds the ability to create isolated datasets or tables,
so query outputs are predictable and assertions can be made in detail.
If you are running simple queries (no DML), you can use data literals to make tests run faster.

Template queries are rendered via [varsubst](https://pypi.org/project/varsubst/), but you can provide your own
interpolator by extending *bq_test_kit.interpolators.base_interpolator.BaseInterpolator*. Supported templates are
those supported by varsubst, namely envsubst-like (shell variables) or Jinja-powered.

In order to benefit from those interpolators, you will need to install one of the following extras:
**bq-test-kit[shell]** or **bq-test-kit[jinja2]** (e.g. `pip install "bq-test-kit[jinja2]"`).

Usage
=====

Common setup with materialized tables
-------------------------------------

```python
from google.cloud.bigquery.client import Client
from google.cloud.bigquery.schema import SchemaField
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
from bq_test_kit.resource_loaders.package_file_loader import PackageFileLoader

client = Client(location="EU")
bqtk_conf = BQTestKitConfig().with_test_context("basic")
bqtk = BQTestKit(bq_client=client, bqtk_config=bqtk_conf)
# project() uses the default project, specified by the GOOGLE_CLOUD_PROJECT environment variable
with bqtk.project().dataset("my_dataset").isolate().clean_and_keep() as d:
    # dataset `GOOGLE_CLOUD_PROJECT.my_dataset_basic` is created.
    # Isolation is done via isolate() and the given test context.
    # If you are forced to use an existing dataset, use noop() instead.
    #
    # clean_and_keep cleans the dataset if it already exists before creating it,
    # then keeps it after the block, so you can inspect it in the BigQuery console.
    # The default behavior is to create first and delete after usage (CleanAfter).
    schema = [SchemaField("f1", field_type="STRING"), SchemaField("f2", field_type="INT64")]
    with d.table("my_table", schema=schema).clean_and_keep() as t:
        # table `GOOGLE_CLOUD_PROJECT.my_dataset_basic.my_table` is created
        # noop() and isolate() are also supported for tables.
        pfl = PackageFileLoader("tests/it/bq_test_kit/bq_dsl/bq_resources/data_loaders/resources/dummy_data.csv")
        t.csv_loader(from_=pfl).load()
        result = bqtk.query_template(from_=f"select count(*) as nb from `{t.fqdn()}`").run()
        assert len(result.rows) > 0
    # table `GOOGLE_CLOUD_PROJECT.my_dataset_basic.my_table` is deleted
# dataset `GOOGLE_CLOUD_PROJECT.my_dataset_basic` is deleted
```

Advanced setup with materialized tables
---------------------------------------

```python
from google.cloud.bigquery.client import Client
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
# Tables import path assumed; adjust to where Tables lives in your bq-test-kit version.
from bq_test_kit.bq_dsl.bq_resources import Tables

client = Client(location="EU")
bqtk_conf = BQTestKitConfig().with_test_context("basic")
bqtk = BQTestKit(bq_client=client, bqtk_config=bqtk_conf)
p = bqtk.project("it") \
        .dataset("dataset_foo").isolate() \
        .table("table_foofoo") \
        .table("table_foobar") \
        .project.dataset("dataset_bar").isolate() \
        .table("table_barfoo") \
        .table("table_barbar") \
        .project
table_foofoo, table_foobar, table_barfoo, table_barbar = None, None, None, None
with Tables.from_(p, p) as tables:
    # create the datasets and tables in the order built with the DSL; a tuple of all tables is returned.
    assert len(tables) == 4
    table_names = [t.name for t in tables]
    assert table_names == ["table_foofoo",
                           "table_foobar",
                           "table_barfoo",
                           "table_barbar"]
    table_foofoo, table_foobar, table_barfoo, table_barbar = tables
    assert table_foofoo.show() is not None
    assert table_foobar.show() is not None
    assert table_barfoo.show() is not None
    assert table_barbar.show() is not None
```

Simple BigQuery SQL test with data literals
-------------------------------------------

```python
import pytz

from datetime import datetime
from google.cloud.bigquery.client import Client
from google.cloud.bigquery.schema import SchemaField
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
from bq_test_kit.data_literal_transformers.json_data_literal_transformer import JsonDataLiteralTransformer
from bq_test_kit.interpolators.shell_interpolator import ShellInterpolator

client = Client(location="EU")
bqtk_conf = BQTestKitConfig().with_test_context("basic")
bqtk = BQTestKit(bq_client=client, bqtk_config=bqtk_conf)
results = bqtk.query_template(from_="""
    SELECT
        f.foo, b.bar, e.baz, f._partitiontime as pt
    FROM
        ${TABLE_FOO} f
        INNER JOIN ${TABLE_BAR} b ON f.foobar = b.foobar
        LEFT JOIN ${TABLE_EMPTY} e ON b.foobar = e.foobar
""").with_datum({
    "TABLE_FOO": (['{"foobar": "1", "foo": 1, "_PARTITIONTIME": "2020-11-26 17:09:03.967259 UTC"}'], [SchemaField("foobar", "STRING"), SchemaField("foo", "INT64"), SchemaField("_PARTITIONTIME", "TIMESTAMP")]),
    "TABLE_BAR": (['{"foobar": "1", "bar": 2}'], [SchemaField("foobar", "STRING"), SchemaField("bar", "INT64")]),
    "TABLE_EMPTY": (None, [SchemaField("foobar", "STRING"), SchemaField("baz", "INT64")])
    }) \
    .as_data_literals() \
    .loaded_with(JsonDataLiteralTransformer()) \
    .add_interpolator(ShellInterpolator()) \
    .run()
assert len(results.rows) == 1
assert results.rows == [{"foo": 1, "bar": 2, "baz": None, "pt": datetime(2020, 11, 26, 17, 9, 3, 967259, pytz.UTC)}]
```

Simple BigQuery SQL test with temp tables
-----------------------------------------
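
This example is identical to the previous one, except that `as_data_literals()` is omitted; the datum is then materialized as temporary tables instead of inlined literals.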

```python
import pytz

from datetime import datetime
from google.cloud.bigquery.client import Client
from google.cloud.bigquery.schema import SchemaField
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
from bq_test_kit.data_literal_transformers.json_data_literal_transformer import JsonDataLiteralTransformer
from bq_test_kit.interpolators.shell_interpolator import ShellInterpolator

client = Client(location="EU")
bqtk_conf = BQTestKitConfig().with_test_context("basic")
bqtk = BQTestKit(bq_client=client, bqtk_config=bqtk_conf)
results = bqtk.query_template(from_="""
    SELECT
        f.foo, b.bar, e.baz, f._partitiontime as pt
    FROM
        ${TABLE_FOO} f
        INNER JOIN ${TABLE_BAR} b ON f.foobar = b.foobar
        LEFT JOIN ${TABLE_EMPTY} e ON b.foobar = e.foobar
""").with_datum({
    "TABLE_FOO": (['{"foobar": "1", "foo": 1, "_PARTITIONTIME": "2020-11-26 17:09:03.967259 UTC"}'], [SchemaField("foobar", "STRING"), SchemaField("foo", "INT64"), SchemaField("_PARTITIONTIME", "TIMESTAMP")]),
    "TABLE_BAR": (['{"foobar": "1", "bar": 2}'], [SchemaField("foobar", "STRING"), SchemaField("bar", "INT64")]),
    "TABLE_EMPTY": (None, [SchemaField("foobar", "STRING"), SchemaField("baz", "INT64")])
    }) \
    .loaded_with(JsonDataLiteralTransformer()) \
    .add_interpolator(ShellInterpolator()) \
    .run()
assert len(results.rows) == 1
assert results.rows == [{"foo": 1, "bar": 2, "baz": None, "pt": datetime(2020, 11, 26, 17, 9, 3, 967259, pytz.UTC)}]
```

More usage examples can be found in the [integration tests](https://github.com/tiboun/python-bq-test-kit/tree/main/tests/it).

Concepts
========

Resource Loaders
----------------

Currently, the only resource loader available is *bq_test_kit.resource_loaders.package_file_loader.PackageFileLoader*.
It allows you to load a file from a package, so you can load any file from your source code.

You can implement your own by extending *bq_test_kit.resource_loaders.base_resource_loader.BaseResourceLoader*.
If you do, please open a merge request if you think yours may be useful to others.
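
As a minimal sketch, a custom loader could look like the following. The single `load()` method returning the resource content as a string is an assumption about the base-class contract, so verify it against *BaseResourceLoader* in your version:

```python
from bq_test_kit.resource_loaders.base_resource_loader import BaseResourceLoader


class StaticStringLoader(BaseResourceLoader):
    """Hypothetical loader serving an in-memory string instead of a packaged file."""

    def __init__(self, content: str) -> None:
        self.content = content

    def load(self) -> str:
        # Assumed contract: return the raw resource content as a string,
        # as PackageFileLoader does for package files.
        return self.content
```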

Interpolators
-------------

Interpolators enable variable substitution within a template.
They rely on dictionaries, which can live in the global scope or in an interpolator's own scope.
While rendering a template, each interpolator's dictionary is merged into the global one,
so interpolator-scope values take precedence over global ones.

You can benefit from two built-in interpolators by installing the extras **bq-test-kit[shell]** or **bq-test-kit[jinja2]**.
These extras allow you to render your query templates with envsubst-like variables or Jinja.
You can define your own by extending *bq_test_kit.interpolators.BaseInterpolator*.


```python
from google.cloud.bigquery.client import Client
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
from bq_test_kit.interpolators.jinja_interpolator import JinjaInterpolator
from bq_test_kit.interpolators.shell_interpolator import ShellInterpolator

client = Client(location="EU")
bqtk_conf = BQTestKitConfig().with_test_context("basic")
bqtk = BQTestKit(bq_client=client, bqtk_config=bqtk_conf)

result = bqtk.query_template(from_="select ${NB_USER} as nb") \
            .with_global_dict({"NB_USER": "2"}) \
            .add_interpolator(ShellInterpolator()) \
            .run()
assert len(result.rows) == 1
assert result.rows[0]["nb"] == 2
result = bqtk.query_template(from_="select {{NB_USER}} as nb") \
            .with_interpolators([JinjaInterpolator({"NB_USER": "3"})]) \
            .run()
assert len(result.rows) == 1
assert result.rows[0]["nb"] == 3
```
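
Mixing both scopes illustrates the precedence rule above. This sketch reuses `bqtk` and the imports from the previous block, and assumes `with_global_dict` and an interpolator-local dictionary can be combined on one template:

```python
# The interpolator-local value ("3") overrides the global one ("2").
result = bqtk.query_template(from_="select {{NB_USER}} as nb") \
            .with_global_dict({"NB_USER": "2"}) \
            .with_interpolators([JinjaInterpolator({"NB_USER": "3"})]) \
            .run()
assert result.rows[0]["nb"] == 3
```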


Data loaders
------------

Only CSV and JSON data loaders are supported, even though the BigQuery API supports more formats.
Data loaders were restricted to these because such files can easily be edited by a human and kept maintainable.

If you need to support more formats, you can still load data by extending
*bq_test_kit.bq_dsl.bq_resources.data_loaders.base_data_loader.BaseDataLoader*.
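
For the supported formats, loading JSON mirrors the CSV load from the common setup. This sketch assumes a `json_loader` counterpart to `csv_loader` taking the same `from_=` parameter, and uses a hypothetical resource path:

```python
from google.cloud.bigquery.client import Client
from google.cloud.bigquery.schema import SchemaField
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
from bq_test_kit.resource_loaders.package_file_loader import PackageFileLoader

client = Client(location="EU")
bqtk = BQTestKit(bq_client=client, bqtk_config=BQTestKitConfig().with_test_context("loaders"))
schema = [SchemaField("f1", field_type="STRING"), SchemaField("f2", field_type="INT64")]
with bqtk.project().dataset("loader_demo").isolate() as d:
    with d.table("json_table", schema=schema).isolate() as t:
        pfl = PackageFileLoader("tests/resources/dummy_data.json")  # hypothetical path
        t.json_loader(from_=pfl).load()  # assumed to mirror csv_loader
```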


Data Literal Transformers
-------------------------

Supported data literal transformers are CSV and JSON.
Data Literal Transformers can be less strict than their counterpart, Data Loaders:
they allow casting, for example transforming a string into bytes or casting a date-like string to its target type.
Furthermore, for JSON, another input format is allowed, JSON_ARRAY.
This improves the maintainability of the test resources.

Data Literal Transformers also allow you to specify _partitiontime or _partitiondate,
so you can keep all your data in one file and still match the behavior of a native table.
If you were using a Data Loader against an ingestion-time partitioned table,
you would have to load the data into a specific partition,
which rounds the partition time down to 00:00:00.

If you need to support a custom format, you may extend
*bq_test_kit.data_literal_transformers.base_data_literal_transformer.BaseDataLiteralTransformer*
to benefit from the implemented data literal conversion.

Resource strategies
-------------------

Dataset and table resource management can be changed with one of the following strategies:
 - Noop: don't manage the resource at all.
 - CleanBeforeAndAfter: clean before each creation and after each usage.
 - CleanAfter: create without cleaning first and delete after each usage. This is the default behavior.
 - CleanBeforeAndKeepAfter: clean before each creation and don't clean the resource after usage.

The DSL at dataset and table scope provides the following methods to change the resource strategy (see the sketch after this list):
 - noop: set to Noop
 - clean_and_keep: set to CleanBeforeAndKeepAfter
 - with_resource_strategy: set to any resource strategy you want
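
A minimal sketch of with_resource_strategy, assuming the strategy classes live in *bq_test_kit.bq_dsl.bq_resources.resource_strategy* (verify the import path in your version):

```python
from google.cloud.bigquery.client import Client
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
# Assumed import path for the strategy classes; adjust to your version.
from bq_test_kit.bq_dsl.bq_resources.resource_strategy import CleanBeforeAndAfter

client = Client(location="EU")
bqtk = BQTestKit(bq_client=client, bqtk_config=BQTestKitConfig().with_test_context("strategies"))
with bqtk.project().dataset("my_dataset").isolate() \
         .with_resource_strategy(CleanBeforeAndAfter()) as d:
    # the dataset is cleaned before creation and deleted after this block
    ...
```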

Contributions
=============

Contributions are welcome. You can create an issue to report a bug or share an idea.
You can open a merge request as well in order to enhance this project.

Local testing
-------------

Testing is separated into two parts:

 - unit testing: doesn't need any interaction with BigQuery
 - integration testing: validates behavior against BigQuery

In order to run tests locally, you must install [tox](https://pypi.org/project/tox/).

After that, you can run the unit tests with `tox -e clean,py36-ut` from the root folder.

If you plan to run the integration tests as well, use a service account and authenticate yourself with `gcloud auth application-default login`, which sets up Application Default Credentials (alternatively, point the **GOOGLE_APPLICATION_CREDENTIALS** env var at a service-account key file). You will also have to set the **GOOGLE_CLOUD_PROJECT** env var in order to run `tox`.

Thanks.


WARNING
=======

The DSL may introduce breaking changes until the 1.0.0 release.

Editing in VSCode
=================

In order to benefit from VSCode features such as debugging, run the following commands in the root folder of this project.

Linux
-----

 - python3 -m venv .venv
 - source .venv/bin/activate
 - pip3 install -r requirements.txt -r requirements-test.txt -e .

Windows
-------

 - py -3 -m venv .venv
 - .venv\scripts\activate
 - python -m pip install -r requirements.txt -r requirements-test.txt -e .

Changelog
=========

0.5.0
-----
 - add clustering support to tables (thanks to @ckchow)

0.4.3
-----
 - rename project as python-bigquery-test-kit

0.4.2
-----
 - remove load job config in log
 - fix temp tables query generation

0.4.1
-----
 - fix empty array generation for data literals

0.4.0
-----
 - add ability to rely on temp tables or data literals with query template DSL
 - fix generate empty data literal when json array is empty

0.3.1
-----

 - fix a typo in the with_interpolators method name
 - fix README.md related to interpolators

0.3.0
-----

 - add data literal transformer package exports

0.2.0
-----

 - Change return type of `Tables.__enter__` (closes #6)
 - Make jinja's local dictionary optional (closes #7)
 - Wrap query result into BQQueryResult (closes #9)


0.1.2
-----

 - Fix time partitioning type in TimeField (closes #3)

0.1.1
-----

 - Fix table reference in Dataset (closes #2)

0.1.0
-----

 - BigQuery resource DSL to create datasets and tables (partitioned or not)
 - isolation of dataset and table names
 - results as dicts, with byte arrays easy to test (decoded as base64 strings)
 - query template interpolation made easy
 - CSV and JSON loading into tables, including partitioned ones, from code-based resources
 - context manager for cascading creation of BQResource
 - resource definition sharing across tests made possible with "immutability"

            
