# macrometa-source-bigquery
Extract data from BigQuery tables.
This is a Macrometa source connector for BigQuery that produces JSON-formatted data
following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).
This tap:
- Pulls data from Google BigQuery tables/views.
- Infers the schema for each resource and produces a catalog file.
## Installation
### Step 1: Activate the Google BigQuery API
(originally found in the [Google API docs](https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html))
1. Use [this wizard](https://console.developers.google.com/start/api?id=bigquery-json.googleapis.com) to create or select a project in the Google Developers Console and activate the BigQuery API. Click Continue, then Go to credentials.
2. On the **Add credentials to your project** page, click the **Cancel** button.
3. At the top of the page, select the **OAuth consent screen** tab. Select an **Email address**, enter a **Product name** if not already set, and click the **Save** button.
4. Select the **Credentials** tab, click the **Create credentials** button and select **OAuth client ID**.
5. Select the application type **Other**, enter the name "Macrometa Source BigQuery", and click the **Create** button.
6. Click **OK** to dismiss the resulting dialog.
7. Click the Download button to the right of the client ID.
8. Move this file to your working directory and rename it *client_secrets.json*.
Export the location of the secret file:
```
export GOOGLE_APPLICATION_CREDENTIALS="./client_secrets.json"
```
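To confirm the credentials are picked up before running the tap, a minimal check with the `google-cloud-bigquery` client library (a separate install, not part of this connector) might look like:
```python
# Minimal credential check, assuming the google-cloud-bigquery client
# library is installed (pip install google-cloud-bigquery).
from google.cloud import bigquery

client = bigquery.Client()  # reads GOOGLE_APPLICATION_CREDENTIALS
rows = client.query("SELECT 1 AS ok").result()  # trivial, near-free query
print("Authenticated against project:", client.project)
```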
For other authentication methods, see the Authentication section.
### Step 2: Install
First, make sure Python 3 is installed on your system.
```
pip install -U macrometa-source-bigquery
```
## Run
### Step 1: Configure
Create a file called `tap_config.json` in your working directory, following
`config.sample.json`:
```
{
  "streams": [
    {
      "name": "<some_schema_name>",
      "table": "`<project>.<dataset>.<table>`",
      "columns": ["<col_name_0>", "<col_name_1>", "<col_name_2>"],
      "datetime_key": "<your_key>",
      "filters": ["country='us'", "state='CA'",
                  "registered_on>=DATE_ADD(current_date, INTERVAL -7 day)"
                 ]  // optional: these are joined into the 'WHERE' clause
    }
  ],
  "start_datetime": "2017-01-01T00:00:00Z",  // can also be set as a command line argument
  "end_datetime": "2017-02-01T00:00:00Z",    // optional
  "limit": 100,
  "start_always_inclusive": false            // optional, default is false
}
```
- At least one stream (one BigQuery table/view) to copy is required.
  - Using `*` to specify the columns is not a recommended BigQuery practice,
    as it can blow up the cost for a table with a large number of columns.
  - `filters` are optional, but we strongly recommend using them on a large
    partitioned table to control the cost. (The authors of this connector are
    not responsible for the cost incurred by running this program. Always test
    thoroughly with a small data set first.)
- `start_datetime` must also be set in the config file or as a command line
  argument (see the next step).
- `limit` caps the number of returned rows, but it does not reduce the query
  cost.
The table/view is expected to have a column that indicates the creation or
update date and time, so the tap can send the query with `ORDER BY` and use
that column to record the bookmark (see the State section).
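To make this concrete, here is an illustrative sketch of the kind of query the stream settings above translate to. The names are placeholders and the connector's actual SQL generation is not documented here, so treat this as an approximation:
```python
# Illustrative only: approximates how one stream's settings map to the
# query this connector issues. The actual generated SQL may differ.
stream = {
    "table": "`myproject.mydataset.mytable`",   # placeholder identifiers
    "columns": ["id", "name", "registered_on"],
    "datetime_key": "registered_on",
    "filters": ["country='us'", "state='CA'"],
}
start = "2017-01-01T00:00:00Z"
end = "2017-02-01T00:00:00Z"

# filters plus the datetime window are joined into the WHERE clause;
# boundary inclusivity depends on state / start_always_inclusive
# (see the Notes and State sections)
conditions = stream["filters"] + [
    f"{stream['datetime_key']} >= '{start}'",
    f"{stream['datetime_key']} < '{end}'",
]
query = (
    f"SELECT {', '.join(stream['columns'])}"
    f" FROM {stream['table']}"
    f" WHERE {' AND '.join(conditions)}"
    f" ORDER BY {stream['datetime_key']}"
    f" LIMIT 100"
)
print(query)
```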
### Step 2: Create catalog
Run macrometa-source-bigquery in discovery mode to generate a catalog (JSON
schema) file. In the next step, you will run the tap and a target together,
piping the tap's output into the target:
```
macrometa-source-bigquery -c tap_config.json -d > catalog.json
```
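Since the catalog is plain JSON in the standard Singer layout, a quick sanity check that discovery found your streams might look like this:
```python
# Quick sanity check of the discovery output. Assumes the standard
# Singer catalog layout: a top-level "streams" list, each entry with
# a "stream" name and a JSON "schema".
import json

with open("catalog.json") as f:
    catalog = json.load(f)

for stream in catalog["streams"]:
    columns = list(stream["schema"].get("properties", {}))
    print(stream["stream"], "->", columns)
```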
### Step 3: Run
macrometa-source-bigquery can be run with any Singer target. As an example, let's use
[target-csv](https://github.com/singer-io/target-csv).
```
pip install target-csv
```
Run:
```
macrometa-source-bigquery -c tap_config.json \
--catalog catalog.json --start_datetime '2020-08-01T00:00:00Z' \
--end_datetime '2020-08-02T01:00:00Z' | target-csv --config target_config.json \
> state.json
```
This example should create a CSV file in the same directory.
`state.json` should contain the state (bookmark) after the run (see the State section).
Notes:
- Start and end datetimes accept the ISO 8601 format and can be date only. The
  start datetime is inclusive; the end datetime is not.
- It is recommended to inspect the catalog file and fix the auto-type assignment
  if necessary (see the sketch after this list).
- target-csv's target_config.json is optional.
- macrometa-source-bigquery can produce nested records, but whether the data is
  written successfully is up to the target. In this example with target-csv, the
  table is expected to be flat.
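As an example of the type fix mentioned above, the following sketch rewrites one auto-inferred property type in the catalog; the stream and column names are placeholders from the sample config, not real identifiers:
```python
# Sketch: patch an auto-inferred type in the catalog. The stream and
# column names below are placeholders, not real identifiers.
import json

with open("catalog.json") as f:
    catalog = json.load(f)

for stream in catalog["streams"]:
    if stream["stream"] == "<some_schema_name>":
        # e.g. override an auto-inferred type with a nullable string
        stream["schema"]["properties"]["<col_name_0>"] = {"type": ["null", "string"]}

with open("catalog.json", "w") as f:
    json.dump(catalog, f, indent=2)
```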
## Authentication
It is recommended to use `macrometa-source-bigquery` with a service account.
- Download the client_secrets.json file for your service account, and place it
  on the machine where `macrometa-source-bigquery` will be executed.
- Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable on the machine,
  with the fully qualified path to client_secrets.json as its value.
In a testing environment, you can also authenticate manually before running
the tap. In this case you do not need `GOOGLE_APPLICATION_CREDENTIALS` defined:
```
gcloud auth application-default login
```
You may also have to set the project:
```
gcloud config set project <project-id>
```
Though not tested, it should also be possible to use the OAuth flow to
authenticate to GCP:
- `macrometa-source-bigquery` will attempt to open a new window or tab in your default
browser. If this fails, copy the URL from the console and manually open it
in your browser.
- If you are not already logged into your Google account, you will be prompted
to log in.
- If you are logged into multiple Google accounts, you will be asked to select
one account to use for the authorization.
- Click the **Accept** button to allow `macrometa-source-bigquery` to access your Google BigQuery
table.
- You can close the tab after the authorization flow is complete.
## State
This source connector emits [state](https://github.com/singer-io/getting-started/blob/master/docs/CONFIG_AND_STATE.md#state-file).
The command also takes a state file as input via the `--state <file-name>` option.
If a state is given, the start_datetime config value and command line argument are
ignored, and the datetime value from the last_update key is used as the resuming
point.
To avoid data duplication, the start datetime is exclusive
(`start_datetime < datetime_column`) when the source connector runs with the
state option. If you are worried about data loss because of this, use the
`--start_datetime` option instead of state, or set `start_always_inclusive: true`
in the configuration.
The source connector itself does not output a state file. It expects the target
program or a downstream process to finalize the state safely and produce a state file.
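In the pipeline from Step 3, `state.json` captures everything the target wrote to stdout, which can include several state lines. A common Singer convention (not documented for this specific connector) is that the last emitted state wins; a minimal finalizing step under that assumption might look like:
```python
# Minimal state-finalizing sketch, assuming the common Singer convention
# that the last state line written by the target is the resuming point.
import json

with open("state.json") as f:
    lines = [line.strip() for line in f if line.strip()]

if lines:
    final_state = json.loads(lines[-1])  # last state wins
    with open("state.json", "w") as f:
        json.dump(final_state, f)
    print("Resuming point:", final_state)
```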
## Original repo
https://github.com/Macrometacorp/macrometa-source-bigquery