# pipelinewise-tap-mysql
[![PyPI version](https://badge.fury.io/py/pipelinewise-tap-mysql.svg)](https://badge.fury.io/py/pipelinewise-tap-mysql)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pipelinewise-tap-mysql.svg)](https://pypi.org/project/pipelinewise-tap-mysql/)
[![License: MIT](https://img.shields.io/badge/License-GPLv3-yellow.svg)](https://opensource.org/licenses/GPL-3.0)
[Singer](https://www.singer.io/) tap that extracts data from a [MySQL](https://www.mysql.com/) database and produces JSON-formatted data following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md).
This is a [PipelineWise](https://transferwise.github.io/pipelinewise) compatible tap connector.
## How to use it
The recommended method of running this tap is to use it from [PipelineWise](https://transferwise.github.io/pipelinewise). When running it from PipelineWise you don't need to configure this tap with JSON files and most of things are automated. Please check the related documentation at [Tap MySQL](https://transferwise.github.io/pipelinewise/connectors/taps/mysql.html)
If you want to run this [Singer Tap](https://singer.io) independently please read further.
## Usage
This section dives into basic usage of `tap-mysql` by walking through extracting
data from a table. It assumes that you can connect to and read from a MySQL
database.
### Install
First, make sure Python 3 is installed on your system or follow these
installation instructions for [Mac](http://docs.python-guide.org/en/latest/starting/install3/osx/) or
[Ubuntu](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).
It's recommended to use a virtualenv:
```bash
python3 -m venv venv
pip install pipelinewise-tap-mysql
```
or
```bash
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install .
```
### Have a source database
There's some important business data siloed in this MySQL database -- we need to
extract it. Here's the table we'd like to sync:
```
mysql> select * from example_db.animals;
+----|----------|----------------------+
| id | name | likes_getting_petted |
+----|----------|----------------------+
| 1 | aardvark | 0 |
| 2 | bear | 0 |
| 3 | cow | 1 |
+----|----------|----------------------+
3 rows in set (0.00 sec)
```
### Create the configuration file
Create a config file containing the database connection credentials, see [sample](config.json.sample).
List of config parameters:
| Parameter | type | required | default | description |
|-------------------|-------------------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| host | string | yes | - | mysql/mariadb host |
| port | int | yes | - | mysql/mariadb port |
| user | string | yes | - | db username |
| password | string | yes | - | db password |
| cursorclass | string | No | `pymysql.cursors.SSCursor` | set cursorclass used by PyMYSQL |
| database | string | No | - | Database to use, None to not use a particular one. Used by PyMYSQL |
| server_id | int | False | Randomly generated int | Used as the slave id when this tap is connecting to the server |
| filter_dbs | string | False | - | Comma separated list of schemas to extract tables only from particular schemas and to improve data extraction performance |
| use_gtid | bool | False | False <br/> | Flag to enable log based replication using GTID |
| engine | string ('mysql' or 'mariadb') | False | 'mysql' | Indicate which flavor the server is, used for LOG_BASED with GTID |
| ssl | string ("true") | No | False | Enable SSL connection |
| ssl_ca | string | No | - | for self-signed SSL |
| ssl_cert | string | No | - | for self-signed SSL |
| ssl_key | string | No | - | for self-signed SSL |
| internal_hostname | string | No | - | Override match hostname for google cloud |
| session_sqls | List of strings | No | ```['SET @@session.time_zone="+0:00"', 'SET @@session.wait_timeout=28800', 'SET @@session.net_read_timeout=3600', 'SET @@session.innodb_lock_wait_timeout=3600']``` | Set session variables dynamically. |
### Discovery mode
The tap can be invoked in discovery mode to find the available tables and
columns in the database:
```bash
$ tap-mysql --config config.json --discover
```
A discovered catalog is output, with a JSON-schema description of each table. A
source table directly corresponds to a Singer stream.
```json
{
"streams": [
{
"tap_stream_id": "example_db-animals",
"table_name": "animals",
"schema": {
"type": "object",
"properties": {
"name": {
"inclusion": "available",
"type": [
"null",
"string"
],
"maxLength": 255
},
"id": {
"inclusion": "automatic",
"minimum": -2147483648,
"maximum": 2147483647,
"type": [
"null",
"integer"
]
},
"likes_getting_petted": {
"inclusion": "available",
"type": [
"null",
"boolean"
]
}
}
},
"metadata": [
{
"breadcrumb": [],
"metadata": {
"row-count": 3,
"table-key-properties": [
"id"
],
"database-name": "example_db",
"selected-by-default": false,
"is-view": false,
}
},
{
"breadcrumb": [
"properties",
"id"
],
"metadata": {
"sql-datatype": "int(11)",
"selected-by-default": true
}
},
{
"breadcrumb": [
"properties",
"name"
],
"metadata": {
"sql-datatype": "varchar(255)",
"selected-by-default": true
}
},
{
"breadcrumb": [
"properties",
"likes_getting_petted"
],
"metadata": {
"sql-datatype": "tinyint(1)",
"selected-by-default": true
}
}
],
"stream": "animals"
}
]
}
```
### Field selection
In sync mode, `tap-mysql` consumes the catalog and looks for tables and fields
have been marked as _selected_ in their associated metadata entries.
Redirect output from the tap's discovery mode to a file so that it can be
modified:
```bash
$ tap-mysql -c config.json --discover > properties.json
```
Then edit `properties.json` to make selections. In this example we want the
`animals` table. The stream's metadata entry (associated with `"breadcrumb": []`)
gets a top-level `selected` flag, as does its columns' metadata entries. Additionally,
we will mark the `animals` table to replicate using a `FULL_TABLE` strategy. For more,
information, see [Replication methods and state file](#replication-methods-and-state-file).
```json
[
{
"breadcrumb": [],
"metadata": {
"row-count": 3,
"table-key-properties": [
"id"
],
"database-name": "example_db",
"selected-by-default": false,
"is-view": false,
"selected": true,
"replication-method": "FULL_TABLE"
}
},
{
"breadcrumb": [
"properties",
"id"
],
"metadata": {
"sql-datatype": "int(11)",
"selected-by-default": true,
"selected": true
}
},
{
"breadcrumb": [
"properties",
"name"
],
"metadata": {
"sql-datatype": "varchar(255)",
"selected-by-default": true,
"selected": true
}
},
{
"breadcrumb": [
"properties",
"likes_getting_petted"
],
"metadata": {
"sql-datatype": "tinyint(1)",
"selected-by-default": true,
"selected": true
}
}
]
```
### Sync mode
With a properties catalog that describes field and table selections, the tap can be invoked in sync mode:
```bash
$ tap-mysql -c config.json --properties properties.json
```
Messages are written to standard output following the Singer specification. The
resultant stream of JSON data can be consumed by a Singer target.
```json
{"value": {"currently_syncing": "example_db-animals"}, "type": "STATE"}
{"key_properties": ["id"], "stream": "animals", "schema": {"properties": {"name": {"inclusion": "available", "maxLength": 255, "type": ["null", "string"]}, "likes_getting_petted": {"inclusion": "available", "type": ["null", "boolean"]}, "id": {"inclusion": "automatic", "minimum": -2147483648, "type": ["null", "integer"], "maximum": 2147483647}}, "type": "object"}, "type": "SCHEMA"}
{"stream": "animals", "version": 1509133344771, "type": "ACTIVATE_VERSION"}
{"record": {"name": "aardvark", "likes_getting_petted": false, "id": 1}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}
{"record": {"name": "bear", "likes_getting_petted": false, "id": 2}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}
{"record": {"name": "cow", "likes_getting_petted": true, "id": 3}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}
{"stream": "animals", "version": 1509133344771, "type": "ACTIVATE_VERSION"}
{"value": {"currently_syncing": "example_db-animals", "bookmarks": {"example_db-animals": {"initial_full_table_complete": true}}}, "type": "STATE"}
{"value": {"currently_syncing": null, "bookmarks": {"example_db-animals": {"initial_full_table_complete": true}}}, "type": "STATE"}
```
## Replication methods and state file
In the above example, we invoked `tap-mysql` without providing a _state_ file and without specifying a replication
method. The ways to replicate a given table are `FULL_TABLE`, `LOG_BASED` and `INCREMENTAL`.
### LOG_BASED
LOG_BASED replication makes use of the server's binary logs (binlogs), this method can work with primary
servers, the tap acts as a replica and requests the primary to stream log events,the tap then consumes events
pertaining to row changes (inserts, updates, deletes), binlog file rotate and gtid events.
Log_based method always requires an initial sync to get a snapshot of the table and current binlog coordinates/gtid
position.
The tap support two ways of consuming log events: using binlog coordinates or GTID, the default behavior is using
binlog coordinates, when turning the `use_gtid` flag, you have to specify the engine flavor (mariadb/mysql) due to
how different are the GTID implementations in these two engines.
When enabling the `use_gtid` flag and the engine is MariaDB, the tap will dynamically infer the GTID pos from
existing binlog coordinate in the state, if the engine is mysql, it will fail.
#### State when using binlog coordinates
```json
{
"bookmarks": {
"example_db-table1": {"log_file": "mysql-binlog.0003", "log_pos": 3244},
"example_db-table2": {"log_file": "mysql-binlog.0001", "log_pos": 42},
"example_db-table3": {"log_file": "mysql-binlog.0003", "log_pos": 100}
}
}
```
#### State when using GTID
```json
{
"bookmarks": {
"example_db-table1": {"log_file": "mysql-binlog.0003", "log_pos": 3244, "gtid": "0:364864374:599"},
"example_db-table2": {"log_file": "mysql-binlog.0001", "log_pos": 42, "gtid": "0:364864374:375"},
"example_db-table3": {"log_file": "mysql-binlog.0003", "log_pos": 100, "gtid": "0:364864374:399"}
}
}
```
### Full Table
Full-table replication extracts all data from the source table each time the tap is invoked.
### Incremental
Incremental replication works in conjunction with a state file to only extract
new records each time the tap is invoked. This requires a replication key to be
specified in the table's metadata as well.
#### Example
Let's sync the `animals` table again, but this time using incremental
replication. The replication method and replication key are set in the
table's metadata entry in properties file:
```json
{
"streams": [
{
"tap_stream_id": "example_db-animals",
"table_name": "animals",
"schema": { ... },
"metadata": [
{
"breadcrumb": [],
"metadata": {
"row-count": 3,
"table-key-properties": [
"id"
],
"database-name": "example_db",
"selected-by-default": false,
"is-view": false,
"replication-method": "INCREMENTAL",
"replication-key": "id"
}
},
...
],
"stream": "animals"
}
]
}
```
We have no meaningful state so far, so just invoke the tap in sync mode again
without a state file:
```bash
$ tap-mysql -c config.json --properties properties.json
```
The output messages look very similar to when the table was replicated using the
default `FULL_TABLE` replication method. One important difference is that the
`STATE` messages now contain a `replication_key_value` -- a bookmark or
high-water mark -- for data that was extracted:
```json
{"type": "STATE", "value": {"currently_syncing": "example_db-animals"}}
{"stream": "animals", "type": "SCHEMA", "schema": {"type": "object", "properties": {"id": {"type": ["null", "integer"], "minimum": -2147483648, "maximum": 2147483647, "inclusion": "automatic"}, "name": {"type": ["null", "string"], "inclusion": "available", "maxLength": 255}, "likes_getting_petted": {"type": ["null", "boolean"], "inclusion": "available"}}}, "key_properties": ["id"]}
{"stream": "animals", "type": "ACTIVATE_VERSION", "version": 1509135204169}
{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 1, "name": "aardvark", "likes_getting_petted": false}}
{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 2, "name": "bear", "likes_getting_petted": false}}
{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 3, "name": "cow", "likes_getting_petted": true}}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"version": 1509135204169, "replication_key_value": 3, "replication_key": "id"}}, "currently_syncing": "example_db-animals"}}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"version": 1509135204169, "replication_key_value": 3, "replication_key": "id"}}, "currently_syncing": null}}
```
Note that the final `STATE` message has a `replication_key_value` of `3`,
reflecting that the extraction ended on a record that had an `id` of `3`.
Subsequent invocations of the tap will pick up from this bookmark.
Normally, the target will echo the last `STATE` after it's finished processing
data. For this example, let's manually write a `state.json` file using the
`STATE` message:
```json
{
"bookmarks": {
"example_db-animals": {
"version": 1509135204169,
"replication_key_value": 3,
"replication_key": "id"
}
},
"currently_syncing": null
}
```
Let's add some more animals to our farm:
```
mysql> insert into animals (name, likes_getting_petted) values ('dog', true), ('elephant', true), ('frog', false);
```
```bash
$ tap-mysql -c config.json --properties properties.json --state state.json
```
This invocation extracts any data since (and including) the
`replication_key_value`:
```json
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 3}}, "currently_syncing": "example_db-animals"}}
{"key_properties": ["id"], "schema": {"properties": {"name": {"maxLength": 255, "inclusion": "available", "type": ["null", "string"]}, "id": {"maximum": 2147483647, "minimum": -2147483648, "inclusion": "automatic", "type": ["null", "integer"]}, "likes_getting_petted": {"inclusion": "available", "type": ["null", "boolean"]}}, "type": "object"}, "type": "SCHEMA", "stream": "animals"}
{"type": "ACTIVATE_VERSION", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "cow", "id": 3, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "dog", "id": 4, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "elephant", "id": 5, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "frog", "id": 6, "likes_getting_petted": false}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 6}}, "currently_syncing": "example_db-animals"}}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 6}}, "currently_syncing": null}}
```
## To run tests:
1. You'll need to have a running MySQL or MariaDB server to run the tests. Run the following SQL commands as a privileged user to create the required objects:
```
CREATE USER <mysql-user> IDENTIFIED BY '<mysql-password>';
CREATE DATABASE tap_mysql_test;
GRANT ALL PRIVILEGES ON tap_mysql_test.* TO <mysql-user>;
```
**Note**: The user and password can be anything but the database name needs to be `tap_mysql_test`.
2. Define the environment variables that are required to run the tests:
```
export TAP_MYSQL_HOST=<mysql-host>
export TAP_MYSQL_PORT=<mysql-port>
export TAP_MYSQL_USER=<mysql-user>
export TAP_MYSQL_PASSWORD=<mysql-password>
export TAP_MYSQL_ENGINE=<engine>
```
3. Install python test dependencies in a virtual env
```bash
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install .[test]
```
4. To run tests:
```bash
nosetests -c .noserc tests
```
### To run pylint:
1. Install python dependencies and run python linter
```
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install .[test]
pylint --rcfile .pylintrc tap_mysql
```
---
Based on Stitch documentation
Raw data
{
"_id": null,
"home_page": "https://github.com/transferwise/pipelinewise-tap-mysql",
"name": "pipelinewise-tap-mysql",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Wise",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/09/01/252c53770286655a4a55f7b80119589386e7c457e5a51e73fdbf8c675651/pipelinewise_tap_mysql-1.6.0.tar.gz",
"platform": null,
"description": "# pipelinewise-tap-mysql\n\n[![PyPI version](https://badge.fury.io/py/pipelinewise-tap-mysql.svg)](https://badge.fury.io/py/pipelinewise-tap-mysql)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pipelinewise-tap-mysql.svg)](https://pypi.org/project/pipelinewise-tap-mysql/)\n[![License: MIT](https://img.shields.io/badge/License-GPLv3-yellow.svg)](https://opensource.org/licenses/GPL-3.0)\n\n[Singer](https://www.singer.io/) tap that extracts data from a [MySQL](https://www.mysql.com/) database and produces JSON-formatted data following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md).\n\nThis is a [PipelineWise](https://transferwise.github.io/pipelinewise) compatible tap connector.\n\n## How to use it\n\nThe recommended method of running this tap is to use it from [PipelineWise](https://transferwise.github.io/pipelinewise). When running it from PipelineWise you don't need to configure this tap with JSON files and most of things are automated. Please check the related documentation at [Tap MySQL](https://transferwise.github.io/pipelinewise/connectors/taps/mysql.html)\n\nIf you want to run this [Singer Tap](https://singer.io) independently please read further.\n\n## Usage\n\nThis section dives into basic usage of `tap-mysql` by walking through extracting\ndata from a table. It assumes that you can connect to and read from a MySQL\ndatabase.\n\n### Install\n\nFirst, make sure Python 3 is installed on your system or follow these\ninstallation instructions for [Mac](http://docs.python-guide.org/en/latest/starting/install3/osx/) or\n[Ubuntu](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).\n\nIt's recommended to use a virtualenv:\n\n```bash\n python3 -m venv venv\n pip install pipelinewise-tap-mysql\n```\n\nor\n\n```bash\n python3 -m venv venv\n . venv/bin/activate\n pip install --upgrade pip\n pip install .\n```\n\n### Have a source database\n\nThere's some important business data siloed in this MySQL database -- we need to\nextract it. Here's the table we'd like to sync:\n\n```\nmysql> select * from example_db.animals;\n+----|----------|----------------------+\n| id | name | likes_getting_petted |\n+----|----------|----------------------+\n| 1 | aardvark | 0 |\n| 2 | bear | 0 |\n| 3 | cow | 1 |\n+----|----------|----------------------+\n3 rows in set (0.00 sec)\n```\n\n### Create the configuration file\n\nCreate a config file containing the database connection credentials, see [sample](config.json.sample).\n\nList of config parameters:\n\n| Parameter | type | required | default | description |\n|-------------------|-------------------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|\n| host | string | yes | - | mysql/mariadb host |\n| port | int | yes | - | mysql/mariadb port |\n| user | string | yes | - | db username |\n| password | string | yes | - | db password |\n| cursorclass | string | No | `pymysql.cursors.SSCursor` | set cursorclass used by PyMYSQL |\n| database | string | No | - | Database to use, None to not use a particular one. Used by PyMYSQL |\n| server_id | int | False | Randomly generated int | Used as the slave id when this tap is connecting to the server |\n| filter_dbs | string | False | - | Comma separated list of schemas to extract tables only from particular schemas and to improve data extraction performance |\n| use_gtid | bool | False | False <br/> | Flag to enable log based replication using GTID |\n| engine | string ('mysql' or 'mariadb') | False | 'mysql' | Indicate which flavor the server is, used for LOG_BASED with GTID |\n| ssl | string (\"true\") | No | False | Enable SSL connection |\n| ssl_ca | string | No | - | for self-signed SSL |\n| ssl_cert | string | No | - | for self-signed SSL |\n| ssl_key | string | No | - | for self-signed SSL |\n| internal_hostname | string | No | - | Override match hostname for google cloud |\n| session_sqls | List of strings | No | ```['SET @@session.time_zone=\"+0:00\"', 'SET @@session.wait_timeout=28800', 'SET @@session.net_read_timeout=3600', 'SET @@session.innodb_lock_wait_timeout=3600']``` | Set session variables dynamically. |\n\n\n### Discovery mode\n\nThe tap can be invoked in discovery mode to find the available tables and\ncolumns in the database:\n\n```bash\n$ tap-mysql --config config.json --discover\n\n```\n\nA discovered catalog is output, with a JSON-schema description of each table. A\nsource table directly corresponds to a Singer stream.\n\n```json\n{\n \"streams\": [\n {\n \"tap_stream_id\": \"example_db-animals\",\n \"table_name\": \"animals\",\n \"schema\": {\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"inclusion\": \"available\",\n \"type\": [\n \"null\",\n \"string\"\n ],\n \"maxLength\": 255\n },\n \"id\": {\n \"inclusion\": \"automatic\",\n \"minimum\": -2147483648,\n \"maximum\": 2147483647,\n \"type\": [\n \"null\",\n \"integer\"\n ]\n },\n \"likes_getting_petted\": {\n \"inclusion\": \"available\",\n \"type\": [\n \"null\",\n \"boolean\"\n ]\n }\n }\n },\n \"metadata\": [\n {\n \"breadcrumb\": [],\n \"metadata\": {\n \"row-count\": 3,\n \"table-key-properties\": [\n \"id\"\n ],\n \"database-name\": \"example_db\",\n \"selected-by-default\": false,\n \"is-view\": false,\n }\n },\n {\n \"breadcrumb\": [\n \"properties\",\n \"id\"\n ],\n \"metadata\": {\n \"sql-datatype\": \"int(11)\",\n \"selected-by-default\": true\n }\n },\n {\n \"breadcrumb\": [\n \"properties\",\n \"name\"\n ],\n \"metadata\": {\n \"sql-datatype\": \"varchar(255)\",\n \"selected-by-default\": true\n }\n },\n {\n \"breadcrumb\": [\n \"properties\",\n \"likes_getting_petted\"\n ],\n \"metadata\": {\n \"sql-datatype\": \"tinyint(1)\",\n \"selected-by-default\": true\n }\n }\n ],\n \"stream\": \"animals\"\n }\n ]\n}\n\n```\n\n### Field selection\n\nIn sync mode, `tap-mysql` consumes the catalog and looks for tables and fields\nhave been marked as _selected_ in their associated metadata entries.\n\nRedirect output from the tap's discovery mode to a file so that it can be\nmodified:\n\n```bash\n$ tap-mysql -c config.json --discover > properties.json\n```\n\nThen edit `properties.json` to make selections. In this example we want the\n`animals` table. The stream's metadata entry (associated with `\"breadcrumb\": []`) \ngets a top-level `selected` flag, as does its columns' metadata entries. Additionally,\nwe will mark the `animals` table to replicate using a `FULL_TABLE` strategy. For more,\ninformation, see [Replication methods and state file](#replication-methods-and-state-file).\n\n```json\n[\n {\n \"breadcrumb\": [],\n \"metadata\": {\n \"row-count\": 3,\n \"table-key-properties\": [\n \"id\"\n ],\n \"database-name\": \"example_db\",\n \"selected-by-default\": false,\n \"is-view\": false,\n \"selected\": true,\n \"replication-method\": \"FULL_TABLE\"\n }\n },\n {\n \"breadcrumb\": [\n \"properties\",\n \"id\"\n ],\n \"metadata\": {\n \"sql-datatype\": \"int(11)\",\n \"selected-by-default\": true,\n \"selected\": true\n }\n },\n {\n \"breadcrumb\": [\n \"properties\",\n \"name\"\n ],\n \"metadata\": {\n \"sql-datatype\": \"varchar(255)\",\n \"selected-by-default\": true,\n \"selected\": true\n }\n },\n {\n \"breadcrumb\": [\n \"properties\",\n \"likes_getting_petted\"\n ],\n \"metadata\": {\n \"sql-datatype\": \"tinyint(1)\",\n \"selected-by-default\": true,\n \"selected\": true\n }\n }\n]\n```\n\n### Sync mode\n\nWith a properties catalog that describes field and table selections, the tap can be invoked in sync mode:\n\n```bash\n$ tap-mysql -c config.json --properties properties.json\n```\n\nMessages are written to standard output following the Singer specification. The\nresultant stream of JSON data can be consumed by a Singer target.\n\n```json\n{\"value\": {\"currently_syncing\": \"example_db-animals\"}, \"type\": \"STATE\"}\n\n{\"key_properties\": [\"id\"], \"stream\": \"animals\", \"schema\": {\"properties\": {\"name\": {\"inclusion\": \"available\", \"maxLength\": 255, \"type\": [\"null\", \"string\"]}, \"likes_getting_petted\": {\"inclusion\": \"available\", \"type\": [\"null\", \"boolean\"]}, \"id\": {\"inclusion\": \"automatic\", \"minimum\": -2147483648, \"type\": [\"null\", \"integer\"], \"maximum\": 2147483647}}, \"type\": \"object\"}, \"type\": \"SCHEMA\"}\n\n{\"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"ACTIVATE_VERSION\"}\n\n{\"record\": {\"name\": \"aardvark\", \"likes_getting_petted\": false, \"id\": 1}, \"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"RECORD\"}\n\n{\"record\": {\"name\": \"bear\", \"likes_getting_petted\": false, \"id\": 2}, \"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"RECORD\"}\n\n{\"record\": {\"name\": \"cow\", \"likes_getting_petted\": true, \"id\": 3}, \"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"RECORD\"}\n\n{\"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"ACTIVATE_VERSION\"}\n\n{\"value\": {\"currently_syncing\": \"example_db-animals\", \"bookmarks\": {\"example_db-animals\": {\"initial_full_table_complete\": true}}}, \"type\": \"STATE\"}\n\n{\"value\": {\"currently_syncing\": null, \"bookmarks\": {\"example_db-animals\": {\"initial_full_table_complete\": true}}}, \"type\": \"STATE\"}\n```\n\n## Replication methods and state file\n\nIn the above example, we invoked `tap-mysql` without providing a _state_ file and without specifying a replication \nmethod. The ways to replicate a given table are `FULL_TABLE`, `LOG_BASED` and `INCREMENTAL`.\n\n### LOG_BASED\n\nLOG_BASED replication makes use of the server's binary logs (binlogs), this method can work with primary \nservers, the tap acts as a replica and requests the primary to stream log events,the tap then consumes events \npertaining to row changes (inserts, updates, deletes), binlog file rotate and gtid events.\n\nLog_based method always requires an initial sync to get a snapshot of the table and current binlog coordinates/gtid \nposition.\n\nThe tap support two ways of consuming log events: using binlog coordinates or GTID, the default behavior is using \nbinlog coordinates, when turning the `use_gtid` flag, you have to specify the engine flavor (mariadb/mysql) due to \nhow different are the GTID implementations in these two engines.\n\nWhen enabling the `use_gtid` flag and the engine is MariaDB, the tap will dynamically infer the GTID pos from \nexisting binlog coordinate in the state, if the engine is mysql, it will fail.\n\n#### State when using binlog coordinates\n```json\n{\n \"bookmarks\": {\n \"example_db-table1\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 3244},\n \"example_db-table2\": {\"log_file\": \"mysql-binlog.0001\", \"log_pos\": 42},\n \"example_db-table3\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 100}\n }\n}\n```\n\n#### State when using GTID\n```json\n{\n \"bookmarks\": {\n \"example_db-table1\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 3244, \"gtid\": \"0:364864374:599\"},\n \"example_db-table2\": {\"log_file\": \"mysql-binlog.0001\", \"log_pos\": 42, \"gtid\": \"0:364864374:375\"},\n \"example_db-table3\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 100, \"gtid\": \"0:364864374:399\"}\n }\n}\n```\n\n### Full Table\n\nFull-table replication extracts all data from the source table each time the tap is invoked.\n\n### Incremental\n\nIncremental replication works in conjunction with a state file to only extract\nnew records each time the tap is invoked. This requires a replication key to be\nspecified in the table's metadata as well.\n\n#### Example\n\nLet's sync the `animals` table again, but this time using incremental\nreplication. The replication method and replication key are set in the\ntable's metadata entry in properties file:\n\n```json\n{\n \"streams\": [\n {\n \"tap_stream_id\": \"example_db-animals\",\n \"table_name\": \"animals\",\n \"schema\": { ... },\n \"metadata\": [\n {\n \"breadcrumb\": [],\n \"metadata\": {\n \"row-count\": 3,\n \"table-key-properties\": [\n \"id\"\n ],\n \"database-name\": \"example_db\",\n \"selected-by-default\": false,\n \"is-view\": false,\n \"replication-method\": \"INCREMENTAL\",\n \"replication-key\": \"id\"\n }\n },\n ...\n ],\n \"stream\": \"animals\"\n }\n ]\n}\n```\n\nWe have no meaningful state so far, so just invoke the tap in sync mode again\nwithout a state file:\n\n```bash\n$ tap-mysql -c config.json --properties properties.json\n```\n\nThe output messages look very similar to when the table was replicated using the\ndefault `FULL_TABLE` replication method. One important difference is that the\n`STATE` messages now contain a `replication_key_value` -- a bookmark or\nhigh-water mark -- for data that was extracted:\n\n```json\n{\"type\": \"STATE\", \"value\": {\"currently_syncing\": \"example_db-animals\"}}\n\n{\"stream\": \"animals\", \"type\": \"SCHEMA\", \"schema\": {\"type\": \"object\", \"properties\": {\"id\": {\"type\": [\"null\", \"integer\"], \"minimum\": -2147483648, \"maximum\": 2147483647, \"inclusion\": \"automatic\"}, \"name\": {\"type\": [\"null\", \"string\"], \"inclusion\": \"available\", \"maxLength\": 255}, \"likes_getting_petted\": {\"type\": [\"null\", \"boolean\"], \"inclusion\": \"available\"}}}, \"key_properties\": [\"id\"]}\n\n{\"stream\": \"animals\", \"type\": \"ACTIVATE_VERSION\", \"version\": 1509135204169}\n\n{\"stream\": \"animals\", \"type\": \"RECORD\", \"version\": 1509135204169, \"record\": {\"id\": 1, \"name\": \"aardvark\", \"likes_getting_petted\": false}}\n\n{\"stream\": \"animals\", \"type\": \"RECORD\", \"version\": 1509135204169, \"record\": {\"id\": 2, \"name\": \"bear\", \"likes_getting_petted\": false}}\n\n{\"stream\": \"animals\", \"type\": \"RECORD\", \"version\": 1509135204169, \"record\": {\"id\": 3, \"name\": \"cow\", \"likes_getting_petted\": true}}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"version\": 1509135204169, \"replication_key_value\": 3, \"replication_key\": \"id\"}}, \"currently_syncing\": \"example_db-animals\"}}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"version\": 1509135204169, \"replication_key_value\": 3, \"replication_key\": \"id\"}}, \"currently_syncing\": null}}\n```\n\nNote that the final `STATE` message has a `replication_key_value` of `3`,\nreflecting that the extraction ended on a record that had an `id` of `3`.\nSubsequent invocations of the tap will pick up from this bookmark.\n\nNormally, the target will echo the last `STATE` after it's finished processing\ndata. For this example, let's manually write a `state.json` file using the\n`STATE` message:\n\n```json\n{\n \"bookmarks\": {\n \"example_db-animals\": {\n \"version\": 1509135204169,\n \"replication_key_value\": 3,\n \"replication_key\": \"id\"\n }\n },\n \"currently_syncing\": null\n}\n```\n\nLet's add some more animals to our farm:\n\n```\nmysql> insert into animals (name, likes_getting_petted) values ('dog', true), ('elephant', true), ('frog', false);\n```\n\n```bash\n$ tap-mysql -c config.json --properties properties.json --state state.json\n```\n\nThis invocation extracts any data since (and including) the\n`replication_key_value`:\n\n```json\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"replication_key\": \"id\", \"version\": 1509135204169, \"replication_key_value\": 3}}, \"currently_syncing\": \"example_db-animals\"}}\n\n{\"key_properties\": [\"id\"], \"schema\": {\"properties\": {\"name\": {\"maxLength\": 255, \"inclusion\": \"available\", \"type\": [\"null\", \"string\"]}, \"id\": {\"maximum\": 2147483647, \"minimum\": -2147483648, \"inclusion\": \"automatic\", \"type\": [\"null\", \"integer\"]}, \"likes_getting_petted\": {\"inclusion\": \"available\", \"type\": [\"null\", \"boolean\"]}}, \"type\": \"object\"}, \"type\": \"SCHEMA\", \"stream\": \"animals\"}\n\n{\"type\": \"ACTIVATE_VERSION\", \"version\": 1509135204169, \"stream\": \"animals\"}\n\n{\"record\": {\"name\": \"cow\", \"id\": 3, \"likes_getting_petted\": true}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n{\"record\": {\"name\": \"dog\", \"id\": 4, \"likes_getting_petted\": true}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n{\"record\": {\"name\": \"elephant\", \"id\": 5, \"likes_getting_petted\": true}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n{\"record\": {\"name\": \"frog\", \"id\": 6, \"likes_getting_petted\": false}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"replication_key\": \"id\", \"version\": 1509135204169, \"replication_key_value\": 6}}, \"currently_syncing\": \"example_db-animals\"}}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"replication_key\": \"id\", \"version\": 1509135204169, \"replication_key_value\": 6}}, \"currently_syncing\": null}}\n```\n\n## To run tests:\n\n1. You'll need to have a running MySQL or MariaDB server to run the tests. Run the following SQL commands as a privileged user to create the required objects:\n```\nCREATE USER <mysql-user> IDENTIFIED BY '<mysql-password>';\nCREATE DATABASE tap_mysql_test;\nGRANT ALL PRIVILEGES ON tap_mysql_test.* TO <mysql-user>;\n```\n\n**Note**: The user and password can be anything but the database name needs to be `tap_mysql_test`.\n\n2. Define the environment variables that are required to run the tests:\n```\n export TAP_MYSQL_HOST=<mysql-host>\n export TAP_MYSQL_PORT=<mysql-port>\n export TAP_MYSQL_USER=<mysql-user>\n export TAP_MYSQL_PASSWORD=<mysql-password>\n export TAP_MYSQL_ENGINE=<engine>\n```\n\n3. Install python test dependencies in a virtual env\n\n```bash\npython3 -m venv venv\n. venv/bin/activate\npip install --upgrade pip\npip install .[test]\n```\n\n4. To run tests:\n```bash\nnosetests -c .noserc tests\n```\n\n### To run pylint:\n\n1. Install python dependencies and run python linter\n```\n python3 -m venv venv\n . venv/bin/activate\n pip install --upgrade pip\n pip install .[test]\n pylint --rcfile .pylintrc tap_mysql\n```\n\n---\n\nBased on Stitch documentation\n",
"bugtrack_url": null,
"license": null,
"summary": "Singer.io tap for extracting data from MySQL & MariaDB - PipelineWise compatible",
"version": "1.6.0",
"project_urls": {
"Homepage": "https://github.com/transferwise/pipelinewise-tap-mysql"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "26cb1acf7b2ea84348cb1eef82ac5e124f8a23b85391760daddccb12343250e3",
"md5": "45790feb13ff2d50ab28e67f5c35fc4a",
"sha256": "287f46a2aaa782689797268297de6688196a24dee1de1e01e24b611ccaadc0ec"
},
"downloads": -1,
"filename": "pipelinewise_tap_mysql-1.6.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "45790feb13ff2d50ab28e67f5c35fc4a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 42627,
"upload_time": "2024-05-09T14:28:48",
"upload_time_iso_8601": "2024-05-09T14:28:48.894004Z",
"url": "https://files.pythonhosted.org/packages/26/cb/1acf7b2ea84348cb1eef82ac5e124f8a23b85391760daddccb12343250e3/pipelinewise_tap_mysql-1.6.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0901252c53770286655a4a55f7b80119589386e7c457e5a51e73fdbf8c675651",
"md5": "51ffb061e6869781b2ff592164edc2d8",
"sha256": "26e22c635b6d839fba3a8d0e2040365bba1c68508046c5ab49c3ee26d13f7eb5"
},
"downloads": -1,
"filename": "pipelinewise_tap_mysql-1.6.0.tar.gz",
"has_sig": false,
"md5_digest": "51ffb061e6869781b2ff592164edc2d8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 38799,
"upload_time": "2024-05-09T14:29:15",
"upload_time_iso_8601": "2024-05-09T14:29:15.286700Z",
"url": "https://files.pythonhosted.org/packages/09/01/252c53770286655a4a55f7b80119589386e7c457e5a51e73fdbf8c675651/pipelinewise_tap_mysql-1.6.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-09 14:29:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "transferwise",
"github_project": "pipelinewise-tap-mysql",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pipelinewise-tap-mysql"
}