pipelinewise-tap-mysql


Namepipelinewise-tap-mysql JSON
Version 1.6.0 PyPI version JSON
download
home_pagehttps://github.com/transferwise/pipelinewise-tap-mysql
SummarySinger.io tap for extracting data from MySQL & MariaDB - PipelineWise compatible
upload_time2024-05-09 14:29:15
maintainerNone
docs_urlNone
authorWise
requires_pythonNone
licenseNone
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # pipelinewise-tap-mysql

[![PyPI version](https://badge.fury.io/py/pipelinewise-tap-mysql.svg)](https://badge.fury.io/py/pipelinewise-tap-mysql)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pipelinewise-tap-mysql.svg)](https://pypi.org/project/pipelinewise-tap-mysql/)
[![License: MIT](https://img.shields.io/badge/License-GPLv3-yellow.svg)](https://opensource.org/licenses/GPL-3.0)

[Singer](https://www.singer.io/) tap that extracts data from a [MySQL](https://www.mysql.com/) database and produces JSON-formatted data following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md).

This is a [PipelineWise](https://transferwise.github.io/pipelinewise) compatible tap connector.

## How to use it

The recommended method of running this tap is to use it from [PipelineWise](https://transferwise.github.io/pipelinewise). When running it from PipelineWise you don't need to configure this tap with JSON files and most of things are automated. Please check the related documentation at [Tap MySQL](https://transferwise.github.io/pipelinewise/connectors/taps/mysql.html)

If you want to run this [Singer Tap](https://singer.io) independently please read further.

## Usage

This section dives into basic usage of `tap-mysql` by walking through extracting
data from a table. It assumes that you can connect to and read from a MySQL
database.

### Install

First, make sure Python 3 is installed on your system or follow these
installation instructions for [Mac](http://docs.python-guide.org/en/latest/starting/install3/osx/) or
[Ubuntu](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).

It's recommended to use a virtualenv:

```bash
  python3 -m venv venv
  pip install pipelinewise-tap-mysql
```

or

```bash
  python3 -m venv venv
  . venv/bin/activate
  pip install --upgrade pip
  pip install .
```

### Have a source database

There's some important business data siloed in this MySQL database -- we need to
extract it. Here's the table we'd like to sync:

```
mysql> select * from example_db.animals;
+----|----------|----------------------+
| id | name     | likes_getting_petted |
+----|----------|----------------------+
|  1 | aardvark |                    0 |
|  2 | bear     |                    0 |
|  3 | cow      |                    1 |
+----|----------|----------------------+
3 rows in set (0.00 sec)
```

### Create the configuration file

Create a config file containing the database connection credentials, see [sample](config.json.sample).

List of config parameters:

| Parameter         | type                          | required | default                                                                                                                                                           | description                                                                                                               |
|-------------------|-------------------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| host              | string                        | yes      | -                                                                                                                                                                 | mysql/mariadb host                                                                                                        |
| port              | int                           | yes      | -                                                                                                                                                                 | mysql/mariadb port                                                                                                        |
| user              | string                        | yes      | -                                                                                                                                                                 | db username                                                                                                               |
| password          | string                        | yes      | -                                                                                                                                                                 | db password                                                                                                               |
| cursorclass       | string                        | No       | `pymysql.cursors.SSCursor`                                                                                                                                        | set cursorclass used by PyMYSQL                                                                                           |
| database          | string                        | No       | -                                                                                                                                                                 | Database to use, None to not use a particular one. Used by PyMYSQL                                                        |
| server_id         | int                           | False    | Randomly generated int                                                                                                                                            | Used as the slave id when this tap is connecting to the server                                                            |
| filter_dbs        | string                        | False    | -                                                                                                                                                                 | Comma separated list of schemas to extract tables only from particular schemas and to improve data extraction performance |
| use_gtid          | bool                          | False    | False                                                    <br/>                                                                                                         | Flag to enable log based replication using GTID               |
| engine            | string ('mysql' or 'mariadb') | False    | 'mysql'                                                                                                                                                           | Indicate which flavor the server is, used for LOG_BASED with GTID                                                         |
| ssl               | string ("true")               | No       | False                                                                                                                                                             | Enable SSL connection                                                                                                     |
| ssl_ca            | string                        | No       | -                                                                                                                                                                 | for self-signed SSL                                                                                                       |
| ssl_cert          | string                        | No       | -                                                                                                                                                                 | for self-signed SSL                                                                                                       |
| ssl_key           | string                        | No       | -                                                                                                                                                                 | for self-signed SSL                                                                                                       |
| internal_hostname | string | No       | -                                                                                                                                                                 | Override match hostname for google cloud                                                                                  |
| session_sqls      | List of strings               | No       | ```['SET @@session.time_zone="+0:00"', 'SET @@session.wait_timeout=28800', 'SET @@session.net_read_timeout=3600', 'SET @@session.innodb_lock_wait_timeout=3600']``` | Set session variables dynamically.                                                                                        |


### Discovery mode

The tap can be invoked in discovery mode to find the available tables and
columns in the database:

```bash
$ tap-mysql --config config.json --discover

```

A discovered catalog is output, with a JSON-schema description of each table. A
source table directly corresponds to a Singer stream.

```json
{
  "streams": [
    {
      "tap_stream_id": "example_db-animals",
      "table_name": "animals",
      "schema": {
        "type": "object",
        "properties": {
          "name": {
            "inclusion": "available",
            "type": [
              "null",
              "string"
            ],
            "maxLength": 255
          },
          "id": {
            "inclusion": "automatic",
            "minimum": -2147483648,
            "maximum": 2147483647,
            "type": [
              "null",
              "integer"
            ]
          },
          "likes_getting_petted": {
            "inclusion": "available",
            "type": [
              "null",
              "boolean"
            ]
          }
        }
      },
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": 3,
            "table-key-properties": [
              "id"
            ],
            "database-name": "example_db",
            "selected-by-default": false,
            "is-view": false,
          }
        },
        {
          "breadcrumb": [
            "properties",
            "id"
          ],
          "metadata": {
            "sql-datatype": "int(11)",
            "selected-by-default": true
          }
        },
        {
          "breadcrumb": [
            "properties",
            "name"
          ],
          "metadata": {
            "sql-datatype": "varchar(255)",
            "selected-by-default": true
          }
        },
        {
          "breadcrumb": [
            "properties",
            "likes_getting_petted"
          ],
          "metadata": {
            "sql-datatype": "tinyint(1)",
            "selected-by-default": true
          }
        }
      ],
      "stream": "animals"
    }
  ]
}

```

### Field selection

In sync mode, `tap-mysql` consumes the catalog and looks for tables and fields
have been marked as _selected_ in their associated metadata entries.

Redirect output from the tap's discovery mode to a file so that it can be
modified:

```bash
$ tap-mysql -c config.json --discover > properties.json
```

Then edit `properties.json` to make selections. In this example we want the
`animals` table. The stream's metadata entry (associated with `"breadcrumb": []`) 
gets a top-level `selected` flag, as does its columns' metadata entries. Additionally,
we will mark the `animals` table to replicate using a `FULL_TABLE` strategy. For more,
information, see [Replication methods and state file](#replication-methods-and-state-file).

```json
[
  {
    "breadcrumb": [],
    "metadata": {
      "row-count": 3,
      "table-key-properties": [
        "id"
      ],
      "database-name": "example_db",
      "selected-by-default": false,
      "is-view": false,
      "selected": true,
      "replication-method": "FULL_TABLE"
    }
  },
  {
    "breadcrumb": [
      "properties",
      "id"
    ],
    "metadata": {
      "sql-datatype": "int(11)",
      "selected-by-default": true,
      "selected": true
    }
  },
  {
    "breadcrumb": [
      "properties",
      "name"
    ],
    "metadata": {
      "sql-datatype": "varchar(255)",
      "selected-by-default": true,
      "selected": true
    }
  },
  {
    "breadcrumb": [
      "properties",
      "likes_getting_petted"
    ],
    "metadata": {
      "sql-datatype": "tinyint(1)",
      "selected-by-default": true,
      "selected": true
    }
  }
]
```

### Sync mode

With a properties catalog that describes field and table selections, the tap can be invoked in sync mode:

```bash
$ tap-mysql -c config.json --properties properties.json
```

Messages are written to standard output following the Singer specification. The
resultant stream of JSON data can be consumed by a Singer target.

```json
{"value": {"currently_syncing": "example_db-animals"}, "type": "STATE"}

{"key_properties": ["id"], "stream": "animals", "schema": {"properties": {"name": {"inclusion": "available", "maxLength": 255, "type": ["null", "string"]}, "likes_getting_petted": {"inclusion": "available", "type": ["null", "boolean"]}, "id": {"inclusion": "automatic", "minimum": -2147483648, "type": ["null", "integer"], "maximum": 2147483647}}, "type": "object"}, "type": "SCHEMA"}

{"stream": "animals", "version": 1509133344771, "type": "ACTIVATE_VERSION"}

{"record": {"name": "aardvark", "likes_getting_petted": false, "id": 1}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}

{"record": {"name": "bear", "likes_getting_petted": false, "id": 2}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}

{"record": {"name": "cow", "likes_getting_petted": true, "id": 3}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}

{"stream": "animals", "version": 1509133344771, "type": "ACTIVATE_VERSION"}

{"value": {"currently_syncing": "example_db-animals", "bookmarks": {"example_db-animals": {"initial_full_table_complete": true}}}, "type": "STATE"}

{"value": {"currently_syncing": null, "bookmarks": {"example_db-animals": {"initial_full_table_complete": true}}}, "type": "STATE"}
```

## Replication methods and state file

In the above example, we invoked `tap-mysql` without providing a _state_ file and without specifying a replication 
method. The ways to replicate a given table are `FULL_TABLE`, `LOG_BASED` and `INCREMENTAL`.

### LOG_BASED

LOG_BASED replication makes use of the server's binary logs (binlogs), this method can work with primary 
servers, the tap acts as a replica and requests the primary to stream log events,the tap then consumes events 
pertaining to row changes (inserts, updates, deletes), binlog file rotate and gtid events.

Log_based method always requires an initial sync to get a snapshot of the table and current binlog coordinates/gtid 
position.

The tap support two ways of consuming log events: using binlog coordinates or GTID, the default behavior is using 
binlog coordinates, when turning the `use_gtid` flag, you have to specify the engine flavor (mariadb/mysql) due to 
how different are the GTID implementations in these two engines.

When enabling the `use_gtid` flag and the engine is MariaDB, the tap will dynamically infer the GTID pos from 
existing binlog coordinate in the state, if the engine is mysql, it will fail.

#### State when using binlog coordinates
```json
{
  "bookmarks": {
    "example_db-table1": {"log_file": "mysql-binlog.0003", "log_pos": 3244},
    "example_db-table2": {"log_file": "mysql-binlog.0001", "log_pos": 42},
    "example_db-table3": {"log_file": "mysql-binlog.0003", "log_pos": 100}
  }
}
```

#### State when using GTID
```json
{
  "bookmarks": {
    "example_db-table1": {"log_file": "mysql-binlog.0003", "log_pos": 3244, "gtid": "0:364864374:599"},
    "example_db-table2": {"log_file": "mysql-binlog.0001", "log_pos": 42, "gtid": "0:364864374:375"},
    "example_db-table3": {"log_file": "mysql-binlog.0003", "log_pos": 100, "gtid": "0:364864374:399"}
  }
}
```

### Full Table

Full-table replication extracts all data from the source table each time the tap is invoked.

### Incremental

Incremental replication works in conjunction with a state file to only extract
new records each time the tap is invoked. This requires a replication key to be
specified in the table's metadata as well.

#### Example

Let's sync the `animals` table again, but this time using incremental
replication. The replication method and replication key are set in the
table's metadata entry in properties file:

```json
{
  "streams": [
    {
      "tap_stream_id": "example_db-animals",
      "table_name": "animals",
      "schema": { ... },
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": 3,
            "table-key-properties": [
              "id"
            ],
            "database-name": "example_db",
            "selected-by-default": false,
            "is-view": false,
            "replication-method": "INCREMENTAL",
            "replication-key": "id"
          }
        },
        ...
      ],
      "stream": "animals"
    }
  ]
}
```

We have no meaningful state so far, so just invoke the tap in sync mode again
without a state file:

```bash
$ tap-mysql -c config.json --properties properties.json
```

The output messages look very similar to when the table was replicated using the
default `FULL_TABLE` replication method. One important difference is that the
`STATE` messages now contain a `replication_key_value` -- a bookmark or
high-water mark -- for data that was extracted:

```json
{"type": "STATE", "value": {"currently_syncing": "example_db-animals"}}

{"stream": "animals", "type": "SCHEMA", "schema": {"type": "object", "properties": {"id": {"type": ["null", "integer"], "minimum": -2147483648, "maximum": 2147483647, "inclusion": "automatic"}, "name": {"type": ["null", "string"], "inclusion": "available", "maxLength": 255}, "likes_getting_petted": {"type": ["null", "boolean"], "inclusion": "available"}}}, "key_properties": ["id"]}

{"stream": "animals", "type": "ACTIVATE_VERSION", "version": 1509135204169}

{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 1, "name": "aardvark", "likes_getting_petted": false}}

{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 2, "name": "bear", "likes_getting_petted": false}}

{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 3, "name": "cow", "likes_getting_petted": true}}

{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"version": 1509135204169, "replication_key_value": 3, "replication_key": "id"}}, "currently_syncing": "example_db-animals"}}

{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"version": 1509135204169, "replication_key_value": 3, "replication_key": "id"}}, "currently_syncing": null}}
```

Note that the final `STATE` message has a `replication_key_value` of `3`,
reflecting that the extraction ended on a record that had an `id` of `3`.
Subsequent invocations of the tap will pick up from this bookmark.

Normally, the target will echo the last `STATE` after it's finished processing
data. For this example, let's manually write a `state.json` file using the
`STATE` message:

```json
{
  "bookmarks": {
    "example_db-animals": {
      "version": 1509135204169,
      "replication_key_value": 3,
      "replication_key": "id"
    }
  },
  "currently_syncing": null
}
```

Let's add some more animals to our farm:

```
mysql> insert into animals (name, likes_getting_petted) values ('dog', true), ('elephant', true), ('frog', false);
```

```bash
$ tap-mysql -c config.json --properties properties.json --state state.json
```

This invocation extracts any data since (and including) the
`replication_key_value`:

```json
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 3}}, "currently_syncing": "example_db-animals"}}

{"key_properties": ["id"], "schema": {"properties": {"name": {"maxLength": 255, "inclusion": "available", "type": ["null", "string"]}, "id": {"maximum": 2147483647, "minimum": -2147483648, "inclusion": "automatic", "type": ["null", "integer"]}, "likes_getting_petted": {"inclusion": "available", "type": ["null", "boolean"]}}, "type": "object"}, "type": "SCHEMA", "stream": "animals"}

{"type": "ACTIVATE_VERSION", "version": 1509135204169, "stream": "animals"}

{"record": {"name": "cow", "id": 3, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "dog", "id": 4, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "elephant", "id": 5, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "frog", "id": 6, "likes_getting_petted": false}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}

{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 6}}, "currently_syncing": "example_db-animals"}}

{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 6}}, "currently_syncing": null}}
```

## To run tests:

1. You'll need to have a running MySQL or MariaDB server to run the tests. Run the following SQL commands as a privileged user to create the required objects:
```
CREATE USER <mysql-user> IDENTIFIED BY '<mysql-password>';
CREATE DATABASE tap_mysql_test;
GRANT ALL PRIVILEGES ON tap_mysql_test.* TO <mysql-user>;
```

**Note**: The user and password can be anything but the database name needs to be `tap_mysql_test`.

2. Define the environment variables that are required to run the tests:
```
  export TAP_MYSQL_HOST=<mysql-host>
  export TAP_MYSQL_PORT=<mysql-port>
  export TAP_MYSQL_USER=<mysql-user>
  export TAP_MYSQL_PASSWORD=<mysql-password>
  export TAP_MYSQL_ENGINE=<engine>
```

3. Install python test dependencies in a virtual env

```bash
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install .[test]
```

4. To run tests:
```bash
nosetests -c .noserc tests
```

### To run pylint:

1. Install python dependencies and run python linter
```
  python3 -m venv venv
  . venv/bin/activate
  pip install --upgrade pip
  pip install .[test]
  pylint --rcfile .pylintrc tap_mysql
```

---

Based on Stitch documentation

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/transferwise/pipelinewise-tap-mysql",
    "name": "pipelinewise-tap-mysql",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Wise",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/09/01/252c53770286655a4a55f7b80119589386e7c457e5a51e73fdbf8c675651/pipelinewise_tap_mysql-1.6.0.tar.gz",
    "platform": null,
    "description": "# pipelinewise-tap-mysql\n\n[![PyPI version](https://badge.fury.io/py/pipelinewise-tap-mysql.svg)](https://badge.fury.io/py/pipelinewise-tap-mysql)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pipelinewise-tap-mysql.svg)](https://pypi.org/project/pipelinewise-tap-mysql/)\n[![License: MIT](https://img.shields.io/badge/License-GPLv3-yellow.svg)](https://opensource.org/licenses/GPL-3.0)\n\n[Singer](https://www.singer.io/) tap that extracts data from a [MySQL](https://www.mysql.com/) database and produces JSON-formatted data following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md).\n\nThis is a [PipelineWise](https://transferwise.github.io/pipelinewise) compatible tap connector.\n\n## How to use it\n\nThe recommended method of running this tap is to use it from [PipelineWise](https://transferwise.github.io/pipelinewise). When running it from PipelineWise you don't need to configure this tap with JSON files and most of things are automated. Please check the related documentation at [Tap MySQL](https://transferwise.github.io/pipelinewise/connectors/taps/mysql.html)\n\nIf you want to run this [Singer Tap](https://singer.io) independently please read further.\n\n## Usage\n\nThis section dives into basic usage of `tap-mysql` by walking through extracting\ndata from a table. It assumes that you can connect to and read from a MySQL\ndatabase.\n\n### Install\n\nFirst, make sure Python 3 is installed on your system or follow these\ninstallation instructions for [Mac](http://docs.python-guide.org/en/latest/starting/install3/osx/) or\n[Ubuntu](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).\n\nIt's recommended to use a virtualenv:\n\n```bash\n  python3 -m venv venv\n  pip install pipelinewise-tap-mysql\n```\n\nor\n\n```bash\n  python3 -m venv venv\n  . venv/bin/activate\n  pip install --upgrade pip\n  pip install .\n```\n\n### Have a source database\n\nThere's some important business data siloed in this MySQL database -- we need to\nextract it. Here's the table we'd like to sync:\n\n```\nmysql> select * from example_db.animals;\n+----|----------|----------------------+\n| id | name     | likes_getting_petted |\n+----|----------|----------------------+\n|  1 | aardvark |                    0 |\n|  2 | bear     |                    0 |\n|  3 | cow      |                    1 |\n+----|----------|----------------------+\n3 rows in set (0.00 sec)\n```\n\n### Create the configuration file\n\nCreate a config file containing the database connection credentials, see [sample](config.json.sample).\n\nList of config parameters:\n\n| Parameter         | type                          | required | default                                                                                                                                                           | description                                                                                                               |\n|-------------------|-------------------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|\n| host              | string                        | yes      | -                                                                                                                                                                 | mysql/mariadb host                                                                                                        |\n| port              | int                           | yes      | -                                                                                                                                                                 | mysql/mariadb port                                                                                                        |\n| user              | string                        | yes      | -                                                                                                                                                                 | db username                                                                                                               |\n| password          | string                        | yes      | -                                                                                                                                                                 | db password                                                                                                               |\n| cursorclass       | string                        | No       | `pymysql.cursors.SSCursor`                                                                                                                                        | set cursorclass used by PyMYSQL                                                                                           |\n| database          | string                        | No       | -                                                                                                                                                                 | Database to use, None to not use a particular one. Used by PyMYSQL                                                        |\n| server_id         | int                           | False    | Randomly generated int                                                                                                                                            | Used as the slave id when this tap is connecting to the server                                                            |\n| filter_dbs        | string                        | False    | -                                                                                                                                                                 | Comma separated list of schemas to extract tables only from particular schemas and to improve data extraction performance |\n| use_gtid          | bool                          | False    | False                                                    <br/>                                                                                                         | Flag to enable log based replication using GTID               |\n| engine            | string ('mysql' or 'mariadb') | False    | 'mysql'                                                                                                                                                           | Indicate which flavor the server is, used for LOG_BASED with GTID                                                         |\n| ssl               | string (\"true\")               | No       | False                                                                                                                                                             | Enable SSL connection                                                                                                     |\n| ssl_ca            | string                        | No       | -                                                                                                                                                                 | for self-signed SSL                                                                                                       |\n| ssl_cert          | string                        | No       | -                                                                                                                                                                 | for self-signed SSL                                                                                                       |\n| ssl_key           | string                        | No       | -                                                                                                                                                                 | for self-signed SSL                                                                                                       |\n| internal_hostname | string | No       | -                                                                                                                                                                 | Override match hostname for google cloud                                                                                  |\n| session_sqls      | List of strings               | No       | ```['SET @@session.time_zone=\"+0:00\"', 'SET @@session.wait_timeout=28800', 'SET @@session.net_read_timeout=3600', 'SET @@session.innodb_lock_wait_timeout=3600']``` | Set session variables dynamically.                                                                                        |\n\n\n### Discovery mode\n\nThe tap can be invoked in discovery mode to find the available tables and\ncolumns in the database:\n\n```bash\n$ tap-mysql --config config.json --discover\n\n```\n\nA discovered catalog is output, with a JSON-schema description of each table. A\nsource table directly corresponds to a Singer stream.\n\n```json\n{\n  \"streams\": [\n    {\n      \"tap_stream_id\": \"example_db-animals\",\n      \"table_name\": \"animals\",\n      \"schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n          \"name\": {\n            \"inclusion\": \"available\",\n            \"type\": [\n              \"null\",\n              \"string\"\n            ],\n            \"maxLength\": 255\n          },\n          \"id\": {\n            \"inclusion\": \"automatic\",\n            \"minimum\": -2147483648,\n            \"maximum\": 2147483647,\n            \"type\": [\n              \"null\",\n              \"integer\"\n            ]\n          },\n          \"likes_getting_petted\": {\n            \"inclusion\": \"available\",\n            \"type\": [\n              \"null\",\n              \"boolean\"\n            ]\n          }\n        }\n      },\n      \"metadata\": [\n        {\n          \"breadcrumb\": [],\n          \"metadata\": {\n            \"row-count\": 3,\n            \"table-key-properties\": [\n              \"id\"\n            ],\n            \"database-name\": \"example_db\",\n            \"selected-by-default\": false,\n            \"is-view\": false,\n          }\n        },\n        {\n          \"breadcrumb\": [\n            \"properties\",\n            \"id\"\n          ],\n          \"metadata\": {\n            \"sql-datatype\": \"int(11)\",\n            \"selected-by-default\": true\n          }\n        },\n        {\n          \"breadcrumb\": [\n            \"properties\",\n            \"name\"\n          ],\n          \"metadata\": {\n            \"sql-datatype\": \"varchar(255)\",\n            \"selected-by-default\": true\n          }\n        },\n        {\n          \"breadcrumb\": [\n            \"properties\",\n            \"likes_getting_petted\"\n          ],\n          \"metadata\": {\n            \"sql-datatype\": \"tinyint(1)\",\n            \"selected-by-default\": true\n          }\n        }\n      ],\n      \"stream\": \"animals\"\n    }\n  ]\n}\n\n```\n\n### Field selection\n\nIn sync mode, `tap-mysql` consumes the catalog and looks for tables and fields\nhave been marked as _selected_ in their associated metadata entries.\n\nRedirect output from the tap's discovery mode to a file so that it can be\nmodified:\n\n```bash\n$ tap-mysql -c config.json --discover > properties.json\n```\n\nThen edit `properties.json` to make selections. In this example we want the\n`animals` table. The stream's metadata entry (associated with `\"breadcrumb\": []`) \ngets a top-level `selected` flag, as does its columns' metadata entries. Additionally,\nwe will mark the `animals` table to replicate using a `FULL_TABLE` strategy. For more,\ninformation, see [Replication methods and state file](#replication-methods-and-state-file).\n\n```json\n[\n  {\n    \"breadcrumb\": [],\n    \"metadata\": {\n      \"row-count\": 3,\n      \"table-key-properties\": [\n        \"id\"\n      ],\n      \"database-name\": \"example_db\",\n      \"selected-by-default\": false,\n      \"is-view\": false,\n      \"selected\": true,\n      \"replication-method\": \"FULL_TABLE\"\n    }\n  },\n  {\n    \"breadcrumb\": [\n      \"properties\",\n      \"id\"\n    ],\n    \"metadata\": {\n      \"sql-datatype\": \"int(11)\",\n      \"selected-by-default\": true,\n      \"selected\": true\n    }\n  },\n  {\n    \"breadcrumb\": [\n      \"properties\",\n      \"name\"\n    ],\n    \"metadata\": {\n      \"sql-datatype\": \"varchar(255)\",\n      \"selected-by-default\": true,\n      \"selected\": true\n    }\n  },\n  {\n    \"breadcrumb\": [\n      \"properties\",\n      \"likes_getting_petted\"\n    ],\n    \"metadata\": {\n      \"sql-datatype\": \"tinyint(1)\",\n      \"selected-by-default\": true,\n      \"selected\": true\n    }\n  }\n]\n```\n\n### Sync mode\n\nWith a properties catalog that describes field and table selections, the tap can be invoked in sync mode:\n\n```bash\n$ tap-mysql -c config.json --properties properties.json\n```\n\nMessages are written to standard output following the Singer specification. The\nresultant stream of JSON data can be consumed by a Singer target.\n\n```json\n{\"value\": {\"currently_syncing\": \"example_db-animals\"}, \"type\": \"STATE\"}\n\n{\"key_properties\": [\"id\"], \"stream\": \"animals\", \"schema\": {\"properties\": {\"name\": {\"inclusion\": \"available\", \"maxLength\": 255, \"type\": [\"null\", \"string\"]}, \"likes_getting_petted\": {\"inclusion\": \"available\", \"type\": [\"null\", \"boolean\"]}, \"id\": {\"inclusion\": \"automatic\", \"minimum\": -2147483648, \"type\": [\"null\", \"integer\"], \"maximum\": 2147483647}}, \"type\": \"object\"}, \"type\": \"SCHEMA\"}\n\n{\"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"ACTIVATE_VERSION\"}\n\n{\"record\": {\"name\": \"aardvark\", \"likes_getting_petted\": false, \"id\": 1}, \"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"RECORD\"}\n\n{\"record\": {\"name\": \"bear\", \"likes_getting_petted\": false, \"id\": 2}, \"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"RECORD\"}\n\n{\"record\": {\"name\": \"cow\", \"likes_getting_petted\": true, \"id\": 3}, \"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"RECORD\"}\n\n{\"stream\": \"animals\", \"version\": 1509133344771, \"type\": \"ACTIVATE_VERSION\"}\n\n{\"value\": {\"currently_syncing\": \"example_db-animals\", \"bookmarks\": {\"example_db-animals\": {\"initial_full_table_complete\": true}}}, \"type\": \"STATE\"}\n\n{\"value\": {\"currently_syncing\": null, \"bookmarks\": {\"example_db-animals\": {\"initial_full_table_complete\": true}}}, \"type\": \"STATE\"}\n```\n\n## Replication methods and state file\n\nIn the above example, we invoked `tap-mysql` without providing a _state_ file and without specifying a replication \nmethod. The ways to replicate a given table are `FULL_TABLE`, `LOG_BASED` and `INCREMENTAL`.\n\n### LOG_BASED\n\nLOG_BASED replication makes use of the server's binary logs (binlogs), this method can work with primary \nservers, the tap acts as a replica and requests the primary to stream log events,the tap then consumes events \npertaining to row changes (inserts, updates, deletes), binlog file rotate and gtid events.\n\nLog_based method always requires an initial sync to get a snapshot of the table and current binlog coordinates/gtid \nposition.\n\nThe tap support two ways of consuming log events: using binlog coordinates or GTID, the default behavior is using \nbinlog coordinates, when turning the `use_gtid` flag, you have to specify the engine flavor (mariadb/mysql) due to \nhow different are the GTID implementations in these two engines.\n\nWhen enabling the `use_gtid` flag and the engine is MariaDB, the tap will dynamically infer the GTID pos from \nexisting binlog coordinate in the state, if the engine is mysql, it will fail.\n\n#### State when using binlog coordinates\n```json\n{\n  \"bookmarks\": {\n    \"example_db-table1\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 3244},\n    \"example_db-table2\": {\"log_file\": \"mysql-binlog.0001\", \"log_pos\": 42},\n    \"example_db-table3\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 100}\n  }\n}\n```\n\n#### State when using GTID\n```json\n{\n  \"bookmarks\": {\n    \"example_db-table1\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 3244, \"gtid\": \"0:364864374:599\"},\n    \"example_db-table2\": {\"log_file\": \"mysql-binlog.0001\", \"log_pos\": 42, \"gtid\": \"0:364864374:375\"},\n    \"example_db-table3\": {\"log_file\": \"mysql-binlog.0003\", \"log_pos\": 100, \"gtid\": \"0:364864374:399\"}\n  }\n}\n```\n\n### Full Table\n\nFull-table replication extracts all data from the source table each time the tap is invoked.\n\n### Incremental\n\nIncremental replication works in conjunction with a state file to only extract\nnew records each time the tap is invoked. This requires a replication key to be\nspecified in the table's metadata as well.\n\n#### Example\n\nLet's sync the `animals` table again, but this time using incremental\nreplication. The replication method and replication key are set in the\ntable's metadata entry in properties file:\n\n```json\n{\n  \"streams\": [\n    {\n      \"tap_stream_id\": \"example_db-animals\",\n      \"table_name\": \"animals\",\n      \"schema\": { ... },\n      \"metadata\": [\n        {\n          \"breadcrumb\": [],\n          \"metadata\": {\n            \"row-count\": 3,\n            \"table-key-properties\": [\n              \"id\"\n            ],\n            \"database-name\": \"example_db\",\n            \"selected-by-default\": false,\n            \"is-view\": false,\n            \"replication-method\": \"INCREMENTAL\",\n            \"replication-key\": \"id\"\n          }\n        },\n        ...\n      ],\n      \"stream\": \"animals\"\n    }\n  ]\n}\n```\n\nWe have no meaningful state so far, so just invoke the tap in sync mode again\nwithout a state file:\n\n```bash\n$ tap-mysql -c config.json --properties properties.json\n```\n\nThe output messages look very similar to when the table was replicated using the\ndefault `FULL_TABLE` replication method. One important difference is that the\n`STATE` messages now contain a `replication_key_value` -- a bookmark or\nhigh-water mark -- for data that was extracted:\n\n```json\n{\"type\": \"STATE\", \"value\": {\"currently_syncing\": \"example_db-animals\"}}\n\n{\"stream\": \"animals\", \"type\": \"SCHEMA\", \"schema\": {\"type\": \"object\", \"properties\": {\"id\": {\"type\": [\"null\", \"integer\"], \"minimum\": -2147483648, \"maximum\": 2147483647, \"inclusion\": \"automatic\"}, \"name\": {\"type\": [\"null\", \"string\"], \"inclusion\": \"available\", \"maxLength\": 255}, \"likes_getting_petted\": {\"type\": [\"null\", \"boolean\"], \"inclusion\": \"available\"}}}, \"key_properties\": [\"id\"]}\n\n{\"stream\": \"animals\", \"type\": \"ACTIVATE_VERSION\", \"version\": 1509135204169}\n\n{\"stream\": \"animals\", \"type\": \"RECORD\", \"version\": 1509135204169, \"record\": {\"id\": 1, \"name\": \"aardvark\", \"likes_getting_petted\": false}}\n\n{\"stream\": \"animals\", \"type\": \"RECORD\", \"version\": 1509135204169, \"record\": {\"id\": 2, \"name\": \"bear\", \"likes_getting_petted\": false}}\n\n{\"stream\": \"animals\", \"type\": \"RECORD\", \"version\": 1509135204169, \"record\": {\"id\": 3, \"name\": \"cow\", \"likes_getting_petted\": true}}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"version\": 1509135204169, \"replication_key_value\": 3, \"replication_key\": \"id\"}}, \"currently_syncing\": \"example_db-animals\"}}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"version\": 1509135204169, \"replication_key_value\": 3, \"replication_key\": \"id\"}}, \"currently_syncing\": null}}\n```\n\nNote that the final `STATE` message has a `replication_key_value` of `3`,\nreflecting that the extraction ended on a record that had an `id` of `3`.\nSubsequent invocations of the tap will pick up from this bookmark.\n\nNormally, the target will echo the last `STATE` after it's finished processing\ndata. For this example, let's manually write a `state.json` file using the\n`STATE` message:\n\n```json\n{\n  \"bookmarks\": {\n    \"example_db-animals\": {\n      \"version\": 1509135204169,\n      \"replication_key_value\": 3,\n      \"replication_key\": \"id\"\n    }\n  },\n  \"currently_syncing\": null\n}\n```\n\nLet's add some more animals to our farm:\n\n```\nmysql> insert into animals (name, likes_getting_petted) values ('dog', true), ('elephant', true), ('frog', false);\n```\n\n```bash\n$ tap-mysql -c config.json --properties properties.json --state state.json\n```\n\nThis invocation extracts any data since (and including) the\n`replication_key_value`:\n\n```json\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"replication_key\": \"id\", \"version\": 1509135204169, \"replication_key_value\": 3}}, \"currently_syncing\": \"example_db-animals\"}}\n\n{\"key_properties\": [\"id\"], \"schema\": {\"properties\": {\"name\": {\"maxLength\": 255, \"inclusion\": \"available\", \"type\": [\"null\", \"string\"]}, \"id\": {\"maximum\": 2147483647, \"minimum\": -2147483648, \"inclusion\": \"automatic\", \"type\": [\"null\", \"integer\"]}, \"likes_getting_petted\": {\"inclusion\": \"available\", \"type\": [\"null\", \"boolean\"]}}, \"type\": \"object\"}, \"type\": \"SCHEMA\", \"stream\": \"animals\"}\n\n{\"type\": \"ACTIVATE_VERSION\", \"version\": 1509135204169, \"stream\": \"animals\"}\n\n{\"record\": {\"name\": \"cow\", \"id\": 3, \"likes_getting_petted\": true}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n{\"record\": {\"name\": \"dog\", \"id\": 4, \"likes_getting_petted\": true}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n{\"record\": {\"name\": \"elephant\", \"id\": 5, \"likes_getting_petted\": true}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n{\"record\": {\"name\": \"frog\", \"id\": 6, \"likes_getting_petted\": false}, \"type\": \"RECORD\", \"version\": 1509135204169, \"stream\": \"animals\"}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"replication_key\": \"id\", \"version\": 1509135204169, \"replication_key_value\": 6}}, \"currently_syncing\": \"example_db-animals\"}}\n\n{\"type\": \"STATE\", \"value\": {\"bookmarks\": {\"example_db-animals\": {\"replication_key\": \"id\", \"version\": 1509135204169, \"replication_key_value\": 6}}, \"currently_syncing\": null}}\n```\n\n## To run tests:\n\n1. You'll need to have a running MySQL or MariaDB server to run the tests. Run the following SQL commands as a privileged user to create the required objects:\n```\nCREATE USER <mysql-user> IDENTIFIED BY '<mysql-password>';\nCREATE DATABASE tap_mysql_test;\nGRANT ALL PRIVILEGES ON tap_mysql_test.* TO <mysql-user>;\n```\n\n**Note**: The user and password can be anything but the database name needs to be `tap_mysql_test`.\n\n2. Define the environment variables that are required to run the tests:\n```\n  export TAP_MYSQL_HOST=<mysql-host>\n  export TAP_MYSQL_PORT=<mysql-port>\n  export TAP_MYSQL_USER=<mysql-user>\n  export TAP_MYSQL_PASSWORD=<mysql-password>\n  export TAP_MYSQL_ENGINE=<engine>\n```\n\n3. Install python test dependencies in a virtual env\n\n```bash\npython3 -m venv venv\n. venv/bin/activate\npip install --upgrade pip\npip install .[test]\n```\n\n4. To run tests:\n```bash\nnosetests -c .noserc tests\n```\n\n### To run pylint:\n\n1. Install python dependencies and run python linter\n```\n  python3 -m venv venv\n  . venv/bin/activate\n  pip install --upgrade pip\n  pip install .[test]\n  pylint --rcfile .pylintrc tap_mysql\n```\n\n---\n\nBased on Stitch documentation\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Singer.io tap for extracting data from MySQL & MariaDB - PipelineWise compatible",
    "version": "1.6.0",
    "project_urls": {
        "Homepage": "https://github.com/transferwise/pipelinewise-tap-mysql"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "26cb1acf7b2ea84348cb1eef82ac5e124f8a23b85391760daddccb12343250e3",
                "md5": "45790feb13ff2d50ab28e67f5c35fc4a",
                "sha256": "287f46a2aaa782689797268297de6688196a24dee1de1e01e24b611ccaadc0ec"
            },
            "downloads": -1,
            "filename": "pipelinewise_tap_mysql-1.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "45790feb13ff2d50ab28e67f5c35fc4a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 42627,
            "upload_time": "2024-05-09T14:28:48",
            "upload_time_iso_8601": "2024-05-09T14:28:48.894004Z",
            "url": "https://files.pythonhosted.org/packages/26/cb/1acf7b2ea84348cb1eef82ac5e124f8a23b85391760daddccb12343250e3/pipelinewise_tap_mysql-1.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0901252c53770286655a4a55f7b80119589386e7c457e5a51e73fdbf8c675651",
                "md5": "51ffb061e6869781b2ff592164edc2d8",
                "sha256": "26e22c635b6d839fba3a8d0e2040365bba1c68508046c5ab49c3ee26d13f7eb5"
            },
            "downloads": -1,
            "filename": "pipelinewise_tap_mysql-1.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "51ffb061e6869781b2ff592164edc2d8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 38799,
            "upload_time": "2024-05-09T14:29:15",
            "upload_time_iso_8601": "2024-05-09T14:29:15.286700Z",
            "url": "https://files.pythonhosted.org/packages/09/01/252c53770286655a4a55f7b80119589386e7c457e5a51e73fdbf8c675651/pipelinewise_tap_mysql-1.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-09 14:29:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "transferwise",
    "github_project": "pipelinewise-tap-mysql",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pipelinewise-tap-mysql"
}
        
Elapsed time: 0.23967s