mysql-ch-replicator

Name: mysql-ch-replicator
Version: 0.1.4
Summary: Tool for replication of MySQL databases to ClickHouse
Author: Filipp Ozinov
Upload time: 2025-11-07 21:38:00
Requires Python: <4.0,>=3.9
License: MIT
# mysql_ch_replicator
![tests](https://github.com/bakwc/mysql_ch_replicator/actions/workflows/tests.yaml/badge.svg)
[![Release][release-image]][releases]
[![License][license-image]][license]

[release-image]: https://img.shields.io/github/v/release/bakwc/mysql_ch_replicator?style=flat
[releases]: https://github.com/bakwc/mysql_ch_replicator/releases

[license-image]: https://img.shields.io/badge/license-MIT-blue.svg?style=flat
[license]: LICENSE

![img](https://raw.githubusercontent.com/bakwc/mysql_ch_replicator/master/mysql_ch.jpg)

`mysql_ch_replicator` is a powerful and efficient tool designed for real-time replication of MySQL databases to ClickHouse.

With a focus on high performance, it utilizes batching heavily and uses a C++ extension for faster execution. The tool ensures seamless data integration, with support for migrations, schema changes, and correct data management.

## Table of Contents
- [Features](#features)
- [Installation](#installation)
  - [Requirements](#requirements)
  - [Installation](#installation-1)
  - [Docker Installation](#docker-installation)
- [Usage](#usage)
  - [Basic Usage](#basic-usage)
  - [One Time Data Copy](#one-time-data-copy)
  - [Configuration](#configuration)
    - [Required settings](#required-settings)
    - [Optional settings](#optional-settings)
  - [Advanced Features](#advanced-features)
    - [Migrations & Schema Changes](#migrations--schema-changes)
    - [Recovery Without Downtime](#recovery-without-downtime)
    - [Known Limitations](#known-limitations)
- [Production usage](#production-usage)
- [Development](#development)
  - [Running Tests](#running-tests)
- [Contribution](#contribution)
- [License](#license)
- [Acknowledgements](#acknowledgements)

## Features

- **Real-Time Replication**: Keeps your ClickHouse database in sync with MySQL in real-time.
- **High Performance**: Utilizes batching and ports slow parts to C++ (e.g., MySQL internal JSON parsing) for optimal performance (~20K events/second on a single core).
- **Supports Migrations/Schema Changes**: Handles adding, altering, and removing tables without breaking the replication process (*for most cases, [details here](https://github.com/bakwc/mysql_ch_replicator#migrations--schema-changes)).
- **Recovery without Downtime**: Allows for preserving old data while performing initial replication, ensuring continuous operation.
- **Correct Data Removal**: Unlike MaterializedMySQL, `mysql_ch_replicator` ensures physical removal of data.
- **Comprehensive Data Type Support**: Accurately replicates most data types, including JSON, booleans, and more. Easily extensible for additional data types.
- **Multi-Database Handling**: Replicates the binary log once for all databases, optimizing the process compared to `MaterializedMySQL`, which replicates the log separately for each database.

## Installation
### Requirements
 - Linux / macOS
 - Python 3.10 or higher

### Installation

To install `mysql_ch_replicator`, use the following command:

```bash
pip install --upgrade mysql_ch_replicator
```

You may also need to compile the C++ components if pre-built binaries are not available for your platform.

### Docker Installation

Alternatively, you can use the pre-built Docker image from DockerHub:

```bash
docker pull fippo/mysql-ch-replicator:latest
```

To run the container:

```bash
docker run -d \
  -v /path/to/your/config.yaml:/app/config.yaml \
  -v /path/to/your/data:/app/data \
  fippo/mysql-ch-replicator:latest \
  --config /app/config.yaml run_all
```

Or with environment variables for credentials:

```bash
docker run -d \
  -v /path/to/your/config.yaml:/app/config.yaml \
  -v /path/to/your/data:/app/data \
  -e MYSQL_USER=root \
  -e MYSQL_PASSWORD=secret \
  -e CLICKHOUSE_USER=default \
  -e CLICKHOUSE_PASSWORD=secret \
  fippo/mysql-ch-replicator:latest \
  --config /app/config.yaml run_all
```

Make sure to:
1. Mount your configuration file using the `-v` flag
2. Mount a persistent volume for the data directory
3. Adjust the paths according to your setup
4. Optionally use `-e` flags to override credentials via environment variables

## Usage

### Basic Usage

Here is a [video tutorial](https://www.youtube.com/watch?v=pfbw8VzdIz4).

For realtime data sync from MySQL to ClickHouse:

1. Prepare config file. Use `example_config.yaml` as an example.
2. Configure MySQL and ClickHouse servers:
 - The MySQL server configuration file `my.cnf` should include the following settings (required to write the binary log in row format with GTID mode enabled):
<details>
  <summary>🛠 MySQL Config</summary>

```ini
[mysqld]
# ... other settings ...
gtid_mode = on
enforce_gtid_consistency = 1
binlog_expire_logs_seconds = 864000
max_binlog_size            = 500M
binlog_format              = ROW
```
 - For MariaDB, use the following settings:
```ini
[mysqld]
# ... other settings ...
gtid_strict_mode = ON
gtid_domain_id = 0
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_expire_logs_seconds = 864000
max_binlog_size = 500M
binlog_format = ROW
```

For `AWS RDS`, set the following parameters in `Parameter groups`:

```
binlog_format                       ROW
binlog_expire_logs_seconds          86400
```

</details>

 - The ClickHouse server config `override.xml` should include the following settings (they make ClickHouse apply the `FINAL` keyword automatically, which is needed to handle updates correctly):

 <details>
   <summary>🛠 ClickHouse Config</summary>
 
```xml
<clickhouse>
    <!-- ... other settings ... -->
    <profiles>
        <default>
            <!-- ... other settings ... -->
            <final>1</final>
            <max_query_size>300000000</max_query_size>
            <max_ast_elements>1000000</max_ast_elements>
            <max_expanded_ast_elements>1000000</max_expanded_ast_elements>
        </default>
    </profiles>
</clickhouse>
```

**!!! Double check final setting is applied !!!**

Execute the following command in ClickHouse:

`SELECT name, value, changed FROM system.settings WHERE name = 'final'`

The setting should be `1`. If it is not:
 * double-check that `override.xml` is applied
 * try modifying `users.xml` instead
</details>

3. Start the replication:

```bash
mysql_ch_replicator --config config.yaml run_all
```

This keeps ClickHouse in sync with MySQL: as you update data in MySQL, the changes are replicated to ClickHouse.

### One Time Data Copy

If you just need to copy data once and don't need continuous synchronization, do the following:

1. Prepare config file. Use `example_config.yaml` as an example.
2. Run one-time data copy:

```bash
mysql_ch_replicator --config config.yaml db_replicator --db mysql_db_name --initial_only=True
```
Where `mysql_db_name` is the name of the database you want to copy.

Don't be afraid to interrupt the process in the middle: it saves its state and resumes the copy after a restart.

__Hint__: _set `initial_replication_threads` to the number of CPU cores to accelerate initial replication._

### Configuration

`mysql_ch_replicator` is configured through a configuration file; some settings can also be overridden with environment variables. Here is an example config:

```yaml
mysql:
  host: 'localhost'
  port: 8306
  user: 'root'
  password: 'root'
  charset: 'utf8mb4'  # optional, default is utf8mb4 for full Unicode support

clickhouse:
  host: 'localhost'
  port: 8323
  user: 'default'
  password: 'default'
  connection_timeout: 30        # optional
  send_receive_timeout: 300     # optional

binlog_replicator:
  data_dir: '/home/user/binlog/'  # a new EMPTY directory (for internal storage of data by mysql_ch_replicator itself)
  records_per_file: 100000
  binlog_retention_period: 43200  # optional, how long to keep binlog files in seconds, default 12 hours

databases: 'database_name_pattern_*'
tables: '*'


# OPTIONAL SETTINGS

initial_replication_threads: 4                        # optional

exclude_databases: ['database_10', 'database_*_42']   # optional
exclude_tables: ['meta_table_*']                      # optional

target_databases:                                     # optional
  source_db_in_mysql_1: destination_db_in_clickhouse_1
  source_db_in_mysql_2: destination_db_in_clickhouse_2
  ...

target_tables:                                        # optional
  source_db_in_mysql_1.source_table_in_mysql_1: destination_table_name_1
  source_db_in_mysql_1.source_table_in_mysql_2: destination_table_name_2
  ...

log_level: 'info'               # optional       
optimize_interval: 86400        # optional
auto_restart_interval: 3600     # optional

indexes:                        # optional
  - databases: '*'
    tables: ['test_table']
    index: 'INDEX name_idx name TYPE ngrambf_v1(5, 65536, 4, 0) GRANULARITY 1'

partition_bys:                  # optional
  - databases: '*'
    tables: ['test_table']
    partition_by: 'toYYYYMM(created_at)'

http_host: '0.0.0.0'    # optional
http_port: 9128         # optional

types_mapping:          # optional
  'char(36)': 'UUID'

ignore_deletes: false    # optional, set to true to ignore DELETE operations

mysql_timezone: 'UTC'    # optional, timezone for MySQL timestamp conversion (default: 'UTC')

```

#### Required settings

- `mysql` MySQL connection settings
- `clickhouse` ClickHouse connection settings
- `binlog_replicator.data_dir` Create a new empty directory; it will be used by the replicator to store its state
- `databases` Database name pattern to replicate, e.g. `db_*` will match `db_1`, `db_2`, `db_test`; a list is also supported

#### Optional settings
- `initial_replication_threads` - number of threads for initial replication, by default 1, set it to number of cores to accelerate initial data copy
- `tables` - tables to filter, list is also supported
- `exclude_databases` - databases to __exclude__, string or list, e.g. `'db1*'` or `['db2', 'db3*']`. If a database matches both `databases` and `exclude_databases`, exclude takes priority.
- `exclude_tables` - tables to __exclude__, string or list. If a table matches both `tables` and `exclude_tables`, exclude takes priority.
- `target_databases` - if you want database in ClickHouse to have different name from MySQL database
- `target_tables` - if you want table in ClickHouse to have different name from MySQL table. Specify as `source_database.source_table: target_table_name`. The target database is determined by existing rules (e.g., `target_databases` mapping). This mapping applies to both initial and realtime replication, including DDL operations like ALTER, DROP, etc.
- `log_level` - log level, default is `info`, you can set to `debug` to get maximum information (allowed values are `debug`, `info`, `warning`, `error`, `critical`)
- `optimize_interval` - interval (seconds) between automatic `OPTIMIZE table FINAL` calls. Default 86400 (1 day). This guarantees that all merges are performed, preventing storage growth and performance degradation.
- `auto_restart_interval` - interval (seconds) between automatic db_replicator restart. Default 3600 (1 hour). This is done to reduce memory usage.
- `binlog_retention_period` - how long to keep binlog files in seconds. Default 43200 (12 hours). This setting controls how long the local binlog files are retained before being automatically cleaned up.
- `indexes` - indexes to add to accelerate queries, e.g. an ngram index for full-text search. To apply indexes you need to start replication from scratch.
- `partition_bys` - custom PARTITION BY expressions for tables. By default uses `intDiv(id, 4294967)` for integer primary keys. Useful for time-based partitioning like `toYYYYMM(created_at)`.
- `http_host`, `http_port` - HTTP endpoint to control replication; see `/docs` for available commands
- `types_mapping` - custom type mapping, e.g. you can map `char(36)` to `UUID` instead of `String`, etc.
- `ignore_deletes` - when set to `true`, DELETE operations in MySQL will be ignored during replication. This creates an append-only model where data is only added, never removed. In this mode, the replicator doesn't create a temporary database and instead replicates directly to the target database.
- `mysql_timezone` - timezone to use for MySQL timestamp conversion to ClickHouse DateTime64. Default is `'UTC'`. Accepts any valid timezone name (e.g., `'America/New_York'`, `'Europe/London'`, `'Asia/Tokyo'`). This setting ensures proper timezone handling when converting MySQL timestamp fields to ClickHouse DateTime64 with timezone information.
- `post_initial_replication_commands` - SQL commands to execute in ClickHouse after initial replication completes for each database. Useful for creating materialized views, summary tables, or other database objects. Commands are executed in order, once per database matching the pattern.

```yaml
post_initial_replication_commands:
  - databases: '*'
    commands:
      - 'CREATE TABLE IF NOT EXISTS summary_table (...) ENGINE = SummingMergeTree() ORDER BY (...)'
      - 'CREATE MATERIALIZED VIEW IF NOT EXISTS data_mv TO summary_table AS SELECT ...'
      - 'INSERT INTO summary_table SELECT ... FROM replicated_table'
```
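The effect of the `mysql_timezone` setting described above can be illustrated with a small sketch: a naive MySQL timestamp is interpreted in the configured timezone and normalized to UTC, which is what a timezone-aware ClickHouse `DateTime64` value requires. This is an illustration of the semantics, not the replicator's actual conversion code.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def naive_mysql_timestamp_to_utc(ts: str, mysql_timezone: str = "UTC") -> datetime:
    """Interpret a naive MySQL timestamp string in the configured timezone
    and normalize it to UTC."""
    naive = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    localized = naive.replace(tzinfo=ZoneInfo(mysql_timezone))
    return localized.astimezone(ZoneInfo("UTC"))

# A timestamp written by a server configured with mysql_timezone: 'America/New_York'
utc_value = naive_mysql_timestamp_to_utc("2024-01-15 09:30:00", "America/New_York")
print(utc_value.isoformat())  # 2024-01-15T14:30:00+00:00 (EST is UTC-5 in January)
```

With the default `mysql_timezone: 'UTC'`, the naive value is taken as already being in UTC.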

Few more tables / dbs examples:

```yaml
databases: ['my_database_1', 'my_database_2']
tables: ['table_1', 'table_2*']
```
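As an illustration of how these glob patterns compose, the following sketch uses Python's `fnmatch` to model include/exclude matching with exclude taking priority. This mirrors the behavior described under the settings above, but is not necessarily the replicator's exact implementation.

```python
from fnmatch import fnmatch

def matches(name, patterns):
    """True if name matches any glob pattern (accepts a string or a list)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatch(name, p) for p in patterns)

def should_replicate(db, databases, exclude_databases=()):
    """Exclude patterns take priority over include patterns."""
    return matches(db, databases) and not matches(db, list(exclude_databases))

print(should_replicate("db_1", "db_*"))              # True: matches include pattern
print(should_replicate("db_42", "db_*", ["db_4*"]))  # False: exclude wins
print(should_replicate("my_database_1", ["my_database_1", "my_database_2"]))  # True
```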

**Environment Variables**: MySQL and ClickHouse credentials can be overridden using environment variables for better security in containerized environments:
- MySQL: `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_CHARSET`
- ClickHouse: `CLICKHOUSE_HOST`, `CLICKHOUSE_PORT`, `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`
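The precedence implied by these variables (an environment variable, when set, wins over the config file value) can be sketched as follows; the `resolve` helper is illustrative, not part of the replicator's API.

```python
import os

def resolve(env_var: str, config_value: str) -> str:
    """Return the environment variable if set, otherwise the config file value."""
    return os.environ.get(env_var, config_value)

config = {"user": "root", "password": "root"}  # values loaded from config.yaml

os.environ.pop("MYSQL_USER", None)        # not set: config value is used
os.environ["MYSQL_PASSWORD"] = "secret"   # set, e.g. by the container runtime

user = resolve("MYSQL_USER", config["user"])              # 'root' (from config)
password = resolve("MYSQL_PASSWORD", config["password"])  # 'secret' (from env)
print(user, password)
```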

### Advanced Features

#### Migrations & Schema Changes

`mysql_ch_replicator` supports the following:

- **Adding Tables**: Automatically starts replicating data from newly added tables.
- **Altering Tables**: Adjusts replication strategy based on schema changes.
- **Removing Tables**: Handles removal of tables without disrupting the replication process.

**WARNING**. While ~95% of operations are supported, some operations may still be unhandled. We aim to support all of them, but for your safety, please write a CI/CD test that checks your migrations. The test should work the following way:
 - Apply all your MySQL migrations
 - Insert a record into MySQL (into any table)
 - Check that the record appears in ClickHouse
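The steps above can be sketched as a reusable check with injectable insert/query callables, so it can be wired to real MySQL and ClickHouse clients in your CI/CD pipeline. The helper name and structure are illustrative; below it is demonstrated with in-memory stand-ins.

```python
import time

def assert_replication_alive(insert_marker, find_marker, timeout=30.0, poll=0.5):
    """After applying migrations: insert a marker row via MySQL (insert_marker)
    and poll ClickHouse (find_marker) until it appears, failing on timeout."""
    marker = f"migration_check_{int(time.time())}"
    insert_marker(marker)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if find_marker(marker):
            return
        time.sleep(poll)
    raise AssertionError(f"marker {marker!r} never reached ClickHouse")

# In-memory stand-ins for the real MySQL insert / ClickHouse query callables:
replicated = set()
assert_replication_alive(lambda m: replicated.add(m),
                         lambda m: m in replicated, timeout=2.0, poll=0.1)
print("replication check passed")
```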

#### Known Limitations
1. Migrations are not supported during initial replication. Either wait for the initial replication to finish before applying migrations, or restart initial replication from scratch (by removing the state file).
2. Primary key changes are not supported. This is a ClickHouse-level limitation: ClickHouse does not allow changes related to the primary key.
3. Tables without a primary key are not supported.

#### Recovery Without Downtime

In case of a failure or during the initial replication, `mysql_ch_replicator` preserves old data and continues syncing new data seamlessly. Alternatively, you can remove the state and restart replication from scratch.

## Production usage

Here is the production checklist. Please read it carefully and confirm every point; if you have any questions, feel free to ask in discussions / issues. Following the checklist is critical to avoid issues in production.

#### Checklist

- [ ] No manual changes are made directly to ClickHouse databases or tables. All modifications (indexes, table settings, etc.) must be applied as overrides in the `mysql_ch_replicator` `config.yaml`.
- [ ] All MySQL migrations undergo either CI/CD or manual testing before deployment. No migration is applied to production without being tested. The tests verify that replication continues to work correctly and that new records are successfully replicated from MySQL to ClickHouse after the migration.
- [ ] Read and acknowledged [known limitations](#known-limitations).

## Development

To contribute to `mysql_ch_replicator`, clone the repository and install the required dependencies:

```bash
git clone https://github.com/your-repo/mysql_ch_replicator.git
cd mysql_ch_replicator
pip install -r requirements.txt
```

### Running Tests

1. Use Docker Compose to start all required services:
```bash
cd tests/
docker compose -f docker-compose-tests.yaml up
```
2. Run tests with:
```bash
docker exec -w /app/ -it tests-replicator-1 python3 -m pytest -v -s tests/
```
3. To run a single test:
```bash
docker exec -w /app/ -it tests-replicator-1 python3 -m pytest -v -s tests/ -k test_your_test_name
```

## Contribution

Contributions are welcome! Please open an issue or submit a pull request for any bugs or features you would like to add.

## License

`mysql_ch_replicator` is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

## Acknowledgements

Thank you to all the contributors who have helped build and improve this tool.

            
