mysql-ch-replicator

- Name: mysql-ch-replicator
- Version: 0.0.35
- Summary: Tool for replication of MySQL databases to ClickHouse
- Author: Filipp Ozinov
- Upload time: 2024-11-20 20:59:53
- Requires Python: <4.0, >=3.6
- License: MIT
# mysql_ch_replicator
![tests](https://github.com/bakwc/mysql_ch_replicator/actions/workflows/tests.yaml/badge.svg)
[![Release][release-image]][releases]
[![License][license-image]][license]

[release-image]: https://img.shields.io/badge/release-0.0.35-blue.svg?style=flat
[releases]: https://github.com/bakwc/mysql_ch_replicator/releases

[license-image]: https://img.shields.io/badge/license-MIT-blue.svg?style=flat
[license]: LICENSE

![img](https://raw.githubusercontent.com/bakwc/mysql_ch_replicator/master/mysql_ch.jpg)

`mysql_ch_replicator` is a powerful and efficient tool designed for real-time replication of MySQL databases to ClickHouse.

With a focus on high performance, it relies heavily on batching and uses a C++ extension for faster execution. The tool ensures seamless data integration, with support for migrations, schema changes, and correct data management.

## Features

- **Real-Time Replication**: Keeps your ClickHouse database in sync with MySQL in real-time.
- **High Performance**: Utilizes batching and ports slow parts to C++ (e.g., MySQL internal JSON parsing) for optimal performance.
- **Supports Migrations/Schema Changes**: Handles adding, altering, and removing tables without breaking the replication process.
- **Recovery without Downtime**: Allows for preserving old data while performing initial replication, ensuring continuous operation.
- **Correct Data Removal**: Unlike MaterializedMySQL, `mysql_ch_replicator` ensures physical removal of data.
- **Comprehensive Data Type Support**: Accurately replicates most data types, including JSON, booleans, and more. Easily extensible for additional data types.
- **Multi-Database Handling**: Replicates the binary log once for all databases, optimizing the process compared to `MaterializedMySQL`, which replicates the log separately for each database.

## Installation

To install `mysql_ch_replicator`, use the following command:

```bash
pip install mysql_ch_replicator
```

You may also need to compile the C++ components if pre-built binaries are not available for your platform.

## Usage

### Basic Usage

For real-time data sync from MySQL to ClickHouse:

1. Prepare a config file. Use `example_config.yaml` as a starting point.
2. Configure MySQL and ClickHouse servers:
 - The MySQL server configuration file `my.cnf` should include the following settings (they enable row-based binary logging and GTID mode):
```ini
[mysqld]
# ... other settings ...
gtid_mode = on
enforce_gtid_consistency = 1
binlog_expire_logs_seconds = 864000
max_binlog_size            = 500M
binlog_format              = ROW
```
 - For MariaDB, use the following settings:
```ini
[mysqld]
# ... other settings ...
gtid_strict_mode = ON
gtid_domain_id = 0
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_expire_logs_seconds = 864000
max_binlog_size = 500M
binlog_format = ROW
```

For `AWS RDS`, set the following parameters in the instance's `Parameter groups`:

```
binlog_format                       ROW
binlog_expire_logs_seconds          86400
```

 - The ClickHouse server config `override.xml` should include the following settings (this makes ClickHouse apply the `FINAL` keyword automatically, so updates are handled correctly):
```xml
<clickhouse>
    <!-- ... other settings ... -->
    <profiles>
        <default>
            <!-- ... other settings ... -->
            <final>1</final>
        </default>
    </profiles>
</clickhouse>
```
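Conceptually, tables populated by the replicator store updates as new row versions, and `FINAL` collapses them to the latest version per key at query time. A rough Python sketch of that collapse, illustrative only and not actual ClickHouse behavior in detail:

```python
# Illustrative sketch: how FINAL collapses multiple row versions into one
# row per primary key (latest version wins), similar in spirit to
# ReplacingMergeTree semantics. Not real ClickHouse code.

def collapse_final(rows):
    """rows: iterable of (key, version, payload); keep the max version per key."""
    latest = {}
    for key, version, payload in rows:
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, payload)
    return {key: payload for key, (version, payload) in latest.items()}

rows = [
    (1, 1, {"name": "alice"}),
    (2, 1, {"name": "bob"}),
    (1, 2, {"name": "alice_updated"}),  # an UPDATE arrives as a new row version
]
print(collapse_final(rows))
# → {1: {'name': 'alice_updated'}, 2: {'name': 'bob'}}
```

Without the `final` profile setting, queries would need an explicit `FINAL` clause to see this collapsed view instead of the raw row versions.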

3. Start the replication:

```bash
mysql_ch_replicator --config config.yaml run_all
```

This keeps the data in ClickHouse in sync with MySQL as you make changes.

### One Time Data Copy

If you only need to copy the data once and don't need continuous synchronization, do the following:

1. Prepare a config file. Use `example_config.yaml` as a starting point.
2. Run one-time data copy:

```bash
mysql_ch_replicator --config config.yaml db_replicator --database mysql_db_name --initial_only=True
```
Where `mysql_db_name` is the name of the database you want to copy.

Don't be afraid to interrupt the process in the middle: it saves its state and continues copying after a restart.
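The resume behavior works because the replicator periodically checkpoints its position to its state directory. The general pattern looks roughly like this (hypothetical file name and fields for illustration, not the tool's actual on-disk format):

```python
import json
import os
import tempfile

# Illustrative checkpoint/resume pattern (hypothetical state layout,
# not the replicator's actual state files).

def save_state(path, position):
    # Write atomically: write to a temp file, then rename over the old state,
    # so an interrupt never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_position": position}, f)
    os.replace(tmp, path)

def load_state(path):
    # On restart, resume from the last saved position (or start from 0).
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["last_position"]

state_file = os.path.join(tempfile.mkdtemp(), "state.json")
save_state(state_file, 42)       # interrupted after record 42
print(load_state(state_file))    # → 42, copying resumes from here
```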

### Configuration

`mysql_ch_replicator` can be configured through a configuration file. Here is the config example:

```yaml
mysql:
  host: 'localhost'
  port: 8306
  user: 'root'
  password: 'root'

clickhouse:
  host: 'localhost'
  port: 8323
  user: 'default'
  password: 'default'
  connection_timeout: 30        # optional
  send_receive_timeout: 300     # optional

binlog_replicator:
  data_dir: '/home/user/binlog/'
  records_per_file: 100000

databases: 'database_name_pattern_*'
tables: '*'

exclude_databases: ['database_10', 'database_*_42']   # optional
exclude_tables: ['meta_table_*']                      # optional

log_level: 'info'   # optional             
```

#### Required settings

- `mysql` - MySQL connection settings
- `clickhouse` - ClickHouse connection settings
- `binlog_replicator.data_dir` - create a new empty directory; the script uses it to store its state
- `databases` - database name pattern to replicate, e.g. `db_*` will match `db_1`, `db_2`, `db_test`; a list is also supported

#### Optional settings
- `tables` - table name pattern(s) to replicate; a list is also supported
- `exclude_databases` - databases to __exclude__, string or list, e.g. `'db1*'` or `['db2', 'db3*']`. If a database matches both `databases` and `exclude_databases`, exclusion takes priority.
- `exclude_tables` - tables to __exclude__, string or list. If a table matches both `tables` and `exclude_tables`, exclusion takes priority.
- `log_level` - logging level, default `info`; set to `debug` for maximum detail (allowed values: `debug`, `info`, `warning`, `error`, `critical`)

A few more database/table filter examples:

```yaml
databases: ['my_database_1', 'my_database_2']
tables: ['table_1', 'table_2*']
```
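The patterns are shell-style globs, and exclusion wins over inclusion. A small sketch of the assumed matching semantics, using Python's standard `fnmatch`:

```python
from fnmatch import fnmatch

# Sketch of the assumed filter semantics: a name is replicated when it
# matches an include pattern and no exclude pattern. Exclusion wins.

def matches(name, patterns):
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatch(name, p) for p in patterns)

def is_replicated(name, include, exclude=()):
    return matches(name, include) and not matches(name, exclude)

print(is_replicated("database_name_pattern_1", "database_name_pattern_*"))  # → True
print(is_replicated("database_10", "database_*", ["database_10"]))          # → False (excluded)
print(is_replicated("meta_table_x", "*", ["meta_table_*"]))                 # → False (excluded)
```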

### Advanced Features

#### Migrations & Schema Changes

`mysql_ch_replicator` supports the following:

- **Adding Tables**: Automatically starts replicating data from newly added tables.
- **Altering Tables**: Adjusts replication strategy based on schema changes.
- **Removing Tables**: Handles removal of tables without disrupting the replication process.

#### Recovery Without Downtime

In case of a failure, or during the initial replication, `mysql_ch_replicator` preserves the old data and continues syncing new data seamlessly. You can also remove the state and restart replication from scratch.

## Development

To contribute to `mysql_ch_replicator`, clone the repository and install the required dependencies:

```bash
git clone https://github.com/bakwc/mysql_ch_replicator.git
cd mysql_ch_replicator
pip install -r requirements.txt
```

### Running Tests

1. Use Docker Compose to bring up all required services:
```bash
sudo docker compose -f docker-compose-tests.yaml up
```
2. Run tests with:
```bash
sudo docker exec -w /app/ -it mysql_ch_replicator-replicator-1 python3 -m pytest -v -s test_mysql_ch_replicator.py
```

## Contribution

Contributions are welcome! Please open an issue or submit a pull request for any bugs or features you would like to add.

## License

`mysql_ch_replicator` is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

## Acknowledgements

Thank you to all the contributors who have helped build and improve this tool.

            
