| Field | Value |
|---|---|
| Name | mysql-ch-replicator |
| Version | 0.0.35 |
| Summary | Tool for replication of MySQL databases to ClickHouse |
| Author | Filipp Ozinov |
| License | MIT |
| Requires Python | <4.0,>=3.6 |
| Upload time | 2024-11-20 20:59:53 |
# mysql_ch_replicator
![tests](https://github.com/bakwc/mysql_ch_replicator/actions/workflows/tests.yaml/badge.svg)
[![Release][release-image]][releases]
[![License][license-image]][license]
[release-image]: https://img.shields.io/badge/release-0.0.35-blue.svg?style=flat
[releases]: https://github.com/bakwc/mysql_ch_replicator/releases
[license-image]: https://img.shields.io/badge/license-MIT-blue.svg?style=flat
[license]: LICENSE
![img](https://raw.githubusercontent.com/bakwc/mysql_ch_replicator/master/mysql_ch.jpg)
`mysql_ch_replicator` is a powerful and efficient tool designed for real-time replication of MySQL databases to ClickHouse.
With a focus on high performance, it batches operations heavily and offloads slow paths to a C++ extension for faster execution. The tool supports migrations, schema changes, and correct data management for seamless data integration.
## Features
- **Real-Time Replication**: Keeps your ClickHouse database in sync with MySQL in real-time.
- **High Performance**: Utilizes batching and ports slow parts to C++ (e.g., MySQL internal JSON parsing) for optimal performance.
- **Supports Migrations/Schema Changes**: Handles adding, altering, and removing tables without breaking the replication process.
- **Recovery without Downtime**: Allows for preserving old data while performing initial replication, ensuring continuous operation.
- **Correct Data Removal**: Unlike MaterializedMySQL, `mysql_ch_replicator` ensures physical removal of data.
- **Comprehensive Data Type Support**: Accurately replicates most data types, including JSON, booleans, and more. Easily extensible for additional data types.
- **Multi-Database Handling**: Replicates the binary log once for all databases, optimizing the process compared to `MaterializedMySQL`, which replicates the log separately for each database.
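
Updates reach ClickHouse as new row versions, and the "correct data management" above relies on collapsing those versions at read time (the `FINAL` behaviour configured later in this README). A rough Python model of that collapse, purely for illustration (the `collapse_versions` helper and the tuple layout are assumptions, not the tool's internals):

```python
def collapse_versions(rows):
    """Keep only the newest version of each row, as a FINAL-style read would.

    rows: iterable of (primary_key, version, payload) tuples, where a
    higher version means a more recent update from the MySQL binlog.
    """
    latest = {}
    for key, version, payload in rows:
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, payload)
    return {key: payload for key, (version, payload) in latest.items()}

# Two updates to id=1 arrive out of order; a finalized read sees only v3.
rows = [(1, 1, "created"), (2, 1, "created"), (1, 3, "updated"), (1, 2, "stale")]
print(collapse_versions(rows))  # {1: 'updated', 2: 'created'}
```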
## Installation
To install `mysql_ch_replicator`, use the following command:
```bash
pip install mysql_ch_replicator
```
You may also need to compile the C++ components if pre-built binaries are not available for your platform.
## Usage
### Basic Usage
For realtime data sync from MySQL to ClickHouse:
1. Prepare config file. Use `example_config.yaml` as an example.
2. Configure MySQL and ClickHouse servers:
- MySQL server configuration file `my.cnf` should include the following settings (required to write the binary log in row format and to enable password authentication):
```ini
[mysqld]
# ... other settings ...
gtid_mode = on
enforce_gtid_consistency = 1
binlog_expire_logs_seconds = 864000
max_binlog_size = 500M
binlog_format = ROW
```
- For MariaDB, use the following settings:
```ini
[mysqld]
# ... other settings ...
gtid_strict_mode = ON
gtid_domain_id = 0
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_expire_logs_seconds = 864000
max_binlog_size = 500M
binlog_format = ROW
```
For `AWS RDS` you need to set the following in `Parameter groups`:
```
binlog_format ROW
binlog_expire_logs_seconds 86400
```
- ClickHouse server config `override.xml` should include the following settings (this makes ClickHouse apply the `FINAL` keyword automatically, so updates are handled correctly):
```xml
<clickhouse>
    <!-- ... other settings ... -->
    <profiles>
        <default>
            <!-- ... other settings ... -->
            <final>1</final>
        </default>
    </profiles>
</clickhouse>
```
3. Start the replication:
```bash
mysql_ch_replicator --config config.yaml run_all
```
This keeps the data in ClickHouse up to date as you modify data in MySQL; the two stay in sync continuously.
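
Before starting, it can help to sanity-check the MySQL server against the settings listed above. A minimal sketch of such a check (the `check_binlog_settings` helper is hypothetical; feed it a dict built from the rows of `SHOW VARIABLES`):

```python
# Expected values, matching the my.cnf fragment above. MySQL reports
# enforce_gtid_consistency = 1 as "ON" in SHOW VARIABLES.
REQUIRED = {
    "gtid_mode": "ON",
    "enforce_gtid_consistency": "ON",
    "binlog_format": "ROW",
}

def check_binlog_settings(variables):
    """Return a list of (name, expected, actual) mismatches.

    variables: dict of variable name -> value, e.g. built from the
    rows returned by `SHOW VARIABLES LIKE 'binlog%'` and `'gtid%'`.
    """
    problems = []
    for name, expected in REQUIRED.items():
        actual = variables.get(name, "<unset>")
        if actual.upper() != expected:
            problems.append((name, expected, actual))
    return problems

# Example: binlog_format is still STATEMENT, so replication would break.
current = {"gtid_mode": "ON", "enforce_gtid_consistency": "ON",
           "binlog_format": "STATEMENT"}
print(check_binlog_settings(current))  # [('binlog_format', 'ROW', 'STATEMENT')]
```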
### One Time Data Copy
If you only need to copy the data once and don't need continuous synchronization, do the following:
1. Prepare config file. Use `example_config.yaml` as an example.
2. Run one-time data copy:
```bash
mysql_ch_replicator --config config.yaml db_replicator --database mysql_db_name --initial_only=True
```
Where `mysql_db_name` is the name of the database you want to copy.
Don't be afraid to interrupt the process mid-way: it saves its state and resumes the copy after a restart.
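The resumable behaviour works roughly like checkpointing: after each batch, the position reached so far is persisted, so a restart picks up from the last saved point. An illustrative sketch (the state file format and the `copy_with_checkpoint` helper are assumptions, not the tool's real state format):

```python
import json
import os
import tempfile

def copy_with_checkpoint(rows, state_path, batch_size=2):
    """Copy rows in batches, persisting the next index after each batch."""
    start = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            start = json.load(f)["next_index"]
    copied = []
    for i in range(start, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        copied.extend(batch)              # stand-in for inserting into ClickHouse
        with open(state_path, "w") as f:  # checkpoint survives interruption
            json.dump({"next_index": i + len(batch)}, f)
    return copied

state = os.path.join(tempfile.mkdtemp(), "state.json")
rows = ["r1", "r2", "r3", "r4", "r5"]
copy_with_checkpoint(rows[:3], state)     # "interrupted" after three rows
print(copy_with_checkpoint(rows, state))  # resumes: ['r4', 'r5']
```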
### Configuration
`mysql_ch_replicator` can be configured through a configuration file. Here is the config example:
```yaml
mysql:
  host: 'localhost'
  port: 8306
  user: 'root'
  password: 'root'

clickhouse:
  host: 'localhost'
  port: 8323
  user: 'default'
  password: 'default'
  connection_timeout: 30      # optional
  send_receive_timeout: 300   # optional

binlog_replicator:
  data_dir: '/home/user/binlog/'
  records_per_file: 100000

databases: 'database_name_pattern_*'
tables: '*'

exclude_databases: ['database_10', 'database_*_42']  # optional
exclude_tables: ['meta_table_*']                     # optional

log_level: 'info'  # optional
```
#### Required settings
- `mysql` MySQL connection settings
- `clickhouse` ClickHouse connection settings
- `binlog_replicator.data_dir` Path to a new empty directory; the script uses it to store its state
- `databases` Database name pattern to replicate, e.g. `db_*` will match `db_1`, `db_2`, `db_test`; a list is also supported
#### Optional settings
- `tables` - tables to filter, list is also supported
- `exclude_databases` - databases to __exclude__, string or list, e.g. `'table1*'` or `['table2', 'table3*']`. If a database matches both `databases` and `exclude_databases`, exclusion takes priority.
- `exclude_tables` - tables to __exclude__, string or list. If a table matches both `tables` and `exclude_tables`, exclusion takes priority.
- `log_level` - log level; default is `info`, set it to `debug` for maximum information (allowed values: `debug`, `info`, `warning`, `error`, `critical`)
A few more database/table filter examples:
```yaml
databases: ['my_database_1', 'my_database_2']
tables: ['table_1', 'table_2*']
```
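
The include/exclude precedence described above can be reproduced with shell-style patterns; a small sketch using Python's `fnmatch` (the `is_replicated` helper is illustrative, not the tool's actual matcher):

```python
from fnmatch import fnmatch

def is_replicated(name, include, exclude=()):
    """Apply the filter rules: exclude patterns win over include patterns."""
    inc = [include] if isinstance(include, str) else list(include)
    exc = [exclude] if isinstance(exclude, str) else list(exclude)
    if any(fnmatch(name, p) for p in exc):
        return False  # exclude has higher priority
    return any(fnmatch(name, p) for p in inc)

print(is_replicated("db_test", "db_*"))                                 # True
print(is_replicated("database_10", "database_*", ["database_10"]))      # False
print(is_replicated("database_7_42", "database_*", ["database_*_42"]))  # False
```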
### Advanced Features
#### Migrations & Schema Changes
`mysql_ch_replicator` supports the following:
- **Adding Tables**: Automatically starts replicating data from newly added tables.
- **Altering Tables**: Adjusts replication strategy based on schema changes.
- **Removing Tables**: Handles removal of tables without disrupting the replication process.
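
When a table is added or altered, each MySQL column type has to be mapped onto a ClickHouse type. A simplified sketch of such a mapping (the table and helper below are assumptions for illustration; the tool's real mapping is more complete):

```python
# Hypothetical subset of a MySQL -> ClickHouse type mapping.
TYPE_MAP = {
    "int": "Int32",
    "bigint": "Int64",
    "varchar": "String",
    "text": "String",
    "datetime": "DateTime64(3)",
    "json": "String",      # JSON may be stored as a serialized string
    "tinyint(1)": "Bool",  # MySQL's boolean convention
}

def map_column_type(mysql_type):
    """Map a MySQL column type to a ClickHouse type, ignoring length args."""
    key = mysql_type.lower()
    if key in TYPE_MAP:
        return TYPE_MAP[key]
    base = key.split("(", 1)[0]  # e.g. 'varchar(255)' -> 'varchar'
    return TYPE_MAP.get(base, "String")  # fall back to String

print(map_column_type("VARCHAR(255)"))  # String
print(map_column_type("tinyint(1)"))    # Bool
```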
#### Recovery Without Downtime
In case of a failure, or during the initial replication, `mysql_ch_replicator` preserves old data and continues syncing new data seamlessly. You can also remove the state and restart replication from scratch.
## Development
To contribute to `mysql_ch_replicator`, clone the repository and install the required dependencies:
```bash
git clone https://github.com/bakwc/mysql_ch_replicator.git
cd mysql_ch_replicator
pip install -r requirements.txt
```
### Running Tests
1. Use docker compose to start all the required services:
```bash
sudo docker compose -f docker-compose-tests.yaml up
```
2. Run tests with:
```bash
sudo docker exec -w /app/ -it mysql_ch_replicator-replicator-1 python3 -m pytest -v -s test_mysql_ch_replicator.py
```
## Contribution
Contributions are welcome! Please open an issue or submit a pull request for any bugs or features you would like to add.
## License
`mysql_ch_replicator` is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
## Acknowledgements
Thank you to all the contributors who have helped build and improve this tool.