Name | sql2arrow |
Version | 0.1.1 |
home_page | None |
Summary | This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays. |
upload_time | 2024-12-12 13:21:27 |
maintainer | None |
docs_url | None |
author | zhan zhang |
requires_python | >=3.8 |
license | None |
keywords | arrow, sql, mysql, rust |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
# SQL2Arrow
SQL2Arrow is a Python library that provides convenient, high-performance methods for parsing SQL INSERT statements into Arrow arrays. It is especially useful for analyzing data dumped by mysqldump or similar tools.
## How to use
### Installation
Install the latest SQL2Arrow release with:
```bash
pip install sql2arrow
```
### Parsing a SQL string
```python
import sql2arrow

sql_str = '''
INSERT INTO `region` VALUES
    ('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),
    ('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),
    ('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),
    ('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');
'''

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8()),
]

arrow_data = sql2arrow.parse_sql(sql_str, columns)
```
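To make the result of `parse_sql` easier to picture, here is a toy, pure-Python sketch of the row-to-column transposition that parsing an INSERT statement implies. It is not how SQL2Arrow is implemented, and it returns plain Python lists rather than Arrow arrays. The helper name `insert_to_columns` is hypothetical, and the sketch only handles simple single-quoted string values that contain no parentheses.

```python
import re

ROW_RE = re.compile(r"\(([^()]*)\)")            # one parenthesised VALUES tuple
VALUE_RE = re.compile(r"'((?:[^'\\]|\\.)*)'")   # one single-quoted string literal


def insert_to_columns(sql, names):
    """Hypothetical helper: map column names to lists of string values."""
    rows = [VALUE_RE.findall(row) for row in ROW_RE.findall(sql)]
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(row)}")
    # zip(*rows) transposes the row-oriented tuples into per-column sequences
    return {name: list(col) for name, col in zip(names, zip(*rows))}


sql = """
INSERT INTO `region` VALUES
    ('', '', 'D0'),
    ('1541947646568607746', 'region name', 'D99');
"""
columns = insert_to_columns(sql, ["region_code", "region_name", "parent_region_code"])
print(columns["region_name"])  # ['', 'region name']
```

A columnar layout like this is what makes the parsed data convenient to hand to Arrow-based tools afterwards.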
### Parsing SQL files
```python
import sql2arrow

sql_paths = [
    "region.sql_0.gz", "region.sql_1.gz", "region.sql_2.gz", "region.sql_3.gz",
    "region.sql_4.gz", "region.sql_5.gz", "region.sql_6.gz",
]

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8()),
]

partition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()
partition_func_spec.add_partition("region_code", sql2arrow.partition.IcebergTransforms.bucket(30))

# load data with partition func
partitioned_arrs = sql2arrow.load_sqls_with_partition_func(
    sql_paths, columns, partition_func_spec,
    sql2arrow.CompressionType.GZIP, sql2arrow.Dialect.MYSQL,
)

# load data from files one by one
arrs = sql2arrow.load_sqls(
    sql_paths, columns, sql2arrow.CompressionType.GZIP, sql2arrow.Dialect.MYSQL
)
```
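The `bucket(30)` transform used above comes from Apache Iceberg's partitioning scheme. As a rough illustration of what such a transform computes for a string column, the Iceberg specification defines the bucket as a 32-bit Murmur3 hash of the value's UTF-8 bytes, masked to a non-negative integer and taken modulo the bucket count. The sketch below implements that definition in pure Python. The function names are hypothetical, and whether SQL2Arrow computes buckets in exactly this way is an assumption that should be checked against its source.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 hash (x86 variant), returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    rounded = n - (n % 4)

    # Mix the input four bytes at a time, read as little-endian integers.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Mix the remaining one to three bytes, if any.
    tail = data[rounded:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Final avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def iceberg_bucket(value: str, num_buckets: int) -> int:
    """Bucket index for a string, following the Iceberg spec's definition."""
    h = murmur3_x86_32(value.encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_buckets


for code in ["1541947646568607746", "1541947680890597378", "620422117205"]:
    print(code, iceberg_bucket(code, 30))
```

Because the bucket depends only on the value, rows with the same `region_code` always land in the same one of the 30 partitions, regardless of which input file they came from.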
## arro3
SQL2Arrow uses arro3 as its default Python library for Apache Arrow. Thanks to the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html), arro3 arrays can be passed to any other library that supports the interface, including PyArrow, Polars (v1.2+), pandas (v2.2+), and nanoarrow, without copying the underlying memory.
```python
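# continues from the example above, where `arrs` was returned by load_sqls
import pyarrow as pa

# column names in the same order as the `columns` list above
names = ["region_code", "region_name", "create_time", "update_time", "parent_region_code"]
tables = [pa.Table.from_arrays(a, names=names) for a in arrs]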
```
## Limitations
### Dialect
SQL2Arrow currently supports only MySQL INSERT statements; PostgreSQL support is planned.
Raw data
{
"_id": null,
"home_page": null,
"name": "sql2arrow",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "zhan zhang <goalzz85@gmail.com>",
"keywords": "arrow, sql, mysql, rust",
"author": "zhan zhang",
"author_email": "goalzz85@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/52/81/1a7a17aa73d0b13f30f4d78acd9036bbf755aaeb02c9875796c2f2e258bd/sql2arrow-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.",
"version": "0.1.1",
"project_urls": {
"homepage": "https://github.com/goalzz85/sql2arrow"
},
"split_keywords": [
"arrow",
" sql",
" mysql",
" rust"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e972367b2222ec9352784c94818933f8ad3945f99afb9909f302b40e84bb1fc6",
"md5": "9827a46650ae2c3f55ff3f77c731eb02",
"sha256": "6dedfd396286b42215fe9cbf24cabfaa5870784b1a03556f62c2500ef6f78831"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "9827a46650ae2c3f55ff3f77c731eb02",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4230861,
"upload_time": "2024-12-12T13:21:12",
"upload_time_iso_8601": "2024-12-12T13:21:12.658719Z",
"url": "https://files.pythonhosted.org/packages/e9/72/367b2222ec9352784c94818933f8ad3945f99afb9909f302b40e84bb1fc6/sql2arrow-0.1.1-cp38-abi3-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0b3b8199e685243a057bc6bc7674ae4e494724e941ef48fd90d1475d5107789d",
"md5": "19f3ece886d6814e3f4e26babe65c681",
"sha256": "50a060973ff19f533177c440b58794b9ee1bbee4eb2408a7b14361f172aee453"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_aarch64.whl",
"has_sig": false,
"md5_digest": "19f3ece886d6814e3f4e26babe65c681",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4694634,
"upload_time": "2024-12-12T13:20:54",
"upload_time_iso_8601": "2024-12-12T13:20:54.705124Z",
"url": "https://files.pythonhosted.org/packages/0b/3b/8199e685243a057bc6bc7674ae4e494724e941ef48fd90d1475d5107789d/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "72e99ff84575f438312604ffe865c299a3fe3fb61e9593f114d5e60932a4e4af",
"md5": "9036daebcb14a84141f0bb95acf0898a",
"sha256": "4707ba2b72e82840366a229582aa5af88d5d3bbd1ee1a3df6e120fda03f8202a"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_armv7l.whl",
"has_sig": false,
"md5_digest": "9036daebcb14a84141f0bb95acf0898a",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4882102,
"upload_time": "2024-12-12T13:20:58",
"upload_time_iso_8601": "2024-12-12T13:20:58.591806Z",
"url": "https://files.pythonhosted.org/packages/72/e9/9ff84575f438312604ffe865c299a3fe3fb61e9593f114d5e60932a4e4af/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "1faf8563e69413e926a27174f37c6058a036a400ada1b2ba302c1fc0bfde8230",
"md5": "8414eb7cf92e2e49ba776798199d6793",
"sha256": "7a325f70bb9c4f740395e7a5a0000acbd843f691ec8eebca3b873f13c35a6f39"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_ppc64le.whl",
"has_sig": false,
"md5_digest": "8414eb7cf92e2e49ba776798199d6793",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 6462281,
"upload_time": "2024-12-12T13:21:01",
"upload_time_iso_8601": "2024-12-12T13:21:01.525924Z",
"url": "https://files.pythonhosted.org/packages/1f/af/8563e69413e926a27174f37c6058a036a400ada1b2ba302c1fc0bfde8230/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_ppc64le.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5ed814aeffce93e0f0d900fd7759be39dee85dc07e4f17184d89f7f8b15a58b2",
"md5": "1b73e9740d43bec1a56baac633adb940",
"sha256": "a072f9323595ffe726683404d2d920c9bf8fc0e01d9adcdbff3ef7cfb059e718"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_s390x.whl",
"has_sig": false,
"md5_digest": "1b73e9740d43bec1a56baac633adb940",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5775339,
"upload_time": "2024-12-12T13:21:05",
"upload_time_iso_8601": "2024-12-12T13:21:05.557320Z",
"url": "https://files.pythonhosted.org/packages/5e/d8/14aeffce93e0f0d900fd7759be39dee85dc07e4f17184d89f7f8b15a58b2/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_s390x.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cc2d89ccd7b13937c124df260ee7bfb4be55bc2bc6fca621f531d5fafdf1a4b8",
"md5": "46840a89972d9367b4f4571c8c0c9545",
"sha256": "411d237b9f5b62eea4e739ea37b55b2167503123f6982b9ee5bfa854c1d40104"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "46840a89972d9367b4f4571c8c0c9545",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4984385,
"upload_time": "2024-12-12T13:21:09",
"upload_time_iso_8601": "2024-12-12T13:21:09.152221Z",
"url": "https://files.pythonhosted.org/packages/cc/2d/89ccd7b13937c124df260ee7bfb4be55bc2bc6fca621f531d5fafdf1a4b8/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "dd84d2e24f031e83849e56a744feaaad3526ae38daf9eafe1af9a126a4d81517",
"md5": "8aebb40e0dfd8baa2ac17141c9e9c825",
"sha256": "29d43a91b399bcdd4f19d32e271ae7e96b4c12a7f61c565681dba3763f7a0cc7"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "8aebb40e0dfd8baa2ac17141c9e9c825",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4838464,
"upload_time": "2024-12-12T13:21:16",
"upload_time_iso_8601": "2024-12-12T13:21:16.362602Z",
"url": "https://files.pythonhosted.org/packages/dd/84/d2e24f031e83849e56a744feaaad3526ae38daf9eafe1af9a126a4d81517/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a7d7c0d2e696ae356719dac8cf0b2cef38b2b0c50b0db397088e89e20d304b43",
"md5": "1a5bf17e2082078d21612d4f84303937",
"sha256": "6f116d07de66a6cb8a0208de1422cb42e1498c0a7327b06f2d35bf94f59ac96d"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_armv7l.whl",
"has_sig": false,
"md5_digest": "1a5bf17e2082078d21612d4f84303937",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5139806,
"upload_time": "2024-12-12T13:21:19",
"upload_time_iso_8601": "2024-12-12T13:21:19.806166Z",
"url": "https://files.pythonhosted.org/packages/a7/d7/c0d2e696ae356719dac8cf0b2cef38b2b0c50b0db397088e89e20d304b43/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "90911dd4b07545ea9c3c2445ee83fec408389e9909c74b7ba0c2b32982c84452",
"md5": "391501a9127fcdc4b513683f54c533c6",
"sha256": "ba746427f7f909ca260788e1326e219f30d01e238ea871213e8a15c5ddd67843"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "391501a9127fcdc4b513683f54c533c6",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5260470,
"upload_time": "2024-12-12T13:21:22",
"upload_time_iso_8601": "2024-12-12T13:21:22.418963Z",
"url": "https://files.pythonhosted.org/packages/90/91/1dd4b07545ea9c3c2445ee83fec408389e9909c74b7ba0c2b32982c84452/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "4f9c15950f6f8e876dda98d19e04ea2c62897ab0926fccb903d4f077b6a4a9f0",
"md5": "c0b32b948456ece2fa93aca5b75df8c8",
"sha256": "8bf6c30004cb8f378c99231e9c78a7bd769d40b61e3a87ccfa9c98f54dbea09c"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "c0b32b948456ece2fa93aca5b75df8c8",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5143803,
"upload_time": "2024-12-12T13:21:24",
"upload_time_iso_8601": "2024-12-12T13:21:24.716120Z",
"url": "https://files.pythonhosted.org/packages/4f/9c/15950f6f8e876dda98d19e04ea2c62897ab0926fccb903d4f077b6a4a9f0/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ea1889132970c905ffd1a8f99e9012946030d69307888b64e5325418aba4eac8",
"md5": "803fe478537807adb3e232f70a13d4b0",
"sha256": "9f586d40c9cdd277ecfbeb8eddb2d7bbd20020c47dfbd413ae14b9348412b5b8"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1-cp38-abi3-win_amd64.whl",
"has_sig": false,
"md5_digest": "803fe478537807adb3e232f70a13d4b0",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4509272,
"upload_time": "2024-12-12T13:21:30",
"upload_time_iso_8601": "2024-12-12T13:21:30.911866Z",
"url": "https://files.pythonhosted.org/packages/ea/18/89132970c905ffd1a8f99e9012946030d69307888b64e5325418aba4eac8/sql2arrow-0.1.1-cp38-abi3-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "52811a7a17aa73d0b13f30f4d78acd9036bbf755aaeb02c9875796c2f2e258bd",
"md5": "f34325b79fe4b0fdf7606d6349269933",
"sha256": "3e3f2a45a8bdba2a89f55d04b9bace07b1baa2a1be975903490f33e35a660f2e"
},
"downloads": -1,
"filename": "sql2arrow-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "f34325b79fe4b0fdf7606d6349269933",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 43428,
"upload_time": "2024-12-12T13:21:27",
"upload_time_iso_8601": "2024-12-12T13:21:27.671304Z",
"url": "https://files.pythonhosted.org/packages/52/81/1a7a17aa73d0b13f30f4d78acd9036bbf755aaeb02c9875796c2f2e258bd/sql2arrow-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-12 13:21:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "goalzz85",
"github_project": "sql2arrow",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "sql2arrow"
}