sql2arrow


| Field | Value |
| --- | --- |
| Name | sql2arrow |
| Version | 0.1.1 |
| Home page | None |
| Summary | This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays. |
| Upload time | 2024-12-12 13:21:27 |
| Maintainer | None |
| Docs URL | None |
| Author | zhan zhang |
| Requires Python | >=3.8 |
| License | None |
| Keywords | arrow, sql, mysql, rust |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# SQL2Arrow

This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays. It is particularly useful for analyzing data dumped by mysqldump or similar tools.

## How to use

### Installation

Install the latest version of SQL2Arrow with:

```bash
pip install sql2arrow
```

### Parsing a SQL string

```python
import sql2arrow

sql_str = '''
INSERT INTO `region` VALUES
	('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),
	('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),
	('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),
	('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');
'''

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]

arrow_data = sql2arrow.parse_sql(sql_str, columns)
```
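
To make the row-to-column conversion concrete, here is a minimal pure-Python sketch of what turning an INSERT statement into columns involves. It is an illustration only, not how sql2arrow is implemented: the function name `insert_to_columns` is hypothetical, it returns plain Python lists rather than Arrow arrays, and it assumes a single `INSERT ... VALUES` statement containing only single-quoted strings, `NULL`, and bare literals.

```python
import re


def insert_to_columns(sql, column_names):
    """Toy parser: turn the rows of one INSERT ... VALUES statement into columns."""
    # Skip everything up to and including the VALUES keyword.
    body = sql[re.search(r"\bVALUES\b", sql, re.IGNORECASE).end():]
    columns = {name: [] for name in column_names}
    row, field = [], []
    in_row = in_string = quoted = False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_string:
            if ch == "\\" and i + 1 < len(body):
                # Simplification: keep the escaped character as-is.
                field.append(body[i + 1])
                i += 1
            elif ch == "'" and body[i + 1:i + 2] == "'":
                # A doubled quote inside a string is a literal quote.
                field.append("'")
                i += 1
            elif ch == "'":
                in_string = False
            else:
                field.append(ch)
        elif ch == "'":
            in_string = quoted = True
        elif ch == "(":
            in_row, row, field, quoted = True, [], [], False
        elif in_row and ch in ",)":
            text = "".join(field)
            # An unquoted NULL becomes None; everything else stays a string.
            row.append(None if not quoted and text.upper() == "NULL" else text)
            field, quoted = [], False
            if ch == ")":
                in_row = False
                for name, value in zip(column_names, row):
                    columns[name].append(value)
        elif in_row and not ch.isspace():
            field.append(ch)
        i += 1
    return columns


sql = "INSERT INTO `region` VALUES ('1', 'north', NULL), ('2', 'it''s', 'D99');"
print(insert_to_columns(sql, ["region_code", "region_name", "parent_region_code"]))
```

Each key of the returned dictionary holds one column's values in row order, which is the same columnar layout that Arrow arrays use.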


### Parsing SQL files

```python
import sql2arrow

sql_paths = [
    "region.sql_0.gz",
    "region.sql_1.gz",
    "region.sql_2.gz",
    "region.sql_3.gz",
    "region.sql_4.gz",
    "region.sql_5.gz",
    "region.sql_6.gz",
]

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]


partition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()
partition_func_spec.add_partition("region_code", sql2arrow.partition.IcebergTransforms.bucket(30))

# Load the files and split the rows according to the partition spec.
partitioned_arrs = sql2arrow.load_sqls_with_partition_func(
    sql_paths,
    columns,
    partition_func_spec,
    sql2arrow.CompressionType.GZIP,
    sql2arrow.Dialect.MYSQL,
)

# Load the files one by one, without partitioning.
arrs = sql2arrow.load_sqls(
    sql_paths,
    columns,
    sql2arrow.CompressionType.GZIP,
    sql2arrow.Dialect.MYSQL,
)
```
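
For intuition about what `bucket(30)` does to `region_code`, the sketch below reimplements the bucket transform as the Apache Iceberg table spec describes it for strings: hash the UTF-8 bytes with 32-bit MurmurHash3 (seed 0), clear the sign bit, and take the result modulo the bucket count. This assumes sql2arrow's `IcebergTransforms.bucket` follows that spec, which the text above does not state explicitly; the function names `murmur3_x86_32` and `iceberg_bucket` are hypothetical and not part of the library.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant), returned as an unsigned integer."""
    mask = 0xFFFFFFFF

    def rotl(x, r):
        return ((x << r) | (x >> (32 - r))) & mask

    def mix(k):
        k = (k * 0xCC9E2D51) & mask
        k = rotl(k, 15)
        return (k * 0x1B873593) & mask

    h = seed & mask
    full = len(data) - len(data) % 4
    # Process the input in 4-byte little-endian blocks.
    for i in range(0, full, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask
    # Mix in the 1 to 3 trailing bytes, if any.
    if full < len(data):
        h ^= mix(int.from_bytes(data[full:], "little"))
    h ^= len(data)
    # Final avalanche step.
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def iceberg_bucket(value: str, num_buckets: int) -> int:
    """Bucket a string: hash its UTF-8 bytes, drop the sign bit, take the modulus."""
    return (murmur3_x86_32(value.encode("utf-8")) & 0x7FFFFFFF) % num_buckets


# The Iceberg spec lists 1210000089 as the hash of the string "iceberg".
print(murmur3_x86_32(b"iceberg"))     # 1210000089
print(iceberg_bucket("iceberg", 30))  # 9
```

Because the bucket depends only on the column value, rows with the same `region_code` always land in the same one of the 30 partitions, regardless of which input file they came from.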


## arro3

SQL2Arrow uses arro3 as its default Python library for Apache Arrow. Thanks to the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html), arro3's array data can be passed to any other library that implements the interface, including PyArrow, Polars (v1.2+), Pandas (v2.2+), NanoArrow, and more, all without copying the underlying memory.

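```python
# Continues from the file-loading example above.
import pyarrow as pa

# Column names, in the same order as `columns`.
names = ["region_code", "region_name", "create_time", "update_time", "parent_region_code"]
tables = [pa.Table.from_arrays(a, names=names) for a in arrs]
```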
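
The PyCapsule Interface works through a small set of dunder methods that a producer object exposes and a consumer library looks for. As a rough sketch, a helper like the hypothetical `arrow_capsule_exports` below can report which of them an object implements; `FakeArray` is a stand-in used only so the example runs without any Arrow library installed.

```python
# Method names defined by the Arrow PyCapsule Interface.
ARROW_CAPSULE_METHODS = ("__arrow_c_schema__", "__arrow_c_array__", "__arrow_c_stream__")


def arrow_capsule_exports(obj):
    """Return the PyCapsule Interface methods that `obj` implements."""
    return [name for name in ARROW_CAPSULE_METHODS if callable(getattr(obj, name, None))]


class FakeArray:
    """Stand-in for an Arrow array; a real producer would return PyCapsules here."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


print(arrow_capsule_exports(FakeArray()))  # ['__arrow_c_array__']
print(arrow_capsule_exports(object()))     # []
```

Running the same helper on the arrays returned by sql2arrow is a quick way to check what a downstream library will be able to consume.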

## Limitations

### Dialect
Only MySQL INSERT statements are currently supported; PostgreSQL support is planned.


Raw data

{
    "_id": null,
    "home_page": null,
    "name": "sql2arrow",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "zhan zhang <goalzz85@gmail.com>",
    "keywords": "arrow, sql, mysql, rust",
    "author": "zhan zhang",
    "author_email": "goalzz85@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/52/81/1a7a17aa73d0b13f30f4d78acd9036bbf755aaeb02c9875796c2f2e258bd/sql2arrow-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.",
    "version": "0.1.1",
    "project_urls": {
        "homepage": "https://github.com/goalzz85/sql2arrow"
    },
    "split_keywords": [
        "arrow",
        " sql",
        " mysql",
        " rust"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e972367b2222ec9352784c94818933f8ad3945f99afb9909f302b40e84bb1fc6",
                "md5": "9827a46650ae2c3f55ff3f77c731eb02",
                "sha256": "6dedfd396286b42215fe9cbf24cabfaa5870784b1a03556f62c2500ef6f78831"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "9827a46650ae2c3f55ff3f77c731eb02",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4230861,
            "upload_time": "2024-12-12T13:21:12",
            "upload_time_iso_8601": "2024-12-12T13:21:12.658719Z",
            "url": "https://files.pythonhosted.org/packages/e9/72/367b2222ec9352784c94818933f8ad3945f99afb9909f302b40e84bb1fc6/sql2arrow-0.1.1-cp38-abi3-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0b3b8199e685243a057bc6bc7674ae4e494724e941ef48fd90d1475d5107789d",
                "md5": "19f3ece886d6814e3f4e26babe65c681",
                "sha256": "50a060973ff19f533177c440b58794b9ee1bbee4eb2408a7b14361f172aee453"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_aarch64.whl",
            "has_sig": false,
            "md5_digest": "19f3ece886d6814e3f4e26babe65c681",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4694634,
            "upload_time": "2024-12-12T13:20:54",
            "upload_time_iso_8601": "2024-12-12T13:20:54.705124Z",
            "url": "https://files.pythonhosted.org/packages/0b/3b/8199e685243a057bc6bc7674ae4e494724e941ef48fd90d1475d5107789d/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "72e99ff84575f438312604ffe865c299a3fe3fb61e9593f114d5e60932a4e4af",
                "md5": "9036daebcb14a84141f0bb95acf0898a",
                "sha256": "4707ba2b72e82840366a229582aa5af88d5d3bbd1ee1a3df6e120fda03f8202a"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_armv7l.whl",
            "has_sig": false,
            "md5_digest": "9036daebcb14a84141f0bb95acf0898a",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4882102,
            "upload_time": "2024-12-12T13:20:58",
            "upload_time_iso_8601": "2024-12-12T13:20:58.591806Z",
            "url": "https://files.pythonhosted.org/packages/72/e9/9ff84575f438312604ffe865c299a3fe3fb61e9593f114d5e60932a4e4af/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_armv7l.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1faf8563e69413e926a27174f37c6058a036a400ada1b2ba302c1fc0bfde8230",
                "md5": "8414eb7cf92e2e49ba776798199d6793",
                "sha256": "7a325f70bb9c4f740395e7a5a0000acbd843f691ec8eebca3b873f13c35a6f39"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_ppc64le.whl",
            "has_sig": false,
            "md5_digest": "8414eb7cf92e2e49ba776798199d6793",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 6462281,
            "upload_time": "2024-12-12T13:21:01",
            "upload_time_iso_8601": "2024-12-12T13:21:01.525924Z",
            "url": "https://files.pythonhosted.org/packages/1f/af/8563e69413e926a27174f37c6058a036a400ada1b2ba302c1fc0bfde8230/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_ppc64le.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5ed814aeffce93e0f0d900fd7759be39dee85dc07e4f17184d89f7f8b15a58b2",
                "md5": "1b73e9740d43bec1a56baac633adb940",
                "sha256": "a072f9323595ffe726683404d2d920c9bf8fc0e01d9adcdbff3ef7cfb059e718"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_s390x.whl",
            "has_sig": false,
            "md5_digest": "1b73e9740d43bec1a56baac633adb940",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5775339,
            "upload_time": "2024-12-12T13:21:05",
            "upload_time_iso_8601": "2024-12-12T13:21:05.557320Z",
            "url": "https://files.pythonhosted.org/packages/5e/d8/14aeffce93e0f0d900fd7759be39dee85dc07e4f17184d89f7f8b15a58b2/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_s390x.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cc2d89ccd7b13937c124df260ee7bfb4be55bc2bc6fca621f531d5fafdf1a4b8",
                "md5": "46840a89972d9367b4f4571c8c0c9545",
                "sha256": "411d237b9f5b62eea4e739ea37b55b2167503123f6982b9ee5bfa854c1d40104"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "46840a89972d9367b4f4571c8c0c9545",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4984385,
            "upload_time": "2024-12-12T13:21:09",
            "upload_time_iso_8601": "2024-12-12T13:21:09.152221Z",
            "url": "https://files.pythonhosted.org/packages/cc/2d/89ccd7b13937c124df260ee7bfb4be55bc2bc6fca621f531d5fafdf1a4b8/sql2arrow-0.1.1-cp38-abi3-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dd84d2e24f031e83849e56a744feaaad3526ae38daf9eafe1af9a126a4d81517",
                "md5": "8aebb40e0dfd8baa2ac17141c9e9c825",
                "sha256": "29d43a91b399bcdd4f19d32e271ae7e96b4c12a7f61c565681dba3763f7a0cc7"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_aarch64.whl",
            "has_sig": false,
            "md5_digest": "8aebb40e0dfd8baa2ac17141c9e9c825",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4838464,
            "upload_time": "2024-12-12T13:21:16",
            "upload_time_iso_8601": "2024-12-12T13:21:16.362602Z",
            "url": "https://files.pythonhosted.org/packages/dd/84/d2e24f031e83849e56a744feaaad3526ae38daf9eafe1af9a126a4d81517/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a7d7c0d2e696ae356719dac8cf0b2cef38b2b0c50b0db397088e89e20d304b43",
                "md5": "1a5bf17e2082078d21612d4f84303937",
                "sha256": "6f116d07de66a6cb8a0208de1422cb42e1498c0a7327b06f2d35bf94f59ac96d"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_armv7l.whl",
            "has_sig": false,
            "md5_digest": "1a5bf17e2082078d21612d4f84303937",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5139806,
            "upload_time": "2024-12-12T13:21:19",
            "upload_time_iso_8601": "2024-12-12T13:21:19.806166Z",
            "url": "https://files.pythonhosted.org/packages/a7/d7/c0d2e696ae356719dac8cf0b2cef38b2b0c50b0db397088e89e20d304b43/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_armv7l.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "90911dd4b07545ea9c3c2445ee83fec408389e9909c74b7ba0c2b32982c84452",
                "md5": "391501a9127fcdc4b513683f54c533c6",
                "sha256": "ba746427f7f909ca260788e1326e219f30d01e238ea871213e8a15c5ddd67843"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_i686.whl",
            "has_sig": false,
            "md5_digest": "391501a9127fcdc4b513683f54c533c6",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5260470,
            "upload_time": "2024-12-12T13:21:22",
            "upload_time_iso_8601": "2024-12-12T13:21:22.418963Z",
            "url": "https://files.pythonhosted.org/packages/90/91/1dd4b07545ea9c3c2445ee83fec408389e9909c74b7ba0c2b32982c84452/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_i686.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4f9c15950f6f8e876dda98d19e04ea2c62897ab0926fccb903d4f077b6a4a9f0",
                "md5": "c0b32b948456ece2fa93aca5b75df8c8",
                "sha256": "8bf6c30004cb8f378c99231e9c78a7bd769d40b61e3a87ccfa9c98f54dbea09c"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_x86_64.whl",
            "has_sig": false,
            "md5_digest": "c0b32b948456ece2fa93aca5b75df8c8",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5143803,
            "upload_time": "2024-12-12T13:21:24",
            "upload_time_iso_8601": "2024-12-12T13:21:24.716120Z",
            "url": "https://files.pythonhosted.org/packages/4f/9c/15950f6f8e876dda98d19e04ea2c62897ab0926fccb903d4f077b6a4a9f0/sql2arrow-0.1.1-cp38-abi3-musllinux_1_2_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ea1889132970c905ffd1a8f99e9012946030d69307888b64e5325418aba4eac8",
                "md5": "803fe478537807adb3e232f70a13d4b0",
                "sha256": "9f586d40c9cdd277ecfbeb8eddb2d7bbd20020c47dfbd413ae14b9348412b5b8"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1-cp38-abi3-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "803fe478537807adb3e232f70a13d4b0",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4509272,
            "upload_time": "2024-12-12T13:21:30",
            "upload_time_iso_8601": "2024-12-12T13:21:30.911866Z",
            "url": "https://files.pythonhosted.org/packages/ea/18/89132970c905ffd1a8f99e9012946030d69307888b64e5325418aba4eac8/sql2arrow-0.1.1-cp38-abi3-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "52811a7a17aa73d0b13f30f4d78acd9036bbf755aaeb02c9875796c2f2e258bd",
                "md5": "f34325b79fe4b0fdf7606d6349269933",
                "sha256": "3e3f2a45a8bdba2a89f55d04b9bace07b1baa2a1be975903490f33e35a660f2e"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f34325b79fe4b0fdf7606d6349269933",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 43428,
            "upload_time": "2024-12-12T13:21:27",
            "upload_time_iso_8601": "2024-12-12T13:21:27.671304Z",
            "url": "https://files.pythonhosted.org/packages/52/81/1a7a17aa73d0b13f30f4d78acd9036bbf755aaeb02c9875796c2f2e258bd/sql2arrow-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-12 13:21:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "goalzz85",
    "github_project": "sql2arrow",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "sql2arrow"
}