sql2arrow


- Name: sql2arrow
- Version: 0.1.3
- Summary: This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.
- Upload time: 2024-12-31 03:23:22
- Author: zhan zhang
- Requires Python: >=3.8
- Keywords: arrow, sql, mysql, rust

# SQL2Arrow

This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays. It is particularly useful for analyzing data dumped by mysqldump or similar tools.

## How to use

### Installation

Install the latest version of SQL2Arrow with:

```bash
pip install sql2arrow
```

### Parsing a SQL string

```python
import sql2arrow

sql_str = '''
INSERT INTO `region` VALUES
	('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),
	('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),
	('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),
	('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');
'''

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]

arrow_data = sql2arrow.parse_sql(sql_str, columns)
```
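
To make the idea of turning row-oriented INSERT statements into column-oriented data concrete, here is a minimal pure-Python sketch. It is not how SQL2Arrow is implemented; the function names `split_rows` and `insert_to_columns` are hypothetical, and the sketch assumes a single INSERT statement, handles only single-quoted strings with doubled-quote (`''`) escapes, ignores backslash escapes, and keeps unquoted literals such as `NULL` as plain text.

```python
from typing import Dict, List


def split_rows(values_sql: str) -> List[List[str]]:
    """Split "('a', 'b'), ('c', 'd')" into [['a', 'b'], ['c', 'd']]."""
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    in_quote = False   # currently inside a '...' string literal
    in_tuple = False   # currently inside a (...) value tuple
    i = 0
    while i < len(values_sql):
        ch = values_sql[i]
        if in_quote:
            if ch == "'":
                if i + 1 < len(values_sql) and values_sql[i + 1] == "'":
                    field.append("'")  # doubled quote is an escaped quote
                    i += 1
                else:
                    in_quote = False   # closing quote
            else:
                field.append(ch)
        elif ch == "'":
            in_quote = True
        elif ch == "(":
            in_tuple, row, field = True, [], []
        elif ch == "," and in_tuple:
            row.append("".join(field))  # field separator inside a tuple
            field = []
        elif ch == ")" and in_tuple:
            row.append("".join(field))  # last field closes the row
            rows.append(row)
            in_tuple, field = False, []
        elif in_tuple and not ch.isspace():
            field.append(ch)            # unquoted literal (number, NULL, ...)
        i += 1
    return rows


def insert_to_columns(sql: str, names: List[str]) -> Dict[str, List[str]]:
    """Transpose the VALUES rows of one INSERT statement into named columns."""
    _, _, values_sql = sql.partition("VALUES")
    rows = split_rows(values_sql)
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}


if __name__ == "__main__":
    demo = "INSERT INTO `region` VALUES ('', 'a b'), ('1', 'it''s');"
    print(insert_to_columns(demo, ["region_code", "region_name"]))
```

Each list in the returned dictionary plays the role that one Arrow array plays in the real output: all values of a single column stored together.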


### Parsing SQL files

```python
import sql2arrow

sql_paths = [
    "region.sql_0.gz",
    "region.sql_1.gz",
    "region.sql_2.gz",
    "region.sql_3.gz",
    "region.sql_4.gz",
    "region.sql_5.gz",
    "region.sql_6.gz",
]

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]


partition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()
partition_func_spec.add_partition("region_code", sql2arrow.partition.IcebergTransforms.bucket(30))


it = sql2arrow.SQLFile2ArrowIter(
    sql_paths,
    columns,
    4,
    1000,
    sql2arrow.CompressionType.SNAPPY,
    sql2arrow.Dialect.MYSQL,
    partition_func_spec
)

for arr in it:
    print(arr)
```
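
The example above partitions rows with a `bucket(30)` transform on `region_code`. For reference, the sketch below shows how the Apache Iceberg specification defines the bucket transform for string values: a 32-bit Murmur3 hash (seed 0) of the UTF-8 bytes, masked to a non-negative integer and taken modulo the bucket count. Whether SQL2Arrow follows the specification exactly is an assumption here, and the names `murmur3_32` and `iceberg_bucket` are hypothetical helpers, not part of the library.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    body_len = n - n % 4

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        return (k * c2) & mask

    # Process the body in 4-byte little-endian blocks.
    for i in range(0, body_len, 4):
        h ^= mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    # Mix in the remaining 1 to 3 tail bytes, if any.
    tail = data[body_len:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization (avalanche) step.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def iceberg_bucket(value: str, num_buckets: int) -> int:
    """Bucket index for a string value, per the Iceberg bucket transform."""
    hashed = murmur3_32(value.encode("utf-8"))
    return (hashed & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for code in ["1541947646568607746", "1541947680890597378", "620422117205"]:
        print(code, "->", iceberg_bucket(code, 30))
```

Because the hash is deterministic, rows with the same `region_code` always land in the same bucket. The Iceberg specification publishes hash test vectors that an implementation like this can be checked against.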


## arro3

SQL2Arrow uses arro3 as its default Python library for Apache Arrow. Thanks to the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html), arro3 array data can be passed to any other library that supports the interface, including PyArrow, Polars (v1.2+), pandas (v2.2+), and nanoarrow, without copying memory.
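
The PyCapsule Interface works through a small set of dunder methods that a producer object exposes and a consumer library calls. As a rough illustration, the sketch below checks which of those methods an object provides. The method names come from the interface specification; the helper `arrow_capsule_methods` and the `FakeArray` class are hypothetical and exist only for this example.

```python
from typing import Any, List

# Export methods defined by the Arrow PyCapsule Interface.
_CAPSULE_METHODS = (
    "__arrow_c_schema__",  # exports a schema or data type
    "__arrow_c_array__",   # exports a single array or record batch
    "__arrow_c_stream__",  # exports a stream of arrays or batches
)


def arrow_capsule_methods(obj: Any) -> List[str]:
    """Return the PyCapsule export methods that `obj` implements."""
    return [m for m in _CAPSULE_METHODS if callable(getattr(obj, m, None))]


class FakeArray:
    """Stand-in producer that only declares the array export method."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("illustration only, no real Arrow data")


if __name__ == "__main__":
    print(arrow_capsule_methods(FakeArray()))
    print(arrow_capsule_methods(object()))
```

A consumer that finds `__arrow_c_array__` on an object can import its buffers directly, which is what makes the hand-off between libraries zero-copy.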

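```python
# Continues from the examples above. `arrs` is assumed to hold the parsed
# results, each item being a list of Arrow arrays with one array per column.

import pyarrow as pa

# Column names, in the same order as the `columns` definition above.
names = ["region_code", "region_name", "create_time", "update_time", "parent_region_code"]
tables = [pa.Table.from_arrays(a, names=names) for a in arrs]
```

## Limitations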

### Dialect

SQL2Arrow currently supports only MySQL and PostgreSQL INSERT statements.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "sql2arrow",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "zhan zhang <goalzz85@gmail.com>",
    "keywords": "arrow, sql, mysql, rust",
    "author": "zhan zhang",
    "author_email": "goalzz85@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/7f/64/644b43ac087b388f3b104e689593a153b329742d1b878ce64bed74fc557e/sql2arrow-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.",
    "version": "0.1.3",
    "project_urls": {
        "homepage": "https://github.com/goalzz85/sql2arrow"
    },
    "split_keywords": [
        "arrow",
        " sql",
        " mysql",
        " rust"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c2610a72f670a33026b26860d7635f963fa43b5216469fe34ea67e755556647b",
                "md5": "a0972a43ffb026fe800419fd13349f15",
                "sha256": "5f025f334293dee107ea301110374ead232e7d8fac0b5f4d514aee28a7be5eeb"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "a0972a43ffb026fe800419fd13349f15",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4235188,
            "upload_time": "2024-12-31T03:23:07",
            "upload_time_iso_8601": "2024-12-31T03:23:07.126177Z",
            "url": "https://files.pythonhosted.org/packages/c2/61/0a72f670a33026b26860d7635f963fa43b5216469fe34ea67e755556647b/sql2arrow-0.1.3-cp38-abi3-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "698c8a9f83f26b30e2231a70f692558a392178968129acc3522342e8691b3cab",
                "md5": "867a17536b38c4412396409dc129be30",
                "sha256": "d4cbda4e80e26248a0d798c37d979c93e2f2caea62be275ff1f25b096f2bc953"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_aarch64.whl",
            "has_sig": false,
            "md5_digest": "867a17536b38c4412396409dc129be30",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4704237,
            "upload_time": "2024-12-31T03:22:50",
            "upload_time_iso_8601": "2024-12-31T03:22:50.806344Z",
            "url": "https://files.pythonhosted.org/packages/69/8c/8a9f83f26b30e2231a70f692558a392178968129acc3522342e8691b3cab/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d301d732b0acaef000b58faed4666046a8401019c01dee09c9db5a072db4c3b5",
                "md5": "42c82daa9593767b28760b519f5e417d",
                "sha256": "cd5a69b2794f6c5c51cdac77efabd34e9070adc8e939d37e068e713d33937a0f"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_armv7l.whl",
            "has_sig": false,
            "md5_digest": "42c82daa9593767b28760b519f5e417d",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4901166,
            "upload_time": "2024-12-31T03:22:54",
            "upload_time_iso_8601": "2024-12-31T03:22:54.205726Z",
            "url": "https://files.pythonhosted.org/packages/d3/01/d732b0acaef000b58faed4666046a8401019c01dee09c9db5a072db4c3b5/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_armv7l.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9ed704daeedc1103be7ac30fd6ba4c522a4bcc4c65447f4f961f34f115245811",
                "md5": "bd576457a23bf199ae6c4770e164b26a",
                "sha256": "3c3a1f03f90ab6bca6fe9cbee73f44c8e833c62256eaede2d53c95730ab5aa4a"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_ppc64le.whl",
            "has_sig": false,
            "md5_digest": "bd576457a23bf199ae6c4770e164b26a",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 6466175,
            "upload_time": "2024-12-31T03:22:57",
            "upload_time_iso_8601": "2024-12-31T03:22:57.694971Z",
            "url": "https://files.pythonhosted.org/packages/9e/d7/04daeedc1103be7ac30fd6ba4c522a4bcc4c65447f4f961f34f115245811/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_ppc64le.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "94ad56ec5de96386f650f37e0298d1569e4a31a7905c49b4f54f82e72ccca5cf",
                "md5": "2e3e912e14f4e8c88dadf261abe6c52c",
                "sha256": "b4e72daecf028f8842d77ea6d84eac0cd537badc5a8a6433594187f23f15e406"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_s390x.whl",
            "has_sig": false,
            "md5_digest": "2e3e912e14f4e8c88dadf261abe6c52c",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5793425,
            "upload_time": "2024-12-31T03:23:00",
            "upload_time_iso_8601": "2024-12-31T03:23:00.717608Z",
            "url": "https://files.pythonhosted.org/packages/94/ad/56ec5de96386f650f37e0298d1569e4a31a7905c49b4f54f82e72ccca5cf/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_s390x.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3d392e99494c1a7c3e716c25a5b89787e23a6bb2af681bca324301cda66e637d",
                "md5": "d638db4436442136108fd8acdf152d11",
                "sha256": "8d560cca8ef61e434cf6bc5506720ad41bb618c835a3746719116e84a720dddb"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "d638db4436442136108fd8acdf152d11",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4989487,
            "upload_time": "2024-12-31T03:23:03",
            "upload_time_iso_8601": "2024-12-31T03:23:03.931171Z",
            "url": "https://files.pythonhosted.org/packages/3d/39/2e99494c1a7c3e716c25a5b89787e23a6bb2af681bca324301cda66e637d/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ef0405e62dc983f84c61f0ea5f6983b75355773cc8b0777a33081bbdc4d095f5",
                "md5": "b69572819be1ba3d495aed6a87379843",
                "sha256": "61fbdde2a5dcd1be4ce98e420735faad67cae40f4446806ec34ad8b59c71e309"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_aarch64.whl",
            "has_sig": false,
            "md5_digest": "b69572819be1ba3d495aed6a87379843",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4850735,
            "upload_time": "2024-12-31T03:23:10",
            "upload_time_iso_8601": "2024-12-31T03:23:10.314240Z",
            "url": "https://files.pythonhosted.org/packages/ef/04/05e62dc983f84c61f0ea5f6983b75355773cc8b0777a33081bbdc4d095f5/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3c0b2b4f15fe60c084007bbb241753cf1daa88ad8feb4b8e3c98d07320c7c374",
                "md5": "9ca718ff4adfff9cd0b72d48d1946495",
                "sha256": "873f6d19d85d66e7deecb22a5996e423453b1a82973c746e140b884cc969340d"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_armv7l.whl",
            "has_sig": false,
            "md5_digest": "9ca718ff4adfff9cd0b72d48d1946495",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5152611,
            "upload_time": "2024-12-31T03:23:14",
            "upload_time_iso_8601": "2024-12-31T03:23:14.029115Z",
            "url": "https://files.pythonhosted.org/packages/3c/0b/2b4f15fe60c084007bbb241753cf1daa88ad8feb4b8e3c98d07320c7c374/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_armv7l.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "237f3485bab6df9c0c34cb33c154f2b43a7697d390f716a433c657b8209a196c",
                "md5": "24fc22ab940ad9624aa5600ba0d1e488",
                "sha256": "0961fa3d017249a4f30546096185ca9f9ccc7a8c2027751838e7685e0e853a1c"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_i686.whl",
            "has_sig": false,
            "md5_digest": "24fc22ab940ad9624aa5600ba0d1e488",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5271895,
            "upload_time": "2024-12-31T03:23:17",
            "upload_time_iso_8601": "2024-12-31T03:23:17.301056Z",
            "url": "https://files.pythonhosted.org/packages/23/7f/3485bab6df9c0c34cb33c154f2b43a7697d390f716a433c657b8209a196c/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_i686.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "beecd526b7296f942964dea1ad06912fb205633aa4b31cf327900a3477b71edb",
                "md5": "64fdb4f3bda86bf317201c30c17b5634",
                "sha256": "87a1cc5ddc90c5e1702cbfc0d3f70a0313e1482105bccc8d51585fd9dd67de81"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_x86_64.whl",
            "has_sig": false,
            "md5_digest": "64fdb4f3bda86bf317201c30c17b5634",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 5148761,
            "upload_time": "2024-12-31T03:23:20",
            "upload_time_iso_8601": "2024-12-31T03:23:20.629892Z",
            "url": "https://files.pythonhosted.org/packages/be/ec/d526b7296f942964dea1ad06912fb205633aa4b31cf327900a3477b71edb/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8ac77a0b8e91614b647e9b7792803e77f4a9b53a89d9cafc6e02de0018728f57",
                "md5": "255869d3f11eaa51cd6bb96d4883d038",
                "sha256": "bca05337fca332d04a2b381068f004dae3158fa2f2ecb870738b2b2cd6b8266b"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3-cp38-abi3-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "255869d3f11eaa51cd6bb96d4883d038",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4511048,
            "upload_time": "2024-12-31T03:23:23",
            "upload_time_iso_8601": "2024-12-31T03:23:23.386291Z",
            "url": "https://files.pythonhosted.org/packages/8a/c7/7a0b8e91614b647e9b7792803e77f4a9b53a89d9cafc6e02de0018728f57/sql2arrow-0.1.3-cp38-abi3-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7f64644b43ac087b388f3b104e689593a153b329742d1b878ce64bed74fc557e",
                "md5": "d8ea3c5a52d6212939aafe2ca43586e5",
                "sha256": "8887a606fc3e3548312e64a5e5b30036dd8a311edeb8634aca6c8f6026153b37"
            },
            "downloads": -1,
            "filename": "sql2arrow-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "d8ea3c5a52d6212939aafe2ca43586e5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 44990,
            "upload_time": "2024-12-31T03:23:22",
            "upload_time_iso_8601": "2024-12-31T03:23:22.252215Z",
            "url": "https://files.pythonhosted.org/packages/7f/64/644b43ac087b388f3b104e689593a153b329742d1b878ce64bed74fc557e/sql2arrow-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-31 03:23:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "goalzz85",
    "github_project": "sql2arrow",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "sql2arrow"
}
        