Name | sql2arrow |
Version | 0.1.3 |
home_page | None |
Summary | This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays. |
upload_time | 2024-12-31 03:23:22 |
maintainer | None |
docs_url | None |
author | zhan zhang |
requires_python | >=3.8 |
license | None |
keywords | arrow, sql, mysql, rust |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# SQL2Arrow
SQL2Arrow is a Python library that provides convenient, high-performance methods for parsing INSERT SQL statements into Arrow arrays. It is especially useful for analyzing data dumped by `mysqldump` or similar tools.
## How to use
### Installation
Install the latest SQL2Arrow version with:
```bash
pip install sql2arrow
```
### Parsing a SQL string
```python
import sql2arrow

# A multi-row INSERT statement, as produced by mysqldump.
sql_str = '''
INSERT INTO `region` VALUES
('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),
('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),
('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),
('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');
'''

# One Column per value in each row, listed in the same order as the VALUES tuples.
columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8()),
]

# Parse the statement into Arrow arrays.
arrow_data = sql2arrow.parse_sql(sql_str, columns)
```
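To make the row-to-column conversion concrete, here is a minimal pure-Python sketch of the idea. It is not how SQL2Arrow is implemented and it does not produce Arrow arrays; `split_insert_values` and `rows_to_columns` are hypothetical helper names, and the sketch assumes every value is a single-quoted string with `''` as the only escape.

```python
from typing import List


def split_insert_values(sql: str) -> List[List[str]]:
    """Naively extract rows of single-quoted values from one INSERT statement."""
    # Everything after the VALUES keyword holds the row tuples.
    body = sql[sql.upper().index("VALUES") + len("VALUES"):]
    rows: List[List[str]] = []
    row: List[str] = []
    buf: List[str] = []
    in_str = False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_str:
            if ch == "'" and body[i + 1:i + 2] == "'":
                buf.append("'")  # a doubled quote is an escaped quote
                i += 2
                continue
            if ch == "'":
                row.append("".join(buf))  # closing quote ends the value
                buf = []
                in_str = False
            else:
                buf.append(ch)
        elif ch == "'":
            in_str = True
        elif ch == "(":
            row = []  # start a new row tuple
        elif ch == ")":
            rows.append(row)  # row tuple complete
        i += 1
    return rows


def rows_to_columns(rows: List[List[str]]) -> List[List[str]]:
    """Transpose row-oriented values into one list per column."""
    return [list(col) for col in zip(*rows)]


sql = "INSERT INTO `region` VALUES ('', 'n/a'), ('D99', 'it''s here');"
columns = rows_to_columns(split_insert_values(sql))
print(columns)
```

A columnar layout like this is what makes the result a natural fit for Arrow, which stores each column contiguously.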
### Parsing SQL files
```python
import sql2arrow

# Dump files to parse.
sql_paths = [
    "region.sql_0.gz", "region.sql_1.gz", "region.sql_2.gz", "region.sql_3.gz",
    "region.sql_4.gz", "region.sql_5.gz", "region.sql_6.gz",
]

# One Column per value in each row, listed in the same order as the VALUES tuples.
columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8()),
]

# Partition the output with an Iceberg bucket transform on region_code (30 buckets).
partition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()
partition_func_spec.add_partition("region_code", sql2arrow.partition.IcebergTransforms.bucket(30))

it = sql2arrow.SQLFile2ArrowIter(
    sql_paths,
    columns,
    4,
    1000,
    sql2arrow.CompressionType.SNAPPY,  # compression type of the input files
    sql2arrow.Dialect.MYSQL,           # SQL dialect of the INSERT statements
    partition_func_spec,
)

# Iterate over the parsed results.
for arr in it:
    print(arr)
```
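For readers unfamiliar with the `bucket(30)` transform used above, the following sketch shows how the Apache Iceberg specification defines bucketing for string values: hash the UTF-8 bytes with 32-bit MurmurHash3 (x86 variant, seed 0), clear the sign bit, and take the result modulo the bucket count. Whether SQL2Arrow follows the specification exactly is an assumption here, and `murmur3_x86_32` and `iceberg_bucket` are hypothetical names, not part of the library.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86 variant) of `data`."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & mask
    n_blocks = len(data) // 4

    def mix_k(k: int) -> int:
        # Scramble one 32-bit block before it is folded into the hash.
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    # Body: process the input four bytes at a time (little-endian).
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= mix_k(k)
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining one to three bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization: mix in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def iceberg_bucket(value: str, num_buckets: int) -> int:
    """Bucket a string value as described by the Iceberg bucket transform."""
    h = murmur3_x86_32(value.encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_buckets


# Bucket index for one of the region codes from the example data.
print(iceberg_bucket("1541947646568607746", 30))
```

Because the bucket depends only on the value, rows with the same `region_code` always land in the same partition, regardless of which input file they came from.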
## arro3
SQL2Arrow uses arro3 as its default Python library for Apache Arrow. Thanks to the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html), arro3's array data can be passed to other libraries that support the interface, including PyArrow, Polars (v1.2+), pandas (v2.2+), and nanoarrow, among others, without copying memory.
```python
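# Continues from the examples above.
import pyarrow as pa
# Column names, in the same order as `columns` above.
names = ["region_code", "region_name", "create_time", "update_time", "parent_region_code"]
# `arrs` is assumed to hold the lists of column arrays collected in the steps above.
tables = [pa.Table.from_arrays(a, names=names) for a in arrs]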
```
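Libraries detect Arrow PyCapsule support by duck typing: the interface defines the dunder methods `__arrow_c_schema__`, `__arrow_c_array__`, and `__arrow_c_stream__`, and a consumer checks which of them an object exposes. The sketch below shows that check in isolation; `arrow_capsule_kind` and the two stand-in classes are hypothetical and exist only for illustration.

```python
def arrow_capsule_kind(obj) -> str:
    """Report which Arrow PyCapsule export method an object exposes, if any."""
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"  # e.g. tables, chunked arrays, record batch readers
    if hasattr(obj, "__arrow_c_array__"):
        return "array"   # e.g. a single array or record batch
    if hasattr(obj, "__arrow_c_schema__"):
        return "schema"  # e.g. a data type, field, or schema
    return "none"


class FakeArray:
    """Stand-in for an Arrow array; a real one would return PyCapsules."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


class PlainObject:
    """An object with no Arrow export methods."""


print(arrow_capsule_kind(FakeArray()))    # array
print(arrow_capsule_kind(PlainObject()))  # none
```

The same function can be called on the objects yielded by SQL2Arrow to see which form of the interface they expose before handing them to another library.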
## Limitations
### Dialect
SQL2Arrow currently supports only MySQL and PostgreSQL INSERT statements.
Raw data
{
"_id": null,
"home_page": null,
"name": "sql2arrow",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "zhan zhang <goalzz85@gmail.com>",
"keywords": "arrow, sql, mysql, rust",
"author": "zhan zhang",
"author_email": "goalzz85@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/7f/64/644b43ac087b388f3b104e689593a153b329742d1b878ce64bed74fc557e/sql2arrow-0.1.3.tar.gz",
"platform": null,
"description": "# SQL2Arrow\n\nThis is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays. It's very useful for analyzing data dumped by mysqldump or other tools.\n\n## How to use\n\n### Installation\n\nInstall the latest SQL2arrow version with:\n\n```bash\npip install sql2arrow\n```\n\n### Parsing SQL str\n```python\nimport sql2arrow\n\nsql_str = '''\nINSERT INTO `region` VALUES\n\t('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),\n\t('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),\n\t('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),\n\t('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');\n'''\n\ncolumns = [\n sql2arrow.Column(\"region_code\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"region_name\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"create_time\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"update_time\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"parent_region_code\", sql2arrow.ArrowTypes.utf8())\n]\n\narrow_data = sql2arrow.parse_sql(sql_str, columns)\n```\n\n\n### Parsing sql files\n\n```python\nimport sql2arrow\n\nsql_paths = [\n \"region.sql_0.gz\", \"region.sql_1.gz\",\"region.sql_2.gz\",\"region.sql_3.gz\",\"region.sql_4.gz\",\"region.sql_5.gz\",\"region.sql_6.gz\"\n]\n\ncolumns = [\n sql2arrow.Column(\"region_code\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"region_name\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"create_time\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"update_time\", sql2arrow.ArrowTypes.utf8()),\n sql2arrow.Column(\"parent_region_code\", sql2arrow.ArrowTypes.utf8())\n]\n\n\npartition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()\npartition_func_spec.add_partition(\"region_code\", sql2arrow.partition.IcebergTransforms.bucket(30))\n\n\nit = 
sql2arrow.SQLFile2ArrowIter(\n sql_paths,\n columns,\n 4,\n 1000,\n sql2arrow.CompressionType.SNAPPY,\n sql2arrow.Dialect.MYSQL,\n partition_func_spec\n)\n\nfor arr in it:\n print(arr)\n```\n\n\n## arro3\n\nSQL2Arrow uses arro3 as the default Python library for Apache Arrow. Thanks to the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html), we can seamlessly pass Arro3's Array data to other libraries compatible with the Arrow PyCapsule Interface, including PyArrow, Polars (v1.2+), Pandas (v2.2+), NanoArrow, and more, all with zero-copy memory.\n\n```python\n# some codes from above\n\nimport pyarrow as pa\ntables = [pa.Table.from_arrays(a, names=names) for a in arrs]\n```\n## Limitations\n\n### Dialect\n It currently supports only MySQL and PostgreSQL INSERT statements.\n",
"bugtrack_url": null,
"license": null,
"summary": "This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays.",
"version": "0.1.3",
"project_urls": {
"homepage": "https://github.com/goalzz85/sql2arrow"
},
"split_keywords": [
"arrow",
" sql",
" mysql",
" rust"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c2610a72f670a33026b26860d7635f963fa43b5216469fe34ea67e755556647b",
"md5": "a0972a43ffb026fe800419fd13349f15",
"sha256": "5f025f334293dee107ea301110374ead232e7d8fac0b5f4d514aee28a7be5eeb"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "a0972a43ffb026fe800419fd13349f15",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4235188,
"upload_time": "2024-12-31T03:23:07",
"upload_time_iso_8601": "2024-12-31T03:23:07.126177Z",
"url": "https://files.pythonhosted.org/packages/c2/61/0a72f670a33026b26860d7635f963fa43b5216469fe34ea67e755556647b/sql2arrow-0.1.3-cp38-abi3-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "698c8a9f83f26b30e2231a70f692558a392178968129acc3522342e8691b3cab",
"md5": "867a17536b38c4412396409dc129be30",
"sha256": "d4cbda4e80e26248a0d798c37d979c93e2f2caea62be275ff1f25b096f2bc953"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_aarch64.whl",
"has_sig": false,
"md5_digest": "867a17536b38c4412396409dc129be30",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4704237,
"upload_time": "2024-12-31T03:22:50",
"upload_time_iso_8601": "2024-12-31T03:22:50.806344Z",
"url": "https://files.pythonhosted.org/packages/69/8c/8a9f83f26b30e2231a70f692558a392178968129acc3522342e8691b3cab/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d301d732b0acaef000b58faed4666046a8401019c01dee09c9db5a072db4c3b5",
"md5": "42c82daa9593767b28760b519f5e417d",
"sha256": "cd5a69b2794f6c5c51cdac77efabd34e9070adc8e939d37e068e713d33937a0f"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_armv7l.whl",
"has_sig": false,
"md5_digest": "42c82daa9593767b28760b519f5e417d",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4901166,
"upload_time": "2024-12-31T03:22:54",
"upload_time_iso_8601": "2024-12-31T03:22:54.205726Z",
"url": "https://files.pythonhosted.org/packages/d3/01/d732b0acaef000b58faed4666046a8401019c01dee09c9db5a072db4c3b5/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9ed704daeedc1103be7ac30fd6ba4c522a4bcc4c65447f4f961f34f115245811",
"md5": "bd576457a23bf199ae6c4770e164b26a",
"sha256": "3c3a1f03f90ab6bca6fe9cbee73f44c8e833c62256eaede2d53c95730ab5aa4a"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_ppc64le.whl",
"has_sig": false,
"md5_digest": "bd576457a23bf199ae6c4770e164b26a",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 6466175,
"upload_time": "2024-12-31T03:22:57",
"upload_time_iso_8601": "2024-12-31T03:22:57.694971Z",
"url": "https://files.pythonhosted.org/packages/9e/d7/04daeedc1103be7ac30fd6ba4c522a4bcc4c65447f4f961f34f115245811/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_ppc64le.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "94ad56ec5de96386f650f37e0298d1569e4a31a7905c49b4f54f82e72ccca5cf",
"md5": "2e3e912e14f4e8c88dadf261abe6c52c",
"sha256": "b4e72daecf028f8842d77ea6d84eac0cd537badc5a8a6433594187f23f15e406"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_s390x.whl",
"has_sig": false,
"md5_digest": "2e3e912e14f4e8c88dadf261abe6c52c",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5793425,
"upload_time": "2024-12-31T03:23:00",
"upload_time_iso_8601": "2024-12-31T03:23:00.717608Z",
"url": "https://files.pythonhosted.org/packages/94/ad/56ec5de96386f650f37e0298d1569e4a31a7905c49b4f54f82e72ccca5cf/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_s390x.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3d392e99494c1a7c3e716c25a5b89787e23a6bb2af681bca324301cda66e637d",
"md5": "d638db4436442136108fd8acdf152d11",
"sha256": "8d560cca8ef61e434cf6bc5506720ad41bb618c835a3746719116e84a720dddb"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "d638db4436442136108fd8acdf152d11",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4989487,
"upload_time": "2024-12-31T03:23:03",
"upload_time_iso_8601": "2024-12-31T03:23:03.931171Z",
"url": "https://files.pythonhosted.org/packages/3d/39/2e99494c1a7c3e716c25a5b89787e23a6bb2af681bca324301cda66e637d/sql2arrow-0.1.3-cp38-abi3-manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ef0405e62dc983f84c61f0ea5f6983b75355773cc8b0777a33081bbdc4d095f5",
"md5": "b69572819be1ba3d495aed6a87379843",
"sha256": "61fbdde2a5dcd1be4ce98e420735faad67cae40f4446806ec34ad8b59c71e309"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "b69572819be1ba3d495aed6a87379843",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4850735,
"upload_time": "2024-12-31T03:23:10",
"upload_time_iso_8601": "2024-12-31T03:23:10.314240Z",
"url": "https://files.pythonhosted.org/packages/ef/04/05e62dc983f84c61f0ea5f6983b75355773cc8b0777a33081bbdc4d095f5/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3c0b2b4f15fe60c084007bbb241753cf1daa88ad8feb4b8e3c98d07320c7c374",
"md5": "9ca718ff4adfff9cd0b72d48d1946495",
"sha256": "873f6d19d85d66e7deecb22a5996e423453b1a82973c746e140b884cc969340d"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_armv7l.whl",
"has_sig": false,
"md5_digest": "9ca718ff4adfff9cd0b72d48d1946495",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5152611,
"upload_time": "2024-12-31T03:23:14",
"upload_time_iso_8601": "2024-12-31T03:23:14.029115Z",
"url": "https://files.pythonhosted.org/packages/3c/0b/2b4f15fe60c084007bbb241753cf1daa88ad8feb4b8e3c98d07320c7c374/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "237f3485bab6df9c0c34cb33c154f2b43a7697d390f716a433c657b8209a196c",
"md5": "24fc22ab940ad9624aa5600ba0d1e488",
"sha256": "0961fa3d017249a4f30546096185ca9f9ccc7a8c2027751838e7685e0e853a1c"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "24fc22ab940ad9624aa5600ba0d1e488",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5271895,
"upload_time": "2024-12-31T03:23:17",
"upload_time_iso_8601": "2024-12-31T03:23:17.301056Z",
"url": "https://files.pythonhosted.org/packages/23/7f/3485bab6df9c0c34cb33c154f2b43a7697d390f716a433c657b8209a196c/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "beecd526b7296f942964dea1ad06912fb205633aa4b31cf327900a3477b71edb",
"md5": "64fdb4f3bda86bf317201c30c17b5634",
"sha256": "87a1cc5ddc90c5e1702cbfc0d3f70a0313e1482105bccc8d51585fd9dd67de81"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "64fdb4f3bda86bf317201c30c17b5634",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 5148761,
"upload_time": "2024-12-31T03:23:20",
"upload_time_iso_8601": "2024-12-31T03:23:20.629892Z",
"url": "https://files.pythonhosted.org/packages/be/ec/d526b7296f942964dea1ad06912fb205633aa4b31cf327900a3477b71edb/sql2arrow-0.1.3-cp38-abi3-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8ac77a0b8e91614b647e9b7792803e77f4a9b53a89d9cafc6e02de0018728f57",
"md5": "255869d3f11eaa51cd6bb96d4883d038",
"sha256": "bca05337fca332d04a2b381068f004dae3158fa2f2ecb870738b2b2cd6b8266b"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3-cp38-abi3-win_amd64.whl",
"has_sig": false,
"md5_digest": "255869d3f11eaa51cd6bb96d4883d038",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4511048,
"upload_time": "2024-12-31T03:23:23",
"upload_time_iso_8601": "2024-12-31T03:23:23.386291Z",
"url": "https://files.pythonhosted.org/packages/8a/c7/7a0b8e91614b647e9b7792803e77f4a9b53a89d9cafc6e02de0018728f57/sql2arrow-0.1.3-cp38-abi3-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7f64644b43ac087b388f3b104e689593a153b329742d1b878ce64bed74fc557e",
"md5": "d8ea3c5a52d6212939aafe2ca43586e5",
"sha256": "8887a606fc3e3548312e64a5e5b30036dd8a311edeb8634aca6c8f6026153b37"
},
"downloads": -1,
"filename": "sql2arrow-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "d8ea3c5a52d6212939aafe2ca43586e5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 44990,
"upload_time": "2024-12-31T03:23:22",
"upload_time_iso_8601": "2024-12-31T03:23:22.252215Z",
"url": "https://files.pythonhosted.org/packages/7f/64/644b43ac087b388f3b104e689593a153b329742d1b878ce64bed74fc557e/sql2arrow-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-31 03:23:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "goalzz85",
"github_project": "sql2arrow",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "sql2arrow"
}