arrowdantic


* Name: arrowdantic
* Version: 0.2.3
* Home page: https://github.com/jorgecarleitao/arrowdantic
* Summary: Arrow, pydantic style
* Upload time: 2022-12-07 06:31:07
* Author: Jorge C. Leitao <jorgecarleitao@gmail.com>
* License: Apache-2.0
* Keywords: analytics, arrow, odbc, parquet
* Requirements: none recorded

# Welcome to arrowdantic

Arrowdantic is a small Python library backed by a
[mature Rust implementation](https://github.com/jorgecarleitao/arrow2) of Apache Arrow
that can interoperate with
* [Parquet](https://parquet.apache.org/)
* [Apache Arrow](https://arrow.apache.org/) and 
* [ODBC](https://en.wikipedia.org/wiki/Open_Database_Connectivity) (databases).

For simple (but data-heavy) data engineering tasks, this package essentially replaces
`pyarrow`: it supports reading from and writing to Parquet and Arrow files at the same or
higher performance and with higher safety (e.g. no segfaults).

Furthermore, it supports reading from and writing to ODBC-compliant databases at
the same or higher performance than [`turbodbc`](https://turbodbc.readthedocs.io/en/latest/).

This package is particularly suitable for environments such as AWS Lambda:
it takes 8 MB of disk space, compared to 82 MB taken by pyarrow.

## Features

* declare and access Arrow-backed arrays (integers, floats, booleans, strings, binary)
* read from and write to Apache Arrow IPC files
* read from and write to Apache Parquet files
* read from and write to ODBC-compliant databases (e.g. PostgreSQL, MongoDB)
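
To make "Arrow-backed" more concrete, here is a minimal, standard-library-only sketch of how the Arrow columnar format represents a nullable integer array such as `[1, None]`: a validity bitmap (one bit per slot, least-significant bit first) plus a dense values buffer. The helper names `to_arrow_layout` and `from_arrow_layout` are hypothetical and are not part of arrowdantic's API; the sketch only illustrates the layout and assumes nothing about how the library implements it.

```python
from typing import List, Optional, Tuple


def to_arrow_layout(values: List[Optional[int]]) -> Tuple[bytes, List[int]]:
    """Split a nullable list into a validity bitmap and a dense values list."""
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per slot, rounded up
    dense = []
    for i, value in enumerate(values):
        if value is None:
            dense.append(0)  # placeholder; the bitmap marks this slot as null
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first
            dense.append(value)
    return bytes(bitmap), dense


def from_arrow_layout(bitmap: bytes, dense: List[int]) -> List[Optional[int]]:
    """Rebuild the nullable list from the bitmap and the dense values."""
    return [
        value if bitmap[i // 8] & (1 << (i % 8)) else None
        for i, value in enumerate(dense)
    ]


bitmap, dense = to_arrow_layout([1, None])
assert from_arrow_layout(bitmap, dense) == [1, None]
```

Keeping nulls in a separate bitmap lets the values buffer stay a fixed-width, contiguous block, which is what makes zero-copy interchange between Arrow implementations practical.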

## Examples

### Use Parquet

```python
import io
import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ParquetFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ParquetFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays
```

### Use Arrow files

```python
import io

import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ArrowFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ArrowFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays
```

### Use ODBC

```python
import arrowdantic as ad


arrays = [ad.Int32Array([1, None]), ad.StringArray(["aa", None])]

with ad.ODBCConnector(r"Driver={SQLite3};Database=sqlite-test.db") as con:
    # create an empty table with a schema
    con.execute("DROP TABLE IF EXISTS example;")
    con.execute("CREATE TABLE example (c1 INT, c2 TEXT);")

    # insert the arrays
    con.write("INSERT INTO example (c1, c2) VALUES (?, ?)", ad.Chunk(arrays))

    # read the arrays
    with con.execute("SELECT c1, c2 FROM example", 1024) as chunks:
        assert chunks.fields() == [
            ad.Field("c1", ad.DataType.int32(), True),
            ad.Field("c2", ad.DataType.string(), True),
        ]
        chunk = next(chunks)
assert chunk.arrays() == arrays
```

### Use timezones

This package fully supports `datetime` objects, including timezone-aware ones, and conversions between them and Arrow:

```python
import datetime

import arrowdantic as ad


dt = datetime.datetime(
    year=2021,
    month=1,
    day=1,
    hour=1,
    minute=1,
    second=1,
    microsecond=1,
    tzinfo=datetime.timezone.utc,
)
a = ad.TimestampArray([dt, None])
assert (
    str(a)
    == 'Timestamp(Microsecond, Some("+00:00"))[2021-01-01 01:01:01.000001 +00:00, None]'
)
assert list(a) == [dt, None]
assert a.type == ad.DataType.timestamp(datetime.timezone.utc)
```
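
The `Timestamp(Microsecond, Some("+00:00"))` type shown above corresponds, in the Arrow format, to a 64-bit count of microseconds since the UNIX epoch in UTC, with the timezone kept as metadata. The following standard-library-only sketch shows that mapping for the same `dt`. The helpers `to_microseconds` and `from_microseconds` are hypothetical names used for illustration; how arrowdantic performs the conversion internally is not shown in this README.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def to_microseconds(dt: datetime.datetime) -> int:
    """Return the number of microseconds between the UNIX epoch and `dt`."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Dividing two timedeltas with // yields an exact integer.
    return (dt - EPOCH) // ONE_MICROSECOND


def from_microseconds(
    value: int, tz: datetime.timezone = datetime.timezone.utc
) -> datetime.datetime:
    """Rebuild a timezone-aware datetime from microseconds since the epoch."""
    return (EPOCH + value * ONE_MICROSECOND).astimezone(tz)


dt = datetime.datetime(2021, 1, 1, 1, 1, 1, 1, tzinfo=datetime.timezone.utc)
micros = to_microseconds(dt)
assert from_microseconds(micros) == dt
```

Integer arithmetic on `timedelta` is used instead of `dt.timestamp()` so that the microsecond component is not subject to floating-point rounding.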


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/jorgecarleitao/arrowdantic",
    "name": "arrowdantic",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "analytics,arrow,ODBC,parquet",
    "author": "Jorge C. Leitao <jorgecarleitao@gmail.com>",
    "author_email": "Jorge C. Leitao <jorgecarleitao@gmail.com>",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Arrow, pydantic style",
    "version": "0.2.3",
    "split_keywords": [
        "analytics",
        "arrow",
        "odbc",
        "parquet"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "5fefdd79a8b1fb1ee20183ea494e3218",
                "sha256": "f211b8bd5262d5bd8be098f8a2ca43b7fcdd17cedb810e2f3cf1a2773bf89273"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp310-cp310-macosx_10_7_x86_64.whl",
            "has_sig": false,
            "md5_digest": "5fefdd79a8b1fb1ee20183ea494e3218",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": null,
            "size": 2984437,
            "upload_time": "2022-12-07T06:31:07",
            "upload_time_iso_8601": "2022-12-07T06:31:07.483478Z",
            "url": "https://files.pythonhosted.org/packages/fd/b7/301ec72c4f2f9d1180b9d2d4a40a06b31ef4e90b7e175725a9ff3df74d4d/arrowdantic-0.2.3-cp310-cp310-macosx_10_7_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "c0f9c9bec2202c6d26f06fa4ea786d05",
                "sha256": "21cba30e43a119d472c1e2e84332caf24540b8eeddc49188184cbb42b34090a1"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp310-none-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "c0f9c9bec2202c6d26f06fa4ea786d05",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": null,
            "size": 2720295,
            "upload_time": "2022-12-07T06:31:09",
            "upload_time_iso_8601": "2022-12-07T06:31:09.481771Z",
            "url": "https://files.pythonhosted.org/packages/2d/3b/5435a5d590ebbad6d03aa10e43f0ddf80fc6aa176c023e8992f09d72df1c/arrowdantic-0.2.3-cp310-none-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "1c2f4c82626f5b997160b2a1e9fd7f42",
                "sha256": "06fe232564ad73fb09b3a06461e3e64b12304eac0e2fa7de5562852d34ea8db5"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp37-cp37m-macosx_10_7_x86_64.whl",
            "has_sig": false,
            "md5_digest": "1c2f4c82626f5b997160b2a1e9fd7f42",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": null,
            "size": 2984290,
            "upload_time": "2022-12-07T06:31:11",
            "upload_time_iso_8601": "2022-12-07T06:31:11.296442Z",
            "url": "https://files.pythonhosted.org/packages/64/a9/b3392c796e38b20823d9cb0998c3937549bfcafbffcfc93adfdc6423173b/arrowdantic-0.2.3-cp37-cp37m-macosx_10_7_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "7c1479f8aeab8b122b92b0ecccd3fb01",
                "sha256": "12644b44e6e29d4ef835cc717d2f9e6bf7d645b12888cf4b9420eb2794099847"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7c1479f8aeab8b122b92b0ecccd3fb01",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": null,
            "size": 3174998,
            "upload_time": "2022-12-07T06:31:13",
            "upload_time_iso_8601": "2022-12-07T06:31:13.168520Z",
            "url": "https://files.pythonhosted.org/packages/36/12/c31c047833de18fc2f2f7bfdf2ce52217ef0df68c72c5e140235f9949261/arrowdantic-0.2.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "93bb53a6da18d5793ab96f808203a075",
                "sha256": "f5167d8d7910fe1965c383a9b0d8f9d0e8d30f58c4f955cdd4b6cc281a763ef0"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp37-none-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "93bb53a6da18d5793ab96f808203a075",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": null,
            "size": 2720533,
            "upload_time": "2022-12-07T06:31:14",
            "upload_time_iso_8601": "2022-12-07T06:31:14.937122Z",
            "url": "https://files.pythonhosted.org/packages/dc/8c/7837e881c665053a7dd7dd29594ce8ef52ce85ee686c8ab5a4e6e78b6178/arrowdantic-0.2.3-cp37-none-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "79b93073a60a2b0f212955bb121b1966",
                "sha256": "eda287dac0104e60bb4b246b14f8f215756ef3a8ec4007f2a0919b2157ecf755"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp38-cp38-macosx_10_7_x86_64.whl",
            "has_sig": false,
            "md5_digest": "79b93073a60a2b0f212955bb121b1966",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": null,
            "size": 2984324,
            "upload_time": "2022-12-07T06:31:16",
            "upload_time_iso_8601": "2022-12-07T06:31:16.631162Z",
            "url": "https://files.pythonhosted.org/packages/f8/f2/77bb5ceee722fe26ed7ae1bf75c83c6dbaa5e0135f50ba3aeeae27c71e00/arrowdantic-0.2.3-cp38-cp38-macosx_10_7_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "6d34adc966c9b9d46448ee8b2e008d69",
                "sha256": "5b28eaf9f1591662a46751123bc79c0f05dd810f8c5b70720ecc63a308a49283"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp38-none-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "6d34adc966c9b9d46448ee8b2e008d69",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": null,
            "size": 2720129,
            "upload_time": "2022-12-07T06:31:18",
            "upload_time_iso_8601": "2022-12-07T06:31:18.594341Z",
            "url": "https://files.pythonhosted.org/packages/f3/22/c23300f7040e5b589232763aaa71572165e15700aadab6b1c6ebdb7bdb06/arrowdantic-0.2.3-cp38-none-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "4a8bfa638e0dba3bfec6b5ea677f5d75",
                "sha256": "711dbcec7cd2bf2b727b4369f4b34270d7a0707d8da96f629600de7a677ae944"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp39-cp39-macosx_10_7_x86_64.whl",
            "has_sig": false,
            "md5_digest": "4a8bfa638e0dba3bfec6b5ea677f5d75",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": null,
            "size": 2984671,
            "upload_time": "2022-12-07T06:31:20",
            "upload_time_iso_8601": "2022-12-07T06:31:20.227561Z",
            "url": "https://files.pythonhosted.org/packages/c5/a4/38f1d055cd306da2fa2da6276aca4497d8a5f738a193dd0f5b4bf000be82/arrowdantic-0.2.3-cp39-cp39-macosx_10_7_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "663bef6b30ff3f4e871f8beb0317d067",
                "sha256": "25346892df70a76d3ca0ee58e91afc63d3568bdb4f70407cd3ba2c3c8ec9a40a"
            },
            "downloads": -1,
            "filename": "arrowdantic-0.2.3-cp39-none-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "663bef6b30ff3f4e871f8beb0317d067",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": null,
            "size": 2720225,
            "upload_time": "2022-12-07T06:31:21",
            "upload_time_iso_8601": "2022-12-07T06:31:21.772479Z",
            "url": "https://files.pythonhosted.org/packages/5d/ea/3fdc0c5acb9cca8aba5d013e6a4baa95dc75a86bcbccaa591f4af93e1936/arrowdantic-0.2.3-cp39-none-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-07 06:31:07",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "jorgecarleitao",
    "github_project": "arrowdantic",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "arrowdantic"
}
        