pydantic-glue


* Name: pydantic-glue
* Version: 0.6.0
* Home page: https://github.com/svdimchenko/pydantic-glue
* Summary: Convert pydantic model to aws glue schema for terraform
* Upload time: 2024-11-15 12:21:58
* Author: Serhii Dimchenko
* Requires Python: `<4.0,>=3.9`
* License: MIT
* Keywords: pydantic, glue, athena, types, convert

# JSON Schema to AWS Glue schema converter

## Installation

```bash
pip install pydantic-glue
```

## What?

Converts `pydantic` models to `JSON Schema` and then to an `AWS Glue` schema,
so in theory anything that can be converted to JSON Schema *could* also work.

## Why?

When using `AWS Kinesis Firehose` in a configuration that receives JSON records and writes `parquet` files to S3,
one needs to define an `AWS Glue` table so Firehose knows what schema to use when creating the parquet files.

AWS Glue lets you define a schema using `Avro` or `JSON Schema` and then create a table from that schema,
but as of *May 2022*, tables created that way can't be used with Kinesis Firehose.

<https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform>

This is also confirmed by AWS support.

One option is to create a table and set the columns manually,
but this means you now have two sources of truth to maintain.

This tool allows you to define a table in `pydantic`
and generate a JSON with column types that can be used with `terraform` to create a Glue table.

## Example

Take the following pydantic class:

```python title="example.py"
from pydantic import BaseModel
from typing import List


class Bar(BaseModel):
    name: str
    age: int


class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str

```

Running `pydantic-glue`

```bash
pydantic-glue -f example.py -c Foo
```

you get this JSON in the terminal:

```json
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
```

Saved to a file (here `glue_schema.json`), this JSON can be used in `terraform` like this:

```terraform
locals {
  columns = jsondecode(file("${path.module}/glue_schema.json")).columns
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns

      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}
```
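
Because Terraform reads the generated file as-is, it can be useful to check its shape before running `terraform plan`. The following is a minimal sketch of such a check, not part of `pydantic-glue`: the function name `parse_columns` is hypothetical, and the only assumption about the file is the `columns` object shown in the example output above.

```python
import json
from typing import Any


def parse_columns(text: str) -> dict[str, str]:
    """Return the "columns" mapping from generated JSON, or raise ValueError.

    Hypothetical helper for illustration; pydantic-glue does not ship it.
    """
    data: Any = json.loads(text)
    columns = data.get("columns") if isinstance(data, dict) else None
    if not isinstance(columns, dict) or not columns:
        raise ValueError('expected a non-empty "columns" object')
    for name, glue_type in columns.items():
        # Terraform passes each value straight into the column "type" argument,
        # so every value has to be a non-empty string.
        if not isinstance(glue_type, str) or not glue_type:
            raise ValueError(f"column {name!r} has no Glue type string")
    return columns


# Sample input copied from the example output earlier in this README.
sample = """
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
"""

print(parse_columns(sample))
```

In practice the text would come from the file that Terraform loads, for example `Path("glue_schema.json").read_text()`.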

Alternatively, you can run the CLI with the `-o` flag to set the output file location:

```bash
pydantic-glue -f example.py -c Foo -o example.json -l
```

## Override the type for the AWS Glue Schema

Wherever there is a `type` key in the input JSON Schema, an additional key `glue_type` may be
defined to override the type that is used in the AWS Glue Schema. This is, for example, useful for
a pydantic model that has a field of type `int` that is unix epoch time, while the column type you
would like in Glue is of type `timestamp`.

Additional JSON Schema keys can be added to a pydantic model by using the
[`Field` function](https://docs.pydantic.dev/latest/api/fields/#pydantic.fields.Field)
with the argument `json_schema_extra` like so:

```python
from pydantic import BaseModel, Field

class A(BaseModel):
    epoch_time: int = Field(
        ...,
        json_schema_extra={
            "glue_type": "timestamp",
        },
    )
```

The resulting JSON Schema will be:

```json
{
    "properties": {
        "epoch_time": {
            "glue_type": "timestamp",
            "title": "Epoch Time",
            "type": "integer"
        }
    },
    "required": [
        "epoch_time"
    ],
    "title": "A",
    "type": "object"
}
```

And the result after processing with pydantic-glue:

```json
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "epoch_time": "timestamp"
  }
}
```

Recursing through object properties terminates when you supply a `glue_type` to use. If the type is
complex, you must supply the full complex type yourself.
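
The override rule can be illustrated with a small sketch. This is not the code used by `pydantic-glue`; the function `to_glue`, the `PRIMITIVES` table, and the `tags` and `plain` fields are assumptions made for the example. It shows that a node carrying `glue_type` is returned verbatim and that its nested properties are never visited.

```python
from typing import Any

# Assumed primitive mapping, for this sketch only.
PRIMITIVES = {"string": "string", "integer": "int", "boolean": "boolean"}


def to_glue(node: dict[str, Any]) -> str:
    """Map one JSON Schema node to a Glue type string, honouring glue_type."""
    # An explicit override wins, and nothing below this node is visited.
    if "glue_type" in node:
        return node["glue_type"]
    kind = node.get("type")
    if kind == "object":
        fields = ",".join(
            f"{name}:{to_glue(sub)}"
            for name, sub in node.get("properties", {}).items()
        )
        return f"struct<{fields}>"
    if kind == "array":
        return f"array<{to_glue(node['items'])}>"
    return PRIMITIVES[kind]


schema = {
    "type": "object",
    "properties": {
        # Same override as in the example above.
        "epoch_time": {"type": "integer", "glue_type": "timestamp"},
        # Hypothetical complex field: the full Glue type is supplied by hand,
        # so the nested "properties" are ignored.
        "tags": {
            "type": "object",
            "properties": {"env": {"type": "string"}},
            "glue_type": "map<string,string>",
        },
        # No override: mapped through the primitive table.
        "plain": {"type": "integer"},
    },
}

overridden = {name: to_glue(sub) for name, sub in schema["properties"].items()}
print(overridden)
```

Without the override, the `tags` node in this sketch would be mapped from its properties to `struct<env:string>`.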

## How does it work?

* `pydantic` gets converted to JSON Schema
* the JSON Schema types get mapped to Glue types recursively
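
The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's implementation: the names `resolve`, `glue_type` and `columns` are hypothetical, the primitive mapping is chosen to match the example output in this README (the `number` entry is a guess), and the input schema is hand-written to resemble what pydantic v2 emits for the `Foo` model, with nested models under `$defs`.

```python
from typing import Any

# Assumed JSON Schema -> Glue primitive mapping (not taken from the library).
PRIMITIVES = {
    "string": "string",
    "integer": "int",
    "number": "double",
    "boolean": "boolean",
}


def resolve(node: dict[str, Any], root: dict[str, Any]) -> dict[str, Any]:
    """Follow a local "$ref" such as "#/$defs/Bar" to the node it names."""
    ref = node.get("$ref")
    if ref is None:
        return node
    target: Any = root
    for part in ref.removeprefix("#/").split("/"):
        target = target[part]
    return target


def glue_type(node: dict[str, Any], root: dict[str, Any]) -> str:
    """Recursively map a JSON Schema node to a Glue type string."""
    node = resolve(node, root)
    kind = node.get("type")
    if kind == "array":
        return f"array<{glue_type(node['items'], root)}>"
    if kind == "object":
        fields = ",".join(
            f"{name}:{glue_type(sub, root)}"
            for name, sub in node["properties"].items()
        )
        return f"struct<{fields}>"
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    raise NotImplementedError(f"unsupported JSON Schema type: {kind!r}")


def columns(schema: dict[str, Any]) -> dict[str, str]:
    """Build the column-name -> Glue-type mapping for a top-level object."""
    return {
        name: glue_type(sub, schema) for name, sub in schema["properties"].items()
    }


# Hand-written schema resembling the JSON Schema of the Foo model above.
FOO_SCHEMA = {
    "$defs": {
        "Bar": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
        }
    },
    "type": "object",
    "properties": {
        "nums": {"type": "array", "items": {"type": "integer"}},
        "bars": {"type": "array", "items": {"$ref": "#/$defs/Bar"}},
        "other": {"type": "string"},
    },
}

print(columns(FOO_SCHEMA))
```

For this input the sketch produces the same three column types as the example output earlier in this README. Unsupported types raise an error rather than being guessed.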

## Future work

* Not all types are supported. I add types as I need them, but adding types is very easy;
  feel free to open an issue or send a PR if you stumble upon an unsupported use case.
* The tool could easily be extended to work with JSON Schema directly,
  so anything that can be converted to JSON Schema should also work.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/svdimchenko/pydantic-glue",
    "name": "pydantic-glue",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "pydantic, glue, athena, types, convert",
    "author": "Serhii Dimchenko",
    "author_email": "svdimchenko@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/77/23/b219e56aba861232b712c82fdd8def4a347daae59fa6cffd8e1c42acc9af/pydantic_glue-0.6.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Convert pydantic model to aws glue schema for terraform",
    "version": "0.6.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/svdimchenko/pydantic-glue/issues",
        "Homepage": "https://github.com/svdimchenko/pydantic-glue",
        "Releases": "https://github.com/svdimchenko/pydantic-glue/releases",
        "Repository": "https://github.com/svdimchenko/pydantic-glue"
    },
    "split_keywords": [
        "pydantic",
        " glue",
        " athena",
        " types",
        " convert"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "697d2b20b773c1c9432b832541ddb30210017743ba00bf6babd681a0c71aa9fe",
                "md5": "37342327e9f06512bddc26228b6d2a64",
                "sha256": "624e38043e50e0a417efb02de58258434d37fc7d86dae602b46fcad543645071"
            },
            "downloads": -1,
            "filename": "pydantic_glue-0.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "37342327e9f06512bddc26228b6d2a64",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 6025,
            "upload_time": "2024-11-15T12:21:56",
            "upload_time_iso_8601": "2024-11-15T12:21:56.997155Z",
            "url": "https://files.pythonhosted.org/packages/69/7d/2b20b773c1c9432b832541ddb30210017743ba00bf6babd681a0c71aa9fe/pydantic_glue-0.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7723b219e56aba861232b712c82fdd8def4a347daae59fa6cffd8e1c42acc9af",
                "md5": "4e859eb9b0be34351699284c1f6580f7",
                "sha256": "190d42a073f5666791df760a40f22df3a7e0daf6970fb6f30661825d89cdcf61"
            },
            "downloads": -1,
            "filename": "pydantic_glue-0.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4e859eb9b0be34351699284c1f6580f7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 5290,
            "upload_time": "2024-11-15T12:21:58",
            "upload_time_iso_8601": "2024-11-15T12:21:58.071618Z",
            "url": "https://files.pythonhosted.org/packages/77/23/b219e56aba861232b712c82fdd8def4a347daae59fa6cffd8e1c42acc9af/pydantic_glue-0.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-15 12:21:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "svdimchenko",
    "github_project": "pydantic-glue",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pydantic-glue"
}