pydantic-glue

* Name: pydantic-glue
* Version: 0.4.0
* Summary: Convert pydantic model to aws glue schema for terraform
* Home page: https://github.com/svdimchenko/pydantic-glue
* Bug tracker: https://github.com/svdimchenko/pydantic-glue/issues
* Author: Serhii Dimchenko
* License: MIT
* Requires Python: <4.0,>=3.9
* Keywords: pydantic glue athena types convert
* Uploaded: 2024-05-08 19:23:43

# JSON Schema to AWS Glue schema converter

## Installation

```bash
pip install pydantic-glue
```

## What?

Converts `pydantic` models to JSON Schema and then to an AWS Glue schema,
so in theory anything that can be converted to JSON Schema *could* also work.

## Why?

When `AWS Kinesis Firehose` is configured to receive JSON records and write them to S3 as `parquet` files,
you need to define an `AWS Glue` table so Firehose knows which schema to use when creating the parquet files.

AWS Glue lets you define a schema using `Avro` or `JSON Schema` and then create a table from that schema,
but as of *May 2022* a limitation on the AWS side means that tables created this way can't be used with Kinesis Firehose.

<https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform>

This is also confirmed by AWS support.

One workaround is to create the table and set its columns manually,
but then you have two sources of truth to maintain.

This tool lets you define the table as a `pydantic` model
and generate a JSON file mapping column names to Glue types, which `terraform` can then use to create the Glue table.

## Example

Take the following pydantic models:

```python title="example.py"
from pydantic import BaseModel
from typing import List


class Bar(BaseModel):
    name: str
    age: int


class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str

```

Running `pydantic-glue`

```bash
pydantic-glue -f example.py -c Foo
```

you get this JSON in the terminal:

```json
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
```

which can be used in Terraform like this:

```terraform
locals {
  columns = jsondecode(file("${path.module}/glue_schema.json")).columns
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns

      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}
```
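To check that file outside Terraform, here is a minimal Python sketch of both sides of it. `render_glue_json` and `read_columns` are hypothetical helpers, not part of pydantic-glue: the first wraps a column mapping in the envelope shown in the sample output (the wording of the `"//"` key is copied from that sample; the real tool's formatting may differ), and the second does what `jsondecode(...).columns` plus the `dynamic "columns"` block do, yielding one `Name`/`Type` pair per column, the same keys a `Column` has in the AWS Glue API.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


def render_glue_json(columns: Dict[str, str], now: Optional[datetime] = None) -> str:
    """Wrap a column -> Glue type mapping in the envelope from the sample output.

    Hypothetical helper; the "//" wording mirrors the sample, nothing more.
    """
    stamp = now or datetime.now()
    document = {
        # str(datetime) gives "YYYY-MM-DD HH:MM:SS.ffffff", as in the sample.
        "//": f"Generated by pydantic-glue at {stamp}. DO NOT EDIT",
        "columns": columns,
    }
    return json.dumps(document, indent=2)


def read_columns(path: Path) -> List[Dict[str, str]]:
    """Python counterpart of jsondecode(file(...)).columns plus the dynamic block.

    Only the "columns" key is read, so the "//" comment key is ignored,
    just as in the HCL above.
    """
    columns = json.loads(path.read_text(encoding="utf-8"))["columns"]
    return [{"Name": name, "Type": glue_type} for name, glue_type in columns.items()]
```

For instance, writing `render_glue_json(...)` to `glue_schema.json` and then calling `read_columns(Path("glue_schema.json"))` lists the name/type pairs the `dynamic` block would turn into Glue columns.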

Alternatively, run the CLI with the `-o` flag to write the output to a file instead of the terminal:

```bash
pydantic-glue -f example.py -c Foo -o example.json -l
```
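The `-f` and `-c` arguments name a Python file and a class inside it. As a hedged sketch of what resolving them involves, here is a hypothetical `load_class` helper built on the standard `importlib` recipe for importing a source file directly; we have not checked how pydantic-glue itself does this.

```python
import importlib.util
import sys
from pathlib import Path


def load_class(source_file: str, class_name: str) -> type:
    """Import `source_file` as a module and return its attribute `class_name`.

    Hypothetical helper illustrating what `-f example.py -c Foo` must resolve.
    """
    path = Path(source_file).resolve()
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {source_file}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the importlib docs recipe
    # does, so code that looks the module up by name can find it.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    cls = getattr(module, class_name, None)
    if not isinstance(cls, type):
        raise AttributeError(f"{class_name!r} is not a class in {source_file}")
    return cls
```

With the example above, `load_class("example.py", "Foo")` would return the `Foo` model, whose JSON Schema pydantic v2 exposes as `Foo.model_json_schema()` (`Foo.schema()` in pydantic v1).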

## How does it work?

* the `pydantic` model is converted to JSON Schema
* the JSON Schema types are mapped to Glue types recursively
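A minimal sketch of the second step, not pydantic-glue's source: it walks a JSON Schema dict, follows local `$ref`s (`#/$defs/...` as pydantic v2 emits them, or `#/definitions/...` in v1), and builds Glue type strings. The `string`/`integer`/array/nested-model mappings reproduce the sample output above; `number -> double` and `boolean -> boolean` are assumptions. Fields declared as `Optional[...]` come out of pydantic v2 as `anyOf` nodes, which this sketch deliberately rejects.

```python
from typing import Any, Dict

# Scalar JSON Schema types -> Glue type names. "string" and "integer" match
# the sample output above; "number" and "boolean" are assumptions.
SCALARS: Dict[str, str] = {
    "string": "string",
    "integer": "int",
    "number": "double",
    "boolean": "boolean",
}


def resolve(node: Dict[str, Any], root: Dict[str, Any]) -> Dict[str, Any]:
    """Follow a local "$ref" such as "#/$defs/Bar" or "#/definitions/Bar"."""
    ref = node.get("$ref")
    if ref is None:
        return node
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported: {ref}")
    target: Any = root
    for part in ref[2:].split("/"):
        target = target[part]
    return resolve(target, root)


def to_glue_type(node: Dict[str, Any], root: Dict[str, Any]) -> str:
    """Recursively translate one JSON Schema node into a Glue type string."""
    node = resolve(node, root)
    kind = node.get("type")
    if kind == "array":
        return f"array<{to_glue_type(node['items'], root)}>"
    if kind == "object":
        fields = ",".join(
            f"{name}:{to_glue_type(sub, root)}"
            for name, sub in node.get("properties", {}).items()
        )
        return f"struct<{fields}>"
    if isinstance(kind, str) and kind in SCALARS:
        return SCALARS[kind]
    # e.g. an "anyOf" node produced for Optional[...] fields lands here.
    raise NotImplementedError(f"unsupported schema node: {node}")


def schema_to_columns(schema: Dict[str, Any]) -> Dict[str, str]:
    """Top-level properties become columns; nested models become structs."""
    return {name: to_glue_type(sub, schema) for name, sub in schema["properties"].items()}


# Roughly what pydantic v2's Foo.model_json_schema() returns for the example
# model, with the "title" and "required" keys left out.
FOO_SCHEMA: Dict[str, Any] = {
    "$defs": {
        "Bar": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
        }
    },
    "type": "object",
    "properties": {
        "nums": {"type": "array", "items": {"type": "integer"}},
        "bars": {"type": "array", "items": {"$ref": "#/$defs/Bar"}},
        "other": {"type": "string"},
    },
}

print(schema_to_columns(FOO_SCHEMA))
# {'nums': 'array<int>', 'bars': 'array<struct<name:string,age:int>>', 'other': 'string'}
```

Because `resolve` runs before every type check, a model reused in several places is expanded inline each time, which is what Glue's `struct<...>` syntax requires (it has no named type references).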

## Future work

* Not all types are supported: I just add types as I need them, but adding a type is very easy.
  Feel free to open an issue or send a PR if you stumble upon an unsupported use case.
* The tool could easily be extended to work with JSON Schema directly,
  so anything that can be converted to JSON Schema should also work.
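To illustrate why adding a type can be a small change, here is a hypothetical table-driven scalar lookup keyed on the JSON Schema `type` plus an optional `format`. pydantic v2 emits `{"type": "string", "format": "date-time"}` for `datetime` fields and `"format": "date"` for `date` fields; mapping those to Glue `timestamp` and `date` is our assumption, not documented pydantic-glue behaviour.

```python
from typing import Any, Dict, Optional, Tuple

# (JSON Schema "type", "format" or None) -> Glue type. Every mapping here is an
# illustrative assumption; supporting a new scalar means adding one entry.
SCALAR_TYPES: Dict[Tuple[str, Optional[str]], str] = {
    ("string", None): "string",
    ("string", "date-time"): "timestamp",
    ("string", "date"): "date",
    ("integer", None): "int",
    ("number", None): "double",
    ("boolean", None): "boolean",
}


def scalar_glue_type(node: Dict[str, Any]) -> str:
    """Map a scalar JSON Schema node, trying (type, format) before (type, None)."""
    kind, fmt = node.get("type"), node.get("format")
    if not isinstance(kind, str):
        raise NotImplementedError(f"unsupported scalar: {node}")
    for key in ((kind, fmt), (kind, None)):
        if key in SCALAR_TYPES:
            return SCALAR_TYPES[key]
    raise NotImplementedError(f"unsupported scalar: {node}")
```

Unknown formats fall back to the plain type, so an `"email"`-formatted string still maps to `string`.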

            
