# JSON Schema to AWS Glue schema converter
## Installation
```bash
pip install pydantic-glue
```
## What?
Converts `pydantic` models to `JSON Schema` and then to an `AWS Glue` schema,
so in theory anything that can be converted to JSON Schema *could* also work.
## Why?
When using `AWS Kinesis Firehose` in a configuration that receives JSON records and writes `parquet` files to S3,
one needs to define an `AWS Glue` table so Firehose knows what schema to use when creating the parquet files.

AWS Glue lets you define a schema using `Avro` or `JSON Schema` and then create a table from that schema,
but as of *May 2022*, tables created that way can't be used with Kinesis Firehose:

<https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform>

AWS support has also confirmed this limitation.

One workaround is to create a table and set the columns manually,
but this means you now have two sources of truth to maintain.

This tool lets you define a table in `pydantic`
and generate a JSON document with column types that can be used with `terraform` to create a Glue table.
## Example
Take the following pydantic classes:

```python title="example.py"
from pydantic import BaseModel
from typing import List


class Bar(BaseModel):
    name: str
    age: int


class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str
```
Running `pydantic-glue`
```bash
pydantic-glue -f example.py -c Foo
```
you get this JSON in the terminal:
```json
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
```
Saved to a file (here `glue_schema.json`), this JSON can be used in Terraform like this:
```terraform
locals {
  columns = jsondecode(file("${path.module}/glue_schema.json")).columns
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns

      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}
```
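
To make the `dynamic "columns"` block less opaque, here is a rough Python sketch of what the expansion amounts to: one `name`/`type` pair per entry of the `columns` map. It is only an illustration, not part of `pydantic-glue` or Terraform. The JSON is inlined instead of read from `glue_schema.json`, and the `sorted` call reflects my understanding that Terraform walks map keys in lexical order, which is an assumption worth checking if column order matters to you.

```python
import json

# The document printed by the CLI in the example above, inlined so the
# sketch is self-contained (normally it would be read from glue_schema.json).
generated = """
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
"""

columns = json.loads(generated)["columns"]

# Assumption: iterate in sorted key order, mirroring how Terraform is
# understood to iterate over a map in a dynamic block.
expanded = [{"name": key, "type": value} for key, value in sorted(columns.items())]

for column in expanded:
    print(column)
```

Each dictionary in `expanded` corresponds to one generated `columns { ... }` block inside `storage_descriptor`.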
Alternatively, you can run the CLI with the `-o` flag to set the output file location:
```bash
pydantic-glue -f example.py -c Foo -o example.json -l
```
## How does it work?
* `pydantic` gets converted to JSON Schema
* the JSON Schema types get mapped to Glue types recursively
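
The recursive mapping step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `pydantic-glue` implementation: the helper names `to_glue_type` and `to_glue_columns`, the `PRIMITIVES` table (in particular `number` mapping to `double`), and the hand-written schema are all hypothetical. The schema is written by hand to resemble what pydantic might emit for `Foo`, so the sketch runs without pydantic installed.

```python
from typing import Any, Dict

# Hypothetical mapping from JSON Schema primitive types to Glue type names.
PRIMITIVES = {
    "string": "string",
    "integer": "int",
    "number": "double",  # assumption: floats map to double
    "boolean": "boolean",
}


def to_glue_type(node: Dict[str, Any], root: Dict[str, Any]) -> str:
    """Recursively translate one JSON Schema node into a Glue type string."""
    if "$ref" in node:
        ref = node["$ref"]
        if not ref.startswith("#/"):
            raise NotImplementedError(f"only local references are handled: {ref}")
        # Walk the path of a local reference such as "#/$defs/Bar".
        target: Any = root
        for part in ref[2:].split("/"):
            target = target[part]
        return to_glue_type(target, root)

    kind = node.get("type")
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "array":
        return f"array<{to_glue_type(node['items'], root)}>"
    if kind == "object":
        fields = ",".join(
            f"{name}:{to_glue_type(sub, root)}"
            for name, sub in node["properties"].items()
        )
        return f"struct<{fields}>"
    raise NotImplementedError(f"unsupported JSON Schema node: {node!r}")


def to_glue_columns(schema: Dict[str, Any]) -> Dict[str, str]:
    """Map every top-level property of the schema to a Glue column type."""
    return {
        name: to_glue_type(sub, schema)
        for name, sub in schema["properties"].items()
    }


# Hand-written schema resembling the JSON Schema of the Foo model above.
schema = {
    "$defs": {
        "Bar": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
        }
    },
    "type": "object",
    "properties": {
        "nums": {"type": "array", "items": {"type": "integer"}},
        "bars": {"type": "array", "items": {"$ref": "#/$defs/Bar"}},
        "other": {"type": "string"},
    },
}

columns = to_glue_columns(schema)
print(columns)
```

For this schema the sketch yields the same column types as the CLI output shown in the example. Unsupported constructs raise `NotImplementedError` rather than guessing a type.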
## Future work
* Not all types are supported. I add types as I need them, but adding types is very easy,
  so feel free to open an issue or send a PR if you stumble upon an unsupported use case.
* The tool could easily be extended to work with JSON Schema directly.
* Thus, anything that can be converted to JSON Schema should also work.
Raw data
{
"_id": null,
"home_page": "https://github.com/svdimchenko/pydantic-glue",
"name": "pydantic-glue",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "pydantic, glue, athena, types, convert",
"author": "Serhii Dimchenko",
"author_email": "svdimchenko@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b4/42/28f8625d2af609dd04f3ad613b436878f07baf09852be989ca59ae8005d4/pydantic_glue-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Convert pydantic model to aws glue schema for terraform",
"version": "0.4.0",
"project_urls": {
"Bug Tracker": "https://github.com/svdimchenko/pydantic-glue/issues",
"Homepage": "https://github.com/svdimchenko/pydantic-glue",
"Releases": "https://github.com/svdimchenko/pydantic-glue/releases",
"Repository": "https://github.com/svdimchenko/pydantic-glue"
},
"split_keywords": [
"pydantic",
" glue",
" athena",
" types",
" convert"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e74ac9b6a8fc2051fb9b865346332ce002bad3f07bfb0d6d3032c0b894e29f1e",
"md5": "bafab3a205f7809e105795bfee4a772d",
"sha256": "a992ee659d005fb363580fd3b9e4aed452830e9af18b1e8f5106591c98e2e201"
},
"downloads": -1,
"filename": "pydantic_glue-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bafab3a205f7809e105795bfee4a772d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 5351,
"upload_time": "2024-05-08T19:23:42",
"upload_time_iso_8601": "2024-05-08T19:23:42.889472Z",
"url": "https://files.pythonhosted.org/packages/e7/4a/c9b6a8fc2051fb9b865346332ce002bad3f07bfb0d6d3032c0b894e29f1e/pydantic_glue-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b44228f8625d2af609dd04f3ad613b436878f07baf09852be989ca59ae8005d4",
"md5": "743409afa5f9c06ae3589c75d14d4c81",
"sha256": "fc27ba2e59551dd869c4498b3b2621c3df6fb94de43ef04b3adf8fc3be1b6c21"
},
"downloads": -1,
"filename": "pydantic_glue-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "743409afa5f9c06ae3589c75d14d4c81",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 4643,
"upload_time": "2024-05-08T19:23:43",
"upload_time_iso_8601": "2024-05-08T19:23:43.852485Z",
"url": "https://files.pythonhosted.org/packages/b4/42/28f8625d2af609dd04f3ad613b436878f07baf09852be989ca59ae8005d4/pydantic_glue-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-08 19:23:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "svdimchenko",
"github_project": "pydantic-glue",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pydantic-glue"
}