# AWS Glue Schema Registry for Python
[![PyPI](https://img.shields.io/pypi/v/aws-glue-schema-registry.svg)](https://pypi.org/project/aws-glue-schema-registry)
[![PyPI](https://img.shields.io/pypi/pyversions/aws-glue-schema-registry)](https://pypi.org/project/aws-glue-schema-registry)
[![main](https://github.com/DisasterAWARE/aws-glue-schema-registry-python/actions/workflows/main.yml/badge.svg)](https://github.com/DisasterAWARE/aws-glue-schema-registry-python/actions/workflows/main.yml)
Use the AWS Glue Schema Registry in Python projects.
This library is a partial port of [aws-glue-schema-registry](https://github.com/awslabs/aws-glue-schema-registry); it implements a subset of that library's features while remaining fully compatible with it.
## Feature Support
Feature | Java Library | Python Library | Notes
:------ | :----------- | :------------- | :----
Serialization and deserialization using schema registry | ✔️ | ✔️
Avro message format | ✔️ | ✔️
JSON Schema message format | ✔️ | ✔️
Kafka Streams support | ✔️ | | N/A for Python, Kafka Streams is Java-only
Compression | ✔️ | ✔️ |
Local schema cache | ✔️ | ✔️
Schema auto-registration | ✔️ | ✔️
Evolution checks | ✔️ | ✔️
Migration from a third party Schema Registry | ✔️ | ✔️
Flink support | ✔️ | ❌
Kafka Connect support | ✔️ | | N/A for Python, Kafka Connect is Java-only
## Installation
Clone this repository and install it:
```
pip install -e .
```
This library includes opt-in extra dependencies that enable support for certain features. For example, to use the schema registry with [kafka-python](https://pypi.org/project/kafka-python/), you should install the `kafka-python` extra:
```
pip install -e .[kafka-python]
```
Extra name | Purpose
:--------- | :------
kafka-python | Provides adapter classes to plug into `kafka-python`
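Released versions are also published to PyPI (see the badges above), so if you don't need an editable checkout you can install the package, with or without extras, directly:
```
pip install aws-glue-schema-registry
pip install aws-glue-schema-registry[kafka-python]
```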
## Usage
First use `boto3` to create a low-level AWS Glue client:
```python
import boto3
# Pass your AWS credentials or profile information here
session = boto3.Session(aws_access_key_id='xxx', aws_secret_access_key='xxx', region_name='us-west-2')
glue_client = session.client('glue')
```
See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration for more information on configuring boto3.
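If you would rather not pass keys explicitly, a minimal alternative sketch (assuming your credentials are already available via environment variables, a shared credentials file, or an instance role) is to let boto3's default credential chain resolve them:
```python
import boto3

# No explicit keys: boto3 falls back to its default credential chain
# (environment variables, ~/.aws/credentials, IAM role, etc.).
glue_client = boto3.client('glue', region_name='us-west-2')
```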
Send Kafka messages with `KafkaSerializer`:
```python
from aws_schema_registry import DataAndSchema, SchemaRegistryClient
from aws_schema_registry.avro import AvroSchema

# In this example we will use kafka-python as our Kafka client,
# so we need to have the `kafka-python` extras installed and use
# the kafka adapter.
from aws_schema_registry.adapter.kafka import KafkaSerializer
from kafka import KafkaProducer

# Create the schema registry client, which is a façade around the boto3 glue client
client = SchemaRegistryClient(glue_client,
                              registry_name='my-registry')

# Create the serializer
serializer = KafkaSerializer(client)

# Create the producer
producer = KafkaProducer(value_serializer=serializer)

# Our producer needs a schema to send along with the data.
# In this example we're using Avro, so we'll load an .avsc file.
with open('user.avsc', 'r') as schema_file:
    schema = AvroSchema(schema_file.read())

# Send message data along with schema
data = {
    'name': 'John Doe',
    'favorite_number': 6
}
producer.send('my-topic', value=(data, schema))
# the value MUST be a tuple when we're using the KafkaSerializer
```
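Because `DataAndSchema` (imported above) is a `NamedTuple` of `(data, schema)`, as shown in the consumer example below, you can also pass it in place of a bare tuple; this is just a readability sketch, not a separate API:
```python
from aws_schema_registry import DataAndSchema

# A NamedTuple is still a tuple, so this is equivalent to value=(data, schema)
producer.send('my-topic', value=DataAndSchema(data, schema))
```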
Read Kafka messages with `KafkaDeserializer`:
```python
from aws_schema_registry import DataAndSchema, SchemaRegistryClient

# In this example we will use kafka-python as our Kafka client,
# so we need to have the `kafka-python` extras installed and use
# the kafka adapter.
from aws_schema_registry.adapter.kafka import KafkaDeserializer
from kafka import KafkaConsumer

# Create the schema registry client, which is a façade around the boto3 glue client
client = SchemaRegistryClient(glue_client,
                              registry_name='my-registry')

# Create the deserializer
deserializer = KafkaDeserializer(client)

# Create the consumer
consumer = KafkaConsumer('my-topic', value_deserializer=deserializer)

# Now use the consumer normally
for message in consumer:
    # The deserializer produces DataAndSchema instances
    value: DataAndSchema = message.value
    # which are NamedTuples with a `data` and `schema` property
    value.data == value[0]
    value.schema == value[1]
    # and can be deconstructed
    data, schema = value
```
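As a concrete illustration, with the `user.avsc` record from the producer example above, the deserialized `data` comes back as a plain Python mapping, so a consumer loop can read fields by name (the field names here follow that example):
```python
for message in consumer:
    data, schema = message.value
    # Avro records deserialize to dict-like data in this example
    print(f"{data['name']} likes the number {data['favorite_number']}")
```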
## Contributing
Clone this repository and install development dependencies:
```
pip install -e .[dev]
```
Run the linter and tests with tox before committing. After committing, check GitHub Actions to see the result of the automated checks.
### Linting
Lint the code with:
```
flake8
```
Run the type checker with:
```
mypy
```
### Tests
Tests go under the `tests/` directory. All tests outside of `tests/integration` are unit tests with no external dependencies.
Tests under `tests/integration` are integration tests that interact with external resources and/or real AWS schema registries. They generally run slower and require some additional configuration.
Run just the unit tests with:
```
pytest --ignore tests/integration
```
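Standard pytest selection flags can narrow the run further; for example, filtering by keyword (the keyword `avro` below is only illustrative):
```
pytest --ignore tests/integration -k avro
```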
All integration tests use the following environment variables:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_SESSION_TOKEN`
- `AWS_REGION`
- `AWS_PROFILE`
- `CLEANUP_REGISTRY`: Set to any value to prevent the test from destroying the registry created during the test, allowing you to inspect its contents.
If no `AWS_` environment variables are set, `boto3` will try to load credentials from your default AWS profile.
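For example, a run against a named profile might look like this (the profile name and region are placeholders):
```
export AWS_PROFILE=my-profile
export AWS_REGION=us-west-2
pytest tests/integration
```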
See individual integration test directories for additional requirements and setup instructions.
### Tox
This project uses [Tox](https://tox.wiki/en/latest/) to run tests across multiple Python versions.
Install Tox with:
```
pip install tox
```
and run it with:
```
tox
```
Note that Tox requires the tested Python versions to be installed. One convenient way to manage this is using [pyenv](https://github.com/pyenv/pyenv#installation). See the `.python-versions` file for the Python versions that need to be installed.
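For example, with pyenv the setup might look like the following (the version numbers are illustrative; use the ones listed in `.python-versions`):
```
pyenv install 3.8.18          # repeat for each required version
pyenv local 3.8.18 3.11.6     # make the interpreters visible to tox
tox
```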
### Releases
Assuming you have PyPI permissions:
```
python -m build
twine upload -r testpypi dist/*
twine upload dist/*
```