aws-glue-schema-registry

- Name: aws-glue-schema-registry
- Version: 1.1.3
- Summary: Use the AWS Glue Schema Registry.
- Upload time: 2023-10-17 15:59:45
- Author: Corentin Debost
- Requires Python: >=3.8
- License: Apache Software License
- Keywords: aws, glue, schema, registry, avro

# AWS Glue Schema Registry for Python

[![PyPI](https://img.shields.io/pypi/v/aws-glue-schema-registry.svg)](https://pypi.org/project/aws-glue-schema-registry)
[![PyPI](https://img.shields.io/pypi/pyversions/aws-glue-schema-registry)](https://pypi.org/project/aws-glue-schema-registry)
[![main](https://github.com/DisasterAWARE/aws-glue-schema-registry-python/actions/workflows/main.yml/badge.svg)](https://github.com/DisasterAWARE/aws-glue-schema-registry-python/actions/workflows/main.yml)

Use the AWS Glue Schema Registry in Python projects.

This library is a partial port of the Java library [aws-glue-schema-registry](https://github.com/awslabs/aws-glue-schema-registry). It implements a subset of that library's features and is fully compatible with it for the features it supports.

## Feature Support

Feature | Java Library | Python Library | Notes
:------ | :----------- | :------------- | :----
Serialization and deserialization using schema registry | ✔️ | ✔️ |
Avro message format | ✔️ | ✔️ |
JSON Schema message format | ✔️ | ✔️ |
Kafka Streams support | ✔️ | | N/A for Python, Kafka Streams is Java-only
Compression | ✔️ | ✔️ |
Local schema cache | ✔️ | ✔️ |
Schema auto-registration | ✔️ | ✔️ |
Evolution checks | ✔️ | ✔️ |
Migration from a third party Schema Registry | ✔️ | ✔️ |
Flink support | ✔️ | ❌ |
Kafka Connect support | ✔️ | | N/A for Python, Kafka Connect is Java-only

## Installation

Install the released package from PyPI with `pip install aws-glue-schema-registry`, or clone this repository and install it in editable mode:

```
pip install -e .
```

This library includes opt-in extra dependencies that enable support for certain features. For example, to use the schema registry with [kafka-python](https://pypi.org/project/kafka-python/), you should install the `kafka-python` extra:

```
pip install -e ".[kafka-python]"
```

Extra name | Purpose
:--------- | :------
kafka-python | Provides adapter classes to plug into `kafka-python`

## Usage

First use `boto3` to create a low-level AWS Glue client:

```python
import boto3

# Pass your AWS credentials or profile information here
session = boto3.Session(aws_access_key_id='xxx', aws_secret_access_key='xxx', region_name='us-west-2')

glue_client = session.client('glue')
```

See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration for more information on configuring boto3.

Send Kafka messages with `KafkaSerializer`:

```python
from aws_schema_registry import SchemaRegistryClient
from aws_schema_registry.avro import AvroSchema

# In this example we will use kafka-python as our Kafka client,
# so we need to have the `kafka-python` extras installed and use
# the kafka adapter.
from aws_schema_registry.adapter.kafka import KafkaSerializer
from kafka import KafkaProducer

# Create the schema registry client, which is a façade around the boto3 glue client
client = SchemaRegistryClient(glue_client,
                              registry_name='my-registry')

# Create the serializer
serializer = KafkaSerializer(client)

# Create the producer
producer = KafkaProducer(value_serializer=serializer)

# Our producer needs a schema to send along with the data.
# In this example we're using Avro, so we'll load an .avsc file.
with open('user.avsc', 'r') as schema_file:
    schema = AvroSchema(schema_file.read())

# Send message data along with schema
data = {
    'name': 'John Doe',
    'favorite_number': 6
}
producer.send('my-topic', value=(data, schema))
# the value MUST be a tuple when we're using the KafkaSerializer
```
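
The producer example loads `user.avsc` without showing its contents. As a rough sketch, a schema matching the `name` and `favorite_number` fields of the message could look like the one below. The record name `User` and the namespace `example.avro` are assumptions made for illustration and are not taken from this library.

```python
import json

# Hypothetical Avro record schema matching the `data` dict in the producer
# example. The record name and namespace are illustrative assumptions.
user_schema = {
    "type": "record",
    "name": "User",
    "namespace": "example.avro",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": "int"},
    ],
}

# Serialize the schema to JSON text; saving this text as `user.avsc`
# gives the file that the producer example reads.
schema_text = json.dumps(user_schema, indent=2)

# Sanity check: every key in the message data is declared in the schema.
data = {"name": "John Doe", "favorite_number": 6}
declared_fields = {field["name"] for field in json.loads(schema_text)["fields"]}
assert set(data) == declared_fields
```

Keeping the schema in a separate `.avsc` file, as the example above does, lets producers and other tooling share one definition.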

Read Kafka messages with `KafkaDeserializer`:

```python
from aws_schema_registry import DataAndSchema, SchemaRegistryClient

# In this example we will use kafka-python as our Kafka client,
# so we need to have the `kafka-python` extras installed and use
# the kafka adapter.
from aws_schema_registry.adapter.kafka import KafkaDeserializer
from kafka import KafkaConsumer

# Create the schema registry client, which is a façade around the boto3 glue client
client = SchemaRegistryClient(glue_client,
                              registry_name='my-registry')

# Create the deserializer
deserializer = KafkaDeserializer(client)

# Create the consumer
consumer = KafkaConsumer('my-topic', value_deserializer=deserializer)

# Now use the consumer normally
for message in consumer:
    # The deserializer produces DataAndSchema instances,
    value: DataAndSchema = message.value
    # which are NamedTuples with `data` and `schema` fields
    assert value.data == value[0]
    assert value.schema == value[1]
    # and can be unpacked
    data, schema = value
```
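
To see the NamedTuple behaviour without a running Kafka cluster, here is a minimal sketch using a stand-in class. `DataAndSchemaSketch` is hypothetical and only mirrors the two fields described above; it is not the library's `DataAndSchema`.

```python
from typing import Any, NamedTuple


class DataAndSchemaSketch(NamedTuple):
    """Hypothetical stand-in with the two fields described above."""
    data: Any
    schema: Any


# The schema is a placeholder string here; in real use it would be a schema object.
value = DataAndSchemaSketch(data={"name": "John Doe"}, schema="<schema object>")

# Field access and index access refer to the same items.
assert value.data is value[0]
assert value.schema is value[1]

# Tuple unpacking works because a NamedTuple is still a tuple.
data, schema = value

# A plain (data, schema) tuple has the same positional layout.
assert tuple(value) == (data, schema)
```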

## Contributing

Clone this repository and install development dependencies:

```
pip install -e .[dev]
```

Run the linter and tests with tox before committing. After committing, check GitHub Actions to see the result of the automated checks.

### Linting

Lint the code with:

```
flake8
```

Run the type checker with:

```
mypy
```

### Tests

Tests go under the `tests/` directory. All tests outside of `tests/integration` are unit tests with no external dependencies.

Tests under `tests/integration` are integration tests that interact with external resources and/or real AWS schema registries. They generally run slower and require some additional configuration.

Run just the unit tests with:

```
pytest --ignore tests/integration
```

All integration tests use the following environment variables:

- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_SESSION_TOKEN`
- `AWS_REGION`
- `AWS_PROFILE`
- `CLEANUP_REGISTRY`: Set to any value to prevent the test from destroying the registry created during the test, allowing you to inspect its contents.

If no `AWS_` environment variables are set, `boto3` will try to load credentials from your default AWS profile.
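
Before running the integration tests, it can help to check which of these variables are set. The helper below is a hypothetical sketch and is not part of this repository. It assumes only what is described above: the listed variable names, and that setting `CLEANUP_REGISTRY` to any value keeps the registry.

```python
import os
from typing import Any, Dict, Mapping

# Variable names copied from the list above.
AWS_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
    "AWS_PROFILE",
)


def describe_test_environment(env: Mapping[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: summarize the variables the tests read."""
    return {
        "aws_variables_set": [name for name in AWS_VARIABLES if name in env],
        # Per the description above, setting CLEANUP_REGISTRY to any value
        # keeps the registry instead of destroying it after the test.
        "registry_kept_after_test": "CLEANUP_REGISTRY" in env,
    }


# Only variable names are reported, never their values.
print(describe_test_environment(os.environ))
```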

See individual integration test directories for additional requirements and setup instructions.

### Tox

This project uses [Tox](https://tox.wiki/en/latest/) to run tests across multiple Python versions.

Install Tox with:

```
pip install tox
```

and run it with:

```
tox
```

Note that Tox requires the tested Python versions to be installed. One convenient way to manage this is using [pyenv](https://github.com/pyenv/pyenv#installation). See the `.python-versions` file for the Python versions that need to be installed.


### Releases

Assuming you have PyPI upload permissions, build the package, upload it to TestPyPI first, and then upload it to PyPI:

```
python -m build
twine upload -r testpypi dist/*
twine upload dist/*
```

            
