iceberg-evolve


- Name: iceberg-evolve
- Version: 1.0.1
- Home page: https://github.com/anatol-ju/iceberg-evolve
- Summary: Schema diffing and evolution tool for Iceberg and beyond.
- Upload time: 2025-08-07 08:28:21
- Author: Anatol Jurenkow
- Requires Python: >=3.10
- Keywords: iceberg, schema, evolution, data-engineering

# iceberg-evolve
Schema diffing and evolution tool for Apache Iceberg and beyond.

## 📣 New in 1.0.0
Initial release with core support for schema comparison and automated evolution against live Iceberg tables.

## 🔧 Features
- Schema Loading
  - Store and load Iceberg schemas to/from standalone JSON files via `IcebergSchemaJSONSerializer`.
  - Fetch table schemas directly from Iceberg catalogs (Hive, Glue, REST) via PyIceberg configurations (`pyiceberg.yaml`).

- Schema Diffing
  - Detect added, removed, renamed, and type-changed columns.
  - Match columns between the two schemas either by `id` or by `name` (default: `id`).

- Automated Evolution
  - Generate and apply Iceberg schema evolution operations (add/rename/update/drop).
  - Preview migrations with a `--dry-run` mode before applying changes.

- Rich CLI
  - `iceberg-evolve diff <old.json> <new.json>` to view schema diffs in a colored, tree-style format.
  - `iceberg-evolve evolve --catalog-url <URI> --table-ident <db.table> --schema-path <new.json>` to apply migrations.

- Python API
  - Programmatic access to `Schema`, `SchemaDiff`, and migration utilities for integration in CI/CD pipelines or custom scripts.

- Utilities
  - Clean and normalize Iceberg type strings.
  - Render operation plans to console via [Rich](https://pypi.org/project/rich/).
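
The choice between matching by `id` and by `name` changes what a diff reports. The sketch below illustrates the idea in plain Python. It is not the library's implementation, and `Field` and `diff_fields` are hypothetical names introduced only for this example. Under these assumptions, a rename is only detectable when matching by `id`; when matching by `name` the same change shows up as one removed and one added column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema column (not an iceberg-evolve class)."""
    id: int
    name: str
    type: str


def diff_fields(old, new, match_by="id"):
    """Classify changes between two field lists, matching by 'id' or 'name'."""
    if match_by not in ("id", "name"):
        raise ValueError("match_by must be 'id' or 'name'")

    old_by_key = {getattr(f, match_by): f for f in old}
    new_by_key = {getattr(f, match_by): f for f in new}

    changes = {"added": [], "removed": [], "renamed": [], "type_changed": []}
    for key, new_field in new_by_key.items():
        old_field = old_by_key.get(key)
        if old_field is None:
            changes["added"].append(new_field.name)
            continue
        if old_field.name != new_field.name:
            # Only reachable when matching by id: same id, different name.
            changes["renamed"].append((old_field.name, new_field.name))
        if old_field.type != new_field.type:
            changes["type_changed"].append(
                (new_field.name, old_field.type, new_field.type)
            )
    for key, old_field in old_by_key.items():
        if key not in new_by_key:
            changes["removed"].append(old_field.name)
    return changes


old = [Field(1, "id", "long"), Field(2, "mail", "string"), Field(3, "age", "int")]
new = [Field(1, "id", "long"), Field(2, "email", "string"), Field(3, "age", "long")]

by_id = diff_fields(old, new, match_by="id")
by_name = diff_fields(old, new, match_by="name")
print("match by id:  ", by_id)
print("match by name:", by_name)
```

Matching by `id` is the safer default for Iceberg tables because field IDs stay stable across renames; matching by `name` is useful when the two schema files were written independently and their IDs do not line up.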

## 🚀 Use Cases
- Automate schema migrations for data lakes built on Iceberg.
- Integrate schema checks into CI/CD workflows to prevent accidental breaking changes.
- Generate human-readable schema evolution plans for review and auditing.
- Build Python tooling around Iceberg schemas, including advanced analyses and reporting.

## 🚚 Installation
Requires Python 3.10 or later.
```
pip install iceberg-evolve
```

Or, to install for development with Poetry:
```
git clone https://github.com/anatol-ju/iceberg-evolve.git
cd iceberg-evolve
poetry install --with dev
pre-commit install  # optional: enable linting and formatting hooks
```

## 🧱 Quick Examples
For a quick look at the output, install the project and run:
```
poetry run example
```

### Python API
```python
from iceberg_evolve.diff import SchemaDiff
from iceberg_evolve.renderer import SchemaDiffRenderer
from iceberg_evolve.schema import Schema
from iceberg_evolve.serializer import IcebergSchemaJSONSerializer

# Load the current and the target schema from JSON files
old = Schema.from_json_file("schemas/users_current.json")
new = Schema.from_json_file("schemas/users_new.json")

# Compute the diff and render it to the console
diff = SchemaDiff(old, new)
SchemaDiffRenderer(diff).display()

# Load an Iceberg Schema from a local file (in the expected format)
old_schema = Schema.from_json_file("schemas/users_current.json")

# Write it out to a standalone JSON file...
IcebergSchemaJSONSerializer.to_json_file(old_schema, "schemas/users_exported.json")

# ...and read it back in later
reloaded_schema = IcebergSchemaJSONSerializer.from_json_file("schemas/users_exported.json")
```
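
The examples above read schema files such as `schemas/users_current.json`, but the expected file format is not spelled out here. As a rough sketch, and assuming the files follow the JSON schema serialization from the Apache Iceberg table spec (an assumption, not something this README confirms), a pair of input files could be produced with the standard library alone. The column names and the `write_schema` helper are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Assumption: schema files use the Iceberg spec's JSON schema serialization.
current_schema = {
    "type": "struct",
    "schema-id": 0,
    "fields": [
        {"id": 1, "name": "id", "required": True, "type": "long"},
        {"id": 2, "name": "mail", "required": False, "type": "string"},
    ],
}

new_schema = {
    "type": "struct",
    "schema-id": 1,
    "fields": [
        {"id": 1, "name": "id", "required": True, "type": "long"},
        # Same field id, new name: a rename when matching by id.
        {"id": 2, "name": "email", "required": False, "type": "string"},
        # New field id: an added column.
        {"id": 3, "name": "signup_ts", "required": False, "type": "timestamptz"},
    ],
}


def write_schema(schema: dict, path: Path) -> Path:
    """Write a schema dict as indented JSON, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return path


# A temporary directory keeps the example free of side effects.
base_dir = Path(tempfile.mkdtemp())
current_path = write_schema(current_schema, base_dir / "schemas" / "users_current.json")
new_path = write_schema(new_schema, base_dir / "schemas" / "users_new.json")
print(current_path)
print(new_path)
```

If the package expects a different layout, serializing a real table schema with the `serialize` command is the more reliable way to obtain a reference file.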

### CLI
```
# View diff between two JSON schemas
iceberg-evolve diff users_current.json users_new.json \
  --match-by name

# Apply evolution to a live Iceberg table (dry run)
iceberg-evolve evolve \
  --catalog-url hive://localhost:9083 \
  --table-ident analytics.users \
  --schema-path users_new.json \
  --dry-run

# Serialize a table's schema
iceberg-evolve serialize \
  --catalog-url hive://localhost:9083 \
  --table-ident analytics.users \
  --output-path schemas/users_table_schema.json
```
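
To call the CLI from a script or CI job, the `evolve` invocation above can be assembled programmatically. The sketch below uses only the flags shown in this README. The helper names `build_evolve_command` and `run_evolve` are hypothetical, and the sketch makes no assumption about the tool's exit codes or output format.

```python
import shlex
import subprocess


def build_evolve_command(catalog_url, table_ident, schema_path, dry_run=True):
    """Build the argument list for `iceberg-evolve evolve` (dry run by default)."""
    cmd = [
        "iceberg-evolve", "evolve",
        "--catalog-url", catalog_url,
        "--table-ident", table_ident,
        "--schema-path", schema_path,
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_evolve(cmd):
    """Run the command and return the CompletedProcess.

    Assumes `iceberg-evolve` is installed and on PATH. The meaning of the
    return code is not documented here, so the caller decides how to react.
    """
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


cmd = build_evolve_command(
    "hive://localhost:9083", "analytics.users", "users_new.json"
)
# Print the command for review before executing it anywhere.
print(shlex.join(cmd))
```

Defaulting to `dry_run=True` means a migration is only applied when the caller opts in explicitly, which suits review-first workflows.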

## ⚙️ Configuration
This package relies on PyIceberg and uses the same configuration mechanism; see the [PyIceberg configuration documentation](https://py.iceberg.apache.org/configuration/).
Create a `pyiceberg.yaml` in your project root to configure catalogs:
```yaml
catalogs:
  default:
    type: hive
    uri: thrift://localhost:9083

  glue:
    type: glue
    region: eu-west-1
```
You can find an example configuration in the examples directory. Alternatively, you can set the catalog details through environment variables.

When using the CLI, pass the catalog name or full URI to the `evolve` command via `--catalog-url` (e.g., `glue://default`).
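
For the environment-variable route, PyIceberg reads catalog properties from variables named `PYICEBERG_CATALOG__<NAME>__<PROPERTY>`, where a double underscore separates nesting levels and a single underscore stands in for a hyphen. The sketch below derives those names for the Hive catalog shown above. The helper `pyiceberg_env_var` is a hypothetical name, and it is assumed (based on the statement above) that iceberg-evolve picks these variables up through PyIceberg.

```python
import os


def pyiceberg_env_var(catalog: str, prop: str) -> str:
    """Map a catalog property to PyIceberg's environment-variable name.

    '.' in the property becomes '__' and '-' becomes '_'; the result is
    upper-cased and prefixed with PYICEBERG_CATALOG__<CATALOG>__.
    """
    key = prop.replace(".", "__").replace("-", "_")
    return f"PYICEBERG_CATALOG__{catalog}__{key}".upper()


# Same settings as the `default` Hive catalog in the YAML example.
settings = {"type": "hive", "uri": "thrift://localhost:9083"}
for prop, value in settings.items():
    os.environ[pyiceberg_env_var("default", prop)] = value

print(pyiceberg_env_var("default", "uri"))
print(pyiceberg_env_var("default", "s3.access-key-id"))
```

Variables set this way only affect the current process and its children, so in a pipeline they are usually exported by the job configuration instead.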

## 🧪 Testing
Run unit tests with pytest:
```
poetry run pytest
```
Coverage reports are generated automatically via the existing configuration.

This project includes a basic local setup for testing against a Hive metastore, so you can try the package out before using it in your pipelines.
Once the Docker containers are up, you can run the integration tests, either from inside the container:
```
poetry run pytest tests/test_integration.py
```
Or without logging into the container:
```
docker compose exec runner poetry run pytest tests/test_integration.py
```
You don't need to exclude the integration tests explicitly: they are skipped automatically when you run the test suite outside of a container.

## 📝 License
This project is licensed under the MIT License. See the LICENSE file for details.

## 🧑‍💻 Author
Anatol Jurenkow
Cloud Data Engineer | AWS Enthusiast | Iceberg Fan
[GitHub](https://github.com/anatol-ju) · [LinkedIn](https://www.linkedin.com/in/anatol-jurenkow)

Feel free to open issues or contribute via pull requests—happy evolving!

            

        