glassflow

- Name: glassflow
- Version: 3.0.0
- Summary: GlassFlow Clickhouse ETL Python SDK: Create GlassFlow pipelines between Kafka and ClickHouse
- Upload time: 2025-09-05 15:53:22
- Requires Python: >=3.9
- License: MIT
- Keywords: clickhouse, data-engineering, data-pipeline, etl, glassflow, kafka, streaming

# GlassFlow ETL Python SDK

<p align="left">
  <a target="_blank" href="https://pypi.python.org/pypi/glassflow">
    <img src="https://img.shields.io/pypi/v/glassflow.svg?labelColor=&color=e69e3a">
  </a>
  <a target="_blank" href="https://github.com/glassflow/glassflow-python-sdk/blob/main/LICENSE">
    <img src="https://img.shields.io/pypi/l/glassflow.svg?labelColor=&color=e69e3a">
  </a>
  <a target="_blank" href="https://pypi.python.org/pypi/glassflow">
    <img src="https://img.shields.io/pypi/pyversions/glassflow.svg?labelColor=&color=e69e3a">
  </a>
  <br />
  <a target="_blank" href="https://github.com/glassflow/glassflow-python-sdk/actions">
    <img src="https://github.com/glassflow/glassflow-python-sdk/workflows/Test/badge.svg?labelColor=&color=e69e3a">
  </a>
<!-- Pytest Coverage Comment:Begin -->
  <img src="https://img.shields.io/badge/coverage-94%25-brightgreen">
<!-- Pytest Coverage Comment:End -->
</p>

A Python SDK for creating and managing data pipelines between Kafka and ClickHouse.

## Features

- Create and manage data pipelines between Kafka and ClickHouse
- Deduplicate events by key within a configurable time window
- Join topics temporally on a common key within a configurable time window
- Schema validation and configuration management
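
To make the deduplication feature concrete, here is a minimal, self-contained sketch of key-based deduplication over a time window. It illustrates the general idea only and is not GlassFlow's implementation: the `deduplicate` helper and the `ts` timestamp field are hypothetical names chosen for this example, and events are assumed to arrive in timestamp order.

```python
from datetime import datetime, timedelta


def deduplicate(events, id_field, window):
    """Yield events whose key was not accepted within the preceding `window`.

    Hypothetical helper for illustration; assumes events are sorted by "ts".
    """
    last_seen = {}  # key -> timestamp of the most recently accepted event
    for event in events:
        key, ts = event[id_field], event["ts"]
        previous = last_seen.get(key)
        if previous is None or ts - previous >= window:
            last_seen[key] = ts
            yield event


start = datetime(2025, 1, 1, 12, 0)
events = [
    {"id": "a", "ts": start},
    {"id": "a", "ts": start + timedelta(minutes=30)},  # repeat inside the window
    {"id": "b", "ts": start + timedelta(minutes=45)},
    {"id": "a", "ts": start + timedelta(hours=2)},  # window has expired
]
unique = list(deduplicate(events, id_field="id", window=timedelta(hours=1)))
print([event["id"] for event in unique])
```

The second `a` event is dropped because it arrives 30 minutes after the first, while the last one is kept because more than an hour has passed. In a pipeline configuration, the equivalent settings are the `id_field` and `time_window` keys shown in the Quick Start.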

## Installation

```bash
pip install glassflow
```

## Quick Start

```python
from glassflow.etl import Pipeline


pipeline_config = {
  "pipeline_id": "test-pipeline",
  "source": {
    "type": "kafka",
    "provider": "aiven",
    "connection_params": {
      "brokers": ["localhost:9092"],
      "protocol": "SASL_SSL",
      "mechanism": "SCRAM-SHA-256",
      "username": "user",
      "password": "pass"
    },
    "topics": [
      {
        "consumer_group_initial_offset": "earliest",
        "id": "test-topic",
        "name": "test-topic",
        "schema": {
          "type": "json",
          "fields": [
            {"name": "id", "type": "string" },
            {"name": "email", "type": "string"}
          ]
        },
        "deduplication": {
          "id_field": "id",
          "id_field_type": "string",
          "time_window": "1h",
          "enabled": True
        }
      }
    ],
  },
  "sink": {
    "type": "clickhouse",
    "host": "localhost",
    "port": 8443,
    "database": "test",
    "username": "default",
    "password": "pass",
    "table_mapping": [
      {
        "source_id": "test-topic",
        "field_name": "id",
        "column_name": "user_id",
        "column_type": "UUID"
      },
      {
        "source_id": "test-topic",
        "field_name": "email",
        "column_name": "email",
        "column_type": "String"
      }
    ]
  }
}

# Build a Pipeline object from the configuration dictionary
pipeline = Pipeline(pipeline_config)

# Create the pipeline from that configuration
pipeline.create()
```
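
Before calling `pipeline.create()`, it can help to sanity-check the configuration dictionary locally. The sketch below uses plain Python and no SDK calls. It assumes that each `source_id` in `table_mapping` is meant to name one of the configured topics and that each `field_name` should appear in that topic's schema; both are assumptions for illustration, and the pipeline configuration docs remain authoritative. `check_table_mapping` is a hypothetical helper, not part of the SDK.

```python
def check_table_mapping(config):
    """Return a list of human-readable problems found in the sink mapping."""
    # Map each topic name to the set of field names declared in its schema.
    fields_by_topic = {
        topic["name"]: {field["name"] for field in topic["schema"]["fields"]}
        for topic in config["source"]["topics"]
    }
    problems = []
    for entry in config["sink"]["table_mapping"]:
        source, field = entry["source_id"], entry["field_name"]
        if source not in fields_by_topic:
            problems.append(f"unknown source_id: {source!r}")
        elif field not in fields_by_topic[source]:
            problems.append(f"{source!r} has no field {field!r}")
    return problems


# A trimmed-down configuration with one deliberately misspelled field name.
example = {
    "source": {
        "topics": [
            {
                "name": "test-topic",
                "schema": {"fields": [{"name": "id"}, {"name": "email"}]},
            }
        ]
    },
    "sink": {
        "table_mapping": [
            {"source_id": "test-topic", "field_name": "id"},
            {"source_id": "test-topic", "field_name": "emial"},
        ]
    },
}
problems = check_table_mapping(example)
print(problems)
```

An empty list means the mapping is consistent under the stated assumptions; anything else is worth fixing before the configuration is submitted.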

## Pipeline Configuration

For detailed information about the pipeline configuration, see [GlassFlow docs](https://docs.glassflow.dev/pipeline/pipeline-configuration).

## Tracking

The SDK includes anonymous usage tracking to help improve the product. Tracking is enabled by default but can be disabled in two ways:

1. Using an environment variable:
```bash
export GF_TRACKING_ENABLED=false
```

2. Programmatically using the `disable_tracking` method:
```python
pipeline = Pipeline(pipeline_config)
pipeline.disable_tracking()
```

The tracking collects anonymous information about:
- SDK version
- Platform (operating system)
- Python version
- Pipeline ID
- Whether joins or deduplication are enabled
- Kafka security protocol, auth mechanism used and whether authentication is disabled
- Errors during pipeline creation and deletion
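
If exporting the variable in a shell is inconvenient (for example in a notebook), the same environment variable can be set from Python. This is a minimal sketch using only the standard library; it assumes the SDK reads `GF_TRACKING_ENABLED` after this line has run, so placing it before the `glassflow` import is the cautious choice.

```python
import os

# Set the variable for the current process before the SDK is imported or used.
# When exactly the SDK reads it is an assumption; setting it first is safest.
os.environ["GF_TRACKING_ENABLED"] = "false"

print(os.environ["GF_TRACKING_ENABLED"])
```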

## Development

### Setup

1. Clone the repository
2. Create a virtual environment
3. Install dependencies:

```bash
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
```

### Testing

```bash
pytest
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "glassflow",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "clickhouse, data-engineering, data-pipeline, etl, glassflow, kafka, streaming",
    "author": null,
    "author_email": "GlassFlow <hello@glassflow.dev>",
    "download_url": "https://files.pythonhosted.org/packages/d0/40/215b76f9c340845aea5edf7dddd41d48833cdeb39ddb4a4a8b7d365867be/glassflow-3.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "GlassFlow Clickhouse ETL Python SDK: Create GlassFlow pipelines between Kafka and ClickHouse",
    "version": "3.0.0",
    "project_urls": {
        "Documentation": "https://glassflow.github.io/glassflow-python-sdk",
        "Homepage": "https://github.com/glassflow/glassflow-python-sdk",
        "Issues": "https://github.com/glassflow/glassflow-python-sdk/issues",
        "Repository": "https://github.com/glassflow/glassflow-python-sdk.git"
    },
    "split_keywords": [
        "clickhouse",
        " data-engineering",
        " data-pipeline",
        " etl",
        " glassflow",
        " kafka",
        " streaming"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b1c41d628e6f9671517fba37890063d857d4f906e9cd8c258b2c4095ca6f484b",
                "md5": "0a740ec4300b0a312c79bf3c04ecc1f6",
                "sha256": "f77b0af0762fc56ba7e0b6180a8fec33874d41a3b5a83adcf45d8d80a96c46e8"
            },
            "downloads": -1,
            "filename": "glassflow-3.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0a740ec4300b0a312c79bf3c04ecc1f6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 18986,
            "upload_time": "2025-09-05T15:53:20",
            "upload_time_iso_8601": "2025-09-05T15:53:20.680476Z",
            "url": "https://files.pythonhosted.org/packages/b1/c4/1d628e6f9671517fba37890063d857d4f906e9cd8c258b2c4095ca6f484b/glassflow-3.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d040215b76f9c340845aea5edf7dddd41d48833cdeb39ddb4a4a8b7d365867be",
                "md5": "59a6a8b5f5d4317e6b95f3e55bf3c40c",
                "sha256": "449c1c73f1317743370952adc63a609cb14f93e8d2e5112ae2d7c70d249e4916"
            },
            "downloads": -1,
            "filename": "glassflow-3.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "59a6a8b5f5d4317e6b95f3e55bf3c40c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 77755,
            "upload_time": "2025-09-05T15:53:22",
            "upload_time_iso_8601": "2025-09-05T15:53:22.012334Z",
            "url": "https://files.pythonhosted.org/packages/d0/40/215b76f9c340845aea5edf7dddd41d48833cdeb39ddb4a4a8b7d365867be/glassflow-3.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-05 15:53:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "glassflow",
    "github_project": "glassflow-python-sdk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "glassflow"
}
        