zephflow

- Name: zephflow
- Version: 0.3.1
- Summary: Python SDK for ZephFlow data processing pipelines
- Author: Fleak Tech Inc.
- License: Apache-2.0
- Requires Python: >=3.8.1, <4.0.0
- Keywords: data-processing, streaming, etl, pipeline, workflow
- Uploaded: 2025-10-07 23:22:12
# ZephFlow Python SDK

[![PyPI version](https://img.shields.io/pypi/v/zephflow.svg)](https://pypi.org/project/zephflow/)
[![Python Versions](https://img.shields.io/pypi/pyversions/zephflow.svg)](https://pypi.org/project/zephflow/)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

Python SDK for building and running ZephFlow data processing pipelines. ZephFlow provides a powerful, intuitive API for stream processing, data transformation, and event-driven architectures.

## Features

- **Simple, fluent API** for building data processing pipelines
- **Powerful filtering** using JSONPath expressions
- **Data transformation** with the eval expression language
- **Flow composition** - merge and combine multiple flows
- **Error handling** with assertions and error tracking
- **Multiple sink options** for outputting processed data
- **Java-based engine** for high performance processing

## Documentation

For comprehensive documentation, tutorials, and API reference, visit: [https://docs.fleak.ai/zephflow](https://docs.fleak.ai/zephflow)

## Prerequisites

- Python 3.8.1 or higher (and below 4.0)
- Java 17 or higher, required for the processing engine (an optional check is sketched below)
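
The snippet below is an optional, standard-library-only sanity check for the Java prerequisite; it is not part of the SDK and simply verifies that a `java` executable is reachable on `PATH`:

```python
import shutil
import subprocess

# Optional pre-flight check (not part of the SDK): confirm a Java runtime is on PATH.
java = shutil.which("java")
if java is None:
    raise SystemExit("Java 17+ is required for the ZephFlow engine, but no 'java' was found on PATH")

# 'java -version' writes its report to stderr
report = subprocess.run([java, "-version"], capture_output=True, text=True).stderr
print(report.splitlines()[0])  # e.g. 'openjdk version "17.0.9" ...'
```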

## Installation

Install ZephFlow using pip:

```bash
pip install zephflow
```

## Quick Start

Here's a simple example to get you started with ZephFlow:

```python
import zephflow

# Create a flow that filters and transforms events
flow = (
    zephflow.ZephFlow.start_flow()
    .filter("$.value > 10")  # Keep only events with value > 10
    .eval("""
        dict(
            id=$.id,
            doubled_value=$.value * 2,
            category=case(
                $.value < 20 => 'medium',
                _ => 'high'
            )
        )
    """)
)

# Process some events
events = [
    {"id": 1, "value": 5},   # Will be filtered out
    {"id": 2, "value": 15},  # Will be processed
    {"id": 3, "value": 25}   # Will be processed
]

result = flow.process(events)
print(f"Processed {result.getOutputEvents().size()} events")
```

If you already have a workflow file:

```python
import zephflow

zephflow.ZephFlow.execute_dag("my_dag.yaml")
```

## Troubleshooting
### macOS SSL Certificate Issue
If you're on macOS and encounter an error like:

```
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)>
```

This indicates that Python cannot verify SSL certificates because the system root certificates are missing.

#### Solution
Run the certificate installation script that comes with your Python installation, replacing `3.x` with your installed version (e.g., `3.10`):

```bash
/Applications/Python\ 3.x/Install\ Certificates.command
```

This installs the necessary certificates so Python can verify HTTPS downloads.

## Core Concepts

### Filtering

Use JSONPath expressions to filter events:

```python
flow = (
    zephflow.ZephFlow.start_flow()
    .filter("$.priority == 'high' && $.value >= 100")
)
```
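
As a quick illustration, here is a minimal sketch that runs the filter-only flow above over in-memory events; it reuses the result accessors shown in the Quick Start, and the sample events are made up:

```python
events = [
    {"priority": "high", "value": 250},  # kept: both conditions hold
    {"priority": "high", "value": 50},   # dropped: value below 100
    {"priority": "low", "value": 500},   # dropped: priority is not 'high'
]

result = flow.process(events)
print(f"Kept {result.getOutputEvents().size()} of {len(events)} events")  # expected: 1 of 3
```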

### Transformation

Transform data using the eval expression language:

```python
flow = (
    zephflow.ZephFlow.start_flow()
    .eval("""
        dict(
            timestamp=now(),
            original_id=$.id,
            processed_value=$.value * 1.1,
            status='processed'
        )
    """)
)
```

### Merging Flows

Combine multiple flows for complex processing logic:

```python
high_priority = zephflow.ZephFlow.start_flow().filter("$.priority == 'high'")
large_value = zephflow.ZephFlow.start_flow().filter("$.value >= 1000")

merged = zephflow.ZephFlow.merge(high_priority, large_value)
```
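
A brief usage sketch follows, assuming (as the name suggests) that a merged flow emits events that pass either branch; the sample events are made up:

```python
events = [
    {"priority": "high", "value": 10},    # passes the high_priority branch
    {"priority": "low", "value": 5000},   # passes the large_value branch
    {"priority": "low", "value": 10},     # passes neither branch
]

result = merged.process(events)
print(f"Matched {result.getOutputEvents().size()} events")  # expected: 2 under the assumption above
```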

### Error Handling

Add assertions to validate data and handle errors:

```python
flow = (
  zephflow.ZephFlow.start_flow()
  .assertion("$.required_field != null")
  .assertion("$.value >= 0")
  .eval("dict(id=$.id, validated_value=$.value)")
)

result = flow.process(events, include_error_by_step=True)
if result.getErrorByStep().size() > 0:
  print("Some events failed validation")
```
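
For example, the sketch below feeds the validation flow above some passing and failing events; it only uses the accessors already shown in this README, and the shape of the per-step error details is not documented here:

```python
events = [
    {"id": 1, "required_field": "ok", "value": 10},  # passes both assertions
    {"id": 2, "required_field": None, "value": 5},   # fails the null check
    {"id": 3, "required_field": "ok", "value": -1},  # fails the non-negative check
]

result = flow.process(events, include_error_by_step=True)
print(f"Validated events: {result.getOutputEvents().size()}")
print(f"Steps that recorded errors: {result.getErrorByStep().size()}")
```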

## S3 Dead Letter Queue (DLQ)

ZephFlow supports automatic error handling by storing failed events in Amazon S3 through a Dead Letter Queue (DLQ) mechanism. **S3 DLQ works with data sources** (such as file_source and kafka_source) and captures events that fail during **data ingestion, conversion, or pipeline processing** (including filter, assertion, and eval failures).

### S3 DLQ Configuration with File Source

Configure S3 DLQ to automatically capture events that fail during data source processing:

```python
import tempfile
import json
import zephflow
from zephflow import JobContext, S3DlqConfig

# Create test data file with some invalid data
test_data = [
    {"user_id": 1, "value": 100, "category": "A"},
    {"user_id": 2, "value": 200, "category": "B"},
    "invalid_json_string",  # This will cause parsing failure -> DLQ
    {"malformed": "json", "missing": 0 },  # This will cause parsing failure -> DLQ
]

# Write test data to file (including invalid JSON)
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
    for item in test_data:
        if isinstance(item, dict):
            f.write(json.dumps(item) + '\n')
        else:
            f.write(str(item) + '\n')  # Write invalid JSON
    input_file = f.name

# Configure S3 DLQ
dlq_config = S3DlqConfig(
    region="us-west-2",
    bucket="error-events-bucket",
    batch_size=100,                    # Events to batch before writing
    flush_interval_millis=30000,       # Max wait time (30 seconds)
    access_key_id="your-access-key",
    secret_access_key="your-secret-key"
)

# Create JobContext with DLQ configuration
job_context = (
    JobContext.builder()
    .metric_tags({"env": "production", "service": "data-processor"})
    .dlq_config(dlq_config)
    .build()
)

# Create a flow with file source - DLQ will capture parsing failures
flow = (
    zephflow.ZephFlow.start_flow(job_context)
    .file_source(input_file, "JSON_OBJECT")  # Invalid JSON lines will go to DLQ
    .filter("$.value > 0")                   # Normal pipeline processing
    .eval("""
        dict(
            user_id=$.user_id,
            processed_value=$.value * 1.1,
            processed_at=now()
        )
    """)
    .stdout_sink("JSON_OBJECT")
)

# Execute the flow - source parsing failures will be sent to S3 DLQ
flow.execute("data-processor", "production", "json-processor")
print(f"Invalid JSON events sent to S3 DLQ: error-events-bucket")

# Cleanup
import os
os.unlink(input_file)
```

### S3 DLQ with Kafka Source

S3 DLQ also works with streaming sources like Kafka to capture deserialization failures:

```python
import zephflow
from zephflow import JobContext, S3DlqConfig

# Configure S3 DLQ for Kafka processing errors
dlq_config = S3DlqConfig(
    region="us-east-1",
    bucket="kafka-processing-errors",
    batch_size=50,
    flush_interval_millis=10000,
    access_key_id="your-access-key",
    secret_access_key="your-secret-key"
)

job_context = (
    JobContext.builder()
    .dlq_config(dlq_config)
    .metric_tags({"env": "production", "service": "kafka-processor"})
    .build()
)

# Kafka source with DLQ - will capture messages that fail JSON parsing
flow = (
    zephflow.ZephFlow.start_flow(job_context)
    .kafka_source(
        broker="localhost:9092",
        topic="user-events",
        group_id="processor-group",
        encoding_type="JSON_OBJECT"  # Invalid JSON messages will go to DLQ
    )
    .filter("$.event_type == 'purchase'")
    .eval("""
        dict(
            user_id=$.user_id,
            amount=$.amount,
            processed_at=now()
        )
    """)
    .stdout_sink("JSON_OBJECT")
)

# This would run continuously, capturing Kafka deserialization failures to S3 DLQ
# flow.execute("kafka-processor", "production", "purchase-events")
```

### S3 DLQ with Pipeline Processing Failures

S3 DLQ also captures pipeline processing failures like assertion errors:

```python
import tempfile
import json
import zephflow
from zephflow import JobContext, S3DlqConfig

# Create test data with values that will cause assertion failures
test_data = [
    {"user_id": 1, "value": 100, "category": "A"},  # Will pass
    {"user_id": 2, "value": 1500, "category": "B"}, # Will fail assertion (> 1000) -> DLQ
    {"user_id": 3, "value": 50, "category": "A"},   # Will pass
    {"user_id": 4, "value": 2000, "category": "C"}, # Will fail assertion (> 1000) -> DLQ
]

# Write test data to file
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
    for item in test_data:
        f.write(json.dumps(item) + '\n')
    input_file = f.name

# Configure S3 DLQ
dlq_config = S3DlqConfig(
    region="us-west-2",
    bucket="pipeline-error-events",
    batch_size=10,
    flush_interval_millis=5000,
    access_key_id="your-access-key",
    secret_access_key="your-secret-key"
)

job_context = (
    JobContext.builder()
    .metric_tags({"env": "production", "service": "data-validator"})
    .dlq_config(dlq_config)
    .build()
)

# Pipeline with assertion that will cause some events to fail
flow = (
    zephflow.ZephFlow.start_flow(job_context)
    .file_source(input_file, "JSON_OBJECT")
    .filter("$.value > 0")                  # Basic filtering
    .assertion("$.value < 1000")            # This will fail for value=1500,2000 -> DLQ
    .eval("""
        dict(
            user_id=$.user_id,
            validated_value=$.value,
            processed_at=now()
        )
    """)
    .stdout_sink("JSON_OBJECT")
)

# Execute - assertion failures will be sent to S3 DLQ
flow.execute("data-validator", "production", "validation-service")
print(f"Assertion failures sent to S3 DLQ: pipeline-error-events")

# Cleanup
import os
os.unlink(input_file)
```

### S3 DLQ Configuration Options

The `S3DlqConfig` supports the following parameters:

- `region`: AWS region where the DLQ bucket is located
- `bucket`: S3 bucket name for storing failed events
- `batch_size`: Number of events to batch before writing (default: 100)
- `flush_interval_millis`: Maximum time to wait before flushing events (default: 5000ms)
- `access_key_id`: AWS access key (optional, uses default credential chain if not provided)
- `secret_access_key`: AWS secret key (optional, uses default credential chain if not provided)
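
As noted above, the credential fields can be omitted to fall back to the default AWS credential chain. A minimal configuration sketch (the region and bucket name are placeholders):

```python
from zephflow import JobContext, S3DlqConfig

# Credentials omitted on purpose: the default AWS credential chain is used
# (environment variables, shared credentials file, or an instance/role profile).
dlq_config = S3DlqConfig(
    region="us-west-2",          # placeholder region
    bucket="my-dlq-bucket",      # placeholder bucket name
    batch_size=100,              # default per the list above
    flush_interval_millis=5000,  # default per the list above
)

job_context = JobContext.builder().dlq_config(dlq_config).build()
```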

### DLQ Error Event Format

Failed source events are stored in S3 using Avro format with the following structure:

- **processingTimestamp**: Timestamp when the error occurred (milliseconds)
- **key**: Original message key (bytes, nullable)
- **value**: Original message value (bytes, nullable)
- **metadata**: Additional metadata about the source (map of strings, nullable)
- **errorMessage**: Error details including stack trace (string)
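
To inspect DLQ objects offline you can read them back with any Avro reader. The sketch below is hypothetical: it assumes the records are written as standard Avro object-container files, uses the third-party `boto3` and `fastavro` packages (not ZephFlow dependencies), and the object key is a placeholder:

```python
import io

import boto3      # third-party, not a ZephFlow dependency
import fastavro   # third-party, not a ZephFlow dependency

s3 = boto3.client("s3", region_name="us-west-2")
obj = s3.get_object(Bucket="error-events-bucket", Key="path/to/dlq-object.avro")  # placeholder key

for record in fastavro.reader(io.BytesIO(obj["Body"].read())):
    # Field names as listed above
    print(record["processingTimestamp"], record["errorMessage"])
    if record["value"] is not None:
        print(record["value"].decode("utf-8", errors="replace"))
```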

### Common S3 DLQ Use Cases

S3 DLQ captures failures when using **data sources**, including:

- **JSON parsing failures** in file_source or kafka_source
- **Deserialization errors** when converting raw data to structured format
- **Schema validation failures** at the source level
- **Network or I/O errors** during data fetching
- **Pipeline processing failures** like assertion failures, eval errors, or filter exceptions
- **Data transformation errors** in any pipeline step

**Note**: S3 DLQ **only works with data sources** (file_source, kafka_source, etc.). When using `flow.process(events)` with in-memory data, pipeline failures are handled through `result.getErrorByStep()` instead.

## Examples

For more detailed examples, see the [Quick Start Example](https://github.com/fleaktech/zephflow-python-sdk/blob/main/examples/quickstart.py), which covers basic filtering and transformation.

## Environment Variables

- `ZEPHFLOW_MAIN_JAR` - Path to a custom ZephFlow JAR file (optional)
- `ZEPHFLOW_JAR_DIR` - Directory for storing downloaded JAR files (optional)
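
A minimal sketch of overriding these follows; the paths are placeholders, and it assumes the variables are read when the Java engine is first started, so they are set before importing the package:

```python
import os

# Placeholders: point the SDK at a locally built engine JAR and a custom JAR cache.
os.environ["ZEPHFLOW_MAIN_JAR"] = "/opt/zephflow/zephflow-main.jar"
os.environ["ZEPHFLOW_JAR_DIR"] = os.path.expanduser("~/.cache/zephflow-jars")

import zephflow  # noqa: E402  -- imported after the environment is configured

flow = zephflow.ZephFlow.start_flow().filter("$.value > 0")
```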


## Support

- **Documentation**: [https://docs.fleak.ai/zephflow](https://docs.fleak.ai/zephflow)
- **Discussions**: [Slack](https://join.slack.com/t/fleak-hq/shared_invite/zt-361k9cnhf-9~mmjpOH1IbZfRxeXplfKA)

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## About Fleak

ZephFlow is developed and maintained by [Fleak Tech Inc.](https://fleak.ai), building the future of data processing and streaming analytics.


            
