openlineage-dagster


Nameopenlineage-dagster JSON
Version 1.11.3 PyPI version JSON
download
home_pageNone
SummaryOpenLineage integration with Dagster
upload_time2024-04-04 20:18:04
maintainerNone
docs_urlNone
authorOpenLineage
requires_python>=3.8
licenseNone
keywords openlineage
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            > **Note** </br>
> New integration maintainers are needed! Please [open an issue](https://github.com/OpenLineage/OpenLineage/issues/new) to get started.

# OpenLineage Dagster Integration

A library that integrates [Dagster](https://dagster.io/) with [OpenLineage](https://openlineage.io) for automatic metadata collection.
It provides an OpenLineage sensor, a Dagster sensor that tails Dagster event logs for tracking metadata.
On each sensor evaluation, the function processes a batch of event logs, converts Dagster events into OpenLineage events, 
and emits them to an OpenLineage backend.

## Features

**Metadata**

* Dagster job & op lifecycle

## Requirements

- [Python 3.8](https://www.python.org/downloads)
- [Dagster 1.0.0+](https://dagster.io/)

## Installation

```bash
$ python -m pip install openlineage-dagster
```

## Usage

### OpenLineage Sensor & Event Log Storage Requirements

**Single OpenLineage sensor per Dagster instance** <br />
As the OpenLineage sensor processes all event logs for a given Dagster instance, define and enable only one sensor per instance. 
Running multiple sensors will result in duplicate OpenLineage job runs being emitted for Dagster steps with different OpenLineage run IDs. These are dynamically generated during sensor runs.

**Non-sharded [Event Log Storage](https://docs.dagster.io/deployment/dagster-instance#event-log-storage)** <br />
For the sensor to handle all event logs across runs, use non-sharded event log storage.
If an event log storage sharded by run (i.e., the default `SqliteEventLogStorage`) is used, the cursor that tracks the last processed event storage ID may not update properly. 

### OpenLineage Sensor Setup

Get a OpenLineage sensor definition from the `openlineage_sensor()` factory function and add it to your Dagster repository.

```python
from dagster import repository
from openlineage.dagster.sensor import openlineage_sensor


@repository
def my_repository():
    openlineage_sensor_def = openlineage_sensor()
    return other_defs + [openlineage_sensor_def]
```

Given that parallel sensor runs are not supported at the time of writing, some tuning may be necessary to avoid affecting other sensors' performance.

See Dagster's documentation on [Evaluation Interval](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#evaluation-interval)
for more detail on `minimum_interval_seconds`, which defaults to 30 seconds.
`record_filter_limit` is the maximum number of event logs to process on each sensor evaluation, and it defaults to 30 records per evaluation.
Default values can be overridden:

```python
@repository
def my_repository():
    openlineage_sensor_def = openlineage_sensor(
        minimum_interval_seconds=60,
        record_filter_limit=60,
    )
    return other_defs + [openlineage_sensor_def]
```


The OpenLineage sensor handles event logs in ascending order of storage ID and starts with the first log by default.
Optionally, `after_storage_id` can be specified to customize the starting point.
This is only applicable when the cursor is undefined or has been deleted.

```python
@repository
def my_repository():
    openlineage_sensor_def = openlineage_sensor(
        after_storage_id=100
    )
    return other_defs + [openlineage_sensor_def]
```

### OpenLineage Adapter & Client Configuration

The sensor uses an OpenLineage adapter and client to convert and push data to an OpenLineage backend.
These depend on environment variables.

If using User Repository Deployments, add the below variables to the repository where the sensor is defined.
Otherwise, add the variables to the Dagster Daemon.

* `OPENLINEAGE_URL` - point to the service which will consume OpenLineage events.
* `OPENLINEAGE_API_KEY` - set if the consumer of OpenLineage events requires a `Bearer` authentication key.
* `OPENLINEAGE_NAMESPACE` - set if you are using something other than the `default` as the default namespace when a Dagster repository is undefined. 

#### OpenLineage Namespace & Dagster Repository

For Dagster jobs organized in repositories, Dagster keeps track of the repository name for each pipeline run.
When the repository name is present, it is always used as the OpenLineage namespace name.
`OPENLINEAGE_NAMESPACE` option is a way to fall back and provide some other static default value. 


## Development

To install all dependencies for local development:

```bash
$ python -m pip install -e .[dev]  # or python -m pip install -e .\[dev\] in zsh 
```

To run the test suite:

```bash
$ pytest
```

----
SPDX-License-Identifier: Apache-2.0\
Copyright 2018-2023 contributors to the OpenLineage project

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "openlineage-dagster",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "openlineage",
    "author": "OpenLineage",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/9a/d7/89ad4ed125cb6a41e7e1fd6824271e2dc0a9ece07e028cb1d9ff64bedcda/openlineage-dagster-1.11.3.tar.gz",
    "platform": null,
    "description": "> **Note** </br>\n> New integration maintainers are needed! Please [open an issue](https://github.com/OpenLineage/OpenLineage/issues/new) to get started.\n\n# OpenLineage Dagster Integration\n\nA library that integrates [Dagster](https://dagster.io/) with [OpenLineage](https://openlineage.io) for automatic metadata collection.\nIt provides an OpenLineage sensor, a Dagster sensor that tails Dagster event logs for tracking metadata.\nOn each sensor evaluation, the function processes a batch of event logs, converts Dagster events into OpenLineage events, \nand emits them to an OpenLineage backend.\n\n## Features\n\n**Metadata**\n\n* Dagster job & op lifecycle\n\n## Requirements\n\n- [Python 3.8](https://www.python.org/downloads)\n- [Dagster 1.0.0+](https://dagster.io/)\n\n## Installation\n\n```bash\n$ python -m pip install openlineage-dagster\n```\n\n## Usage\n\n### OpenLineage Sensor & Event Log Storage Requirements\n\n**Single OpenLineage sensor per Dagster instance** <br />\nAs the OpenLineage sensor processes all event logs for a given Dagster instance, define and enable only one sensor per instance. \nRunning multiple sensors will result in duplicate OpenLineage job runs being emitted for Dagster steps with different OpenLineage run IDs. These are dynamically generated during sensor runs.\n\n**Non-sharded [Event Log Storage](https://docs.dagster.io/deployment/dagster-instance#event-log-storage)** <br />\nFor the sensor to handle all event logs across runs, use non-sharded event log storage.\nIf an event log storage sharded by run (i.e., the default `SqliteEventLogStorage`) is used, the cursor that tracks the last processed event storage ID may not update properly. \n\n### OpenLineage Sensor Setup\n\nGet a OpenLineage sensor definition from the `openlineage_sensor()` factory function and add it to your Dagster repository.\n\n```python\nfrom dagster import repository\nfrom openlineage.dagster.sensor import openlineage_sensor\n\n\n@repository\ndef my_repository():\n    openlineage_sensor_def = openlineage_sensor()\n    return other_defs + [openlineage_sensor_def]\n```\n\nGiven that parallel sensor runs are not supported at the time of writing, some tuning may be necessary to avoid affecting other sensors' performance.\n\nSee Dagster's documentation on [Evaluation Interval](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#evaluation-interval)\nfor more detail on `minimum_interval_seconds`, which defaults to 30 seconds.\n`record_filter_limit` is the maximum number of event logs to process on each sensor evaluation, and it defaults to 30 records per evaluation.\nDefault values can be overridden:\n\n```python\n@repository\ndef my_repository():\n    openlineage_sensor_def = openlineage_sensor(\n        minimum_interval_seconds=60,\n        record_filter_limit=60,\n    )\n    return other_defs + [openlineage_sensor_def]\n```\n\n\nThe OpenLineage sensor handles event logs in ascending order of storage ID and starts with the first log by default.\nOptionally, `after_storage_id` can be specified to customize the starting point.\nThis is only applicable when the cursor is undefined or has been deleted.\n\n```python\n@repository\ndef my_repository():\n    openlineage_sensor_def = openlineage_sensor(\n        after_storage_id=100\n    )\n    return other_defs + [openlineage_sensor_def]\n```\n\n### OpenLineage Adapter & Client Configuration\n\nThe sensor uses an OpenLineage adapter and client to convert and push data to an OpenLineage backend.\nThese depend on environment variables.\n\nIf using User Repository Deployments, add the below variables to the repository where the sensor is defined.\nOtherwise, add the variables to the Dagster Daemon.\n\n* `OPENLINEAGE_URL` - point to the service which will consume OpenLineage events.\n* `OPENLINEAGE_API_KEY` - set if the consumer of OpenLineage events requires a `Bearer` authentication key.\n* `OPENLINEAGE_NAMESPACE` - set if you are using something other than the `default` as the default namespace when a Dagster repository is undefined. \n\n#### OpenLineage Namespace & Dagster Repository\n\nFor Dagster jobs organized in repositories, Dagster keeps track of the repository name for each pipeline run.\nWhen the repository name is present, it is always used as the OpenLineage namespace name.\n`OPENLINEAGE_NAMESPACE` option is a way to fall back and provide some other static default value. \n\n\n## Development\n\nTo install all dependencies for local development:\n\n```bash\n$ python -m pip install -e .[dev]  # or python -m pip install -e .\\[dev\\] in zsh \n```\n\nTo run the test suite:\n\n```bash\n$ pytest\n```\n\n----\nSPDX-License-Identifier: Apache-2.0\\\nCopyright 2018-2023 contributors to the OpenLineage project\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "OpenLineage integration with Dagster",
    "version": "1.11.3",
    "project_urls": null,
    "split_keywords": [
        "openlineage"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ade5d62508bafb45f0a6ca77aa8847dfad71177eb7d1f6f22dec20e21c1dbda2",
                "md5": "6238857d55e0755ad5269cd3887a7a3c",
                "sha256": "0bc1c29c546c50a8cb95943a94084c588a8307b6053e368961bf9124e8dce574"
            },
            "downloads": -1,
            "filename": "openlineage_dagster-1.11.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6238857d55e0755ad5269cd3887a7a3c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 9041,
            "upload_time": "2024-04-04T20:17:50",
            "upload_time_iso_8601": "2024-04-04T20:17:50.457897Z",
            "url": "https://files.pythonhosted.org/packages/ad/e5/d62508bafb45f0a6ca77aa8847dfad71177eb7d1f6f22dec20e21c1dbda2/openlineage_dagster-1.11.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9ad789ad4ed125cb6a41e7e1fd6824271e2dc0a9ece07e028cb1d9ff64bedcda",
                "md5": "bf365ed6ca48211ba69b36b9897596b2",
                "sha256": "b2a181e8e440faa0430d06aca7603dcbde2862a2ba9511aeaea1ef1c6fa4b91f"
            },
            "downloads": -1,
            "filename": "openlineage-dagster-1.11.3.tar.gz",
            "has_sig": false,
            "md5_digest": "bf365ed6ca48211ba69b36b9897596b2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 12901,
            "upload_time": "2024-04-04T20:18:04",
            "upload_time_iso_8601": "2024-04-04T20:18:04.739018Z",
            "url": "https://files.pythonhosted.org/packages/9a/d7/89ad4ed125cb6a41e7e1fd6824271e2dc0a9ece07e028cb1d9ff64bedcda/openlineage-dagster-1.11.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-04 20:18:04",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "openlineage-dagster"
}
        
Elapsed time: 0.21379s