[![PyPI version](https://badge.fury.io/py/odd-airflow2-integration.svg)](https://badge.fury.io/py/odd-airflow2-integration)
# Open Data Discovery Airflow 2 Integrator
An Airflow plugin that tracks DAGs, tasks, and task runs, and sends them to the Open Data Discovery platform as DAGs run, using [Airflow Listeners](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/listeners.html).
## Requirements
* __Python >= 3.9__
* __Airflow >= 2.5.1__
* __Presence__ of an HTTP Connection named '__odd__'. The connection must have a __host__ property set to your
platform's host (and a __port__ property if required) and a __password__ field containing the platform collector's token.
This connection MUST exist before the scheduler starts; we recommend storing it in [AWS SSM Parameter Store](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/secrets-backends/aws-ssm-parameter-store.html),
Azure Key Vault, or a similar secrets backend.
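
For example, such a connection could be created with the Airflow CLI before starting the scheduler (a minimal sketch; the host, port, and token values below are placeholders, not values required by the plugin):

```bash
# Create the 'odd' HTTP connection expected by the plugin.
airflow connections add 'odd' \
    --conn-type 'http' \
    --conn-host 'odd-platform.example.com' \
    --conn-port '8080' \
    --conn-password '<collector-token>'
```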
## Installation
The package must be installed into the same environment as Airflow:
```bash
poetry add odd-airflow2-integration
# or
pip install odd-airflow2-integration
```
## Lineage
To build proper lineage for tasks, the plugin needs to know the inputs and outputs of each task. For this we follow
the existing Airflow lineage concept and use the `inlets` and `outlets` attributes: they list the ODDRNs of the
Datasets that are considered the inputs and outputs of the task.
Example of defining `inlets` and `outlets` using TaskFlow:
```python
from airflow.decorators import task

@task(
    task_id="task_2",
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)
def transform(data_dict: dict):
    pass

task_2 = transform(data_dict={})
```
Example using Operators:
```python
from airflow.operators.python import PythonOperator

task_2 = PythonOperator(
    task_id="task_2",
    python_callable=transform,
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)
```
It is also worth mentioning that neither `inlets` nor `outlets` can be templated via the `template_fields`
mechanism of Operators that support it.
More information on this topic can be found in the comments of the following
[issue](https://github.com/opendatadiscovery/odd-airflow-2/issues/8#issuecomment-1884554977).
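
Since templating is not available for these attributes, one possible workaround is to compose the ODDRN strings in plain Python at DAG definition time. A hypothetical sketch, reusing the Operator example above (the host and DAG id are placeholders):

```python
DAG_ID = "test_dag"
AIRFLOW_HOST = "internal_host"  # placeholder for your Airflow instance host

def task_oddrn(task_id: str) -> str:
    # Build the ODDRN at DAG-parse time instead of relying on Jinja templating,
    # since inlets/outlets are not rendered via template_fields.
    return f"//airflow/{AIRFLOW_HOST}/dags/{DAG_ID}/tasks/{task_id}"

task_2 = PythonOperator(
    task_id="task_2",
    python_callable=transform,
    inlets=[task_oddrn("task_1")],
    outlets=[task_oddrn("task_3")],
)
```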