[![Build Status](https://api.travis-ci.com/damavis/airflow-hop-plugin.svg?branch=main)](https://app.travis-ci.com/damavis/airflow-hop-plugin)
[![codecov](https://codecov.io/gh/damavis/airflow-hop-plugin/branch/main/graph/badge.svg?token=IRE2T3NEOD)](https://codecov.io/gh/damavis/airflow-hop-plugin)
[![PyPI](https://img.shields.io/pypi/v/airflow-hop-plugin)](https://pypi.org/project/airflow-hop-plugin/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/airflow-hop-plugin)](https://pypi.org/project/airflow-hop-plugin/)

# Hop Airflow plugin

This is an Apache Hop plugin for Apache Airflow that orchestrates Apache Hop pipelines and workflows from Airflow.

## Requirements

Before setting up the plugin, you must have fully set up your Hop environment:

- Configured Hop Server
- Configured remote pipeline configuration
- Configured remote workflow configuration

To do so, open the metadata window in the Hop UI by pressing Ctrl+Shift+M or clicking the ![icon](images/Metadata_icon.png) icon.

![Hop Server](images/Hop_server.png)

Double-click `Hop server` to create a new configuration.

![Remote pipeline Config](images/Pipeline_Run_config.png)

Double-click `Pipeline Run Configuration` to create a new configuration.

![Remote workflow config](images/Workflow_Run_config.png)

Double-click `Workflow Run Configuration` to create a new configuration.

## Set up guide

The following is a guide on how to set up the plugin, along with some requirements and
constraints on its usage.

### 1. Generate metadata.json

For this plugin to be configured correctly, a file containing all of Hop's metadata must be
created inside each project directory. This can be done by exporting it from the Hop UI.

Please note that this process must be repeated each time the metadata of a project is modified.

![Metadata option](images/Export_metadata.png)

### 2. Install the plugin

To get the plugin working, first install the package using the following command:

```
pip install airflow-hop-plugin
```
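
Note that Airflow must be able to import the package, so install it into the same Python
environment as Airflow on every machine running a scheduler or worker. For reproducible
deployments you can pin a specific release, e.g. the `0.1.0b6` beta:

```
pip install 'airflow-hop-plugin==0.1.0b6'
```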

### 3. Hop Directory Structure

Due to technical limitations, it is important that the Hop home directory has the
following structure:

```
hop # This is the Hop home directory
├── ...
├── config
│   ├── hop-config.json
│   ├── example_environment.json # This is where you should save your environment files
│   ├── metadata
│   │   └── ...
│   └── projects
│       ├── ...
│       └── example_project # This is how your project's directory should look
│           ├── metadata.json
│           ├── metadata
│           │   └── ...
│           ├── example_directory
│           │   └── example_workflow.hwf
│           └── example_pipeline.hpl
├── ...
```

Also, remember to save all projects inside the `projects` directory and to set a path
relative to the Hop home directory when configuring them, as shown in the following picture:

![Here goes an image](images/project_properties.png)

### 4. Create an Airflow Connection

To use the operators correctly, you must create a new
[Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html).
There are multiple ways to do so, and any of them works, but the connection must have the
following attribute values:

- Connection ID: 'hop_default'
- Connection Type: 'http'
- Login: apache_hop_username
- Password: apache_hop_password
- Host: apache_hop_server
- Port: apache_hop_port
- Extra: {"hop_home": "/path/to/hop-home/"}

Example of creating a new Airflow connection using Airflow's CLI:

```
airflow connections add 'hop_default' \
    --conn-json '{
        "conn_type": "http",
        "login": "cluster",
        "password": "cluster",
        "host": "0.0.0.0",
        "port": 8080,
        "schema": "",
        "extra": {
            "hop_home": "/home/user/hop"
        }
    }'
```
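
Alternatively, Airflow can read connections from environment variables named
`AIRFLOW_CONN_<CONN_ID>`. A minimal sketch, assuming Airflow 2.3 or later (which accepts
JSON-encoded connection values):

```
# Equivalent to the CLI example above, but supplied via the environment.
export AIRFLOW_CONN_HOP_DEFAULT='{
    "conn_type": "http",
    "login": "cluster",
    "password": "cluster",
    "host": "0.0.0.0",
    "port": 8080,
    "extra": {"hop_home": "/home/user/hop"}
}'
```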

### 5. Creating a DAG

Here's an example of a DAG:

```python
from datetime import datetime

from airflow import DAG
from airflow_hop.operators import HopPipelineOperator
from airflow_hop.operators import HopWorkflowOperator

with DAG('sample_dag', start_date=datetime(2022,7,26),
        schedule_interval='@daily', catchup=False) as dag:

    # Define a pipeline
    first_pipe = HopPipelineOperator(
        task_id='first_pipe',
        pipeline='pipelines/first_pipeline.hpl',
        pipe_config='remote hop server',
        project_name='default',
        log_level='Basic')

    # Define a pipeline with parameters
    second_pipe = HopPipelineOperator(
        task_id='second_pipe',
        pipeline='pipelines/second_pipeline.hpl',
        pipe_config='remote hop server',
        project_name='default',
        log_level='Basic',
        params={'DATE':'{{ ds }}'}) # Date in yyyy-mm-dd format

    # Define a workflow with parameters
    work_test = HopWorkflowOperator(
        task_id='work_test',
        workflow='workflows/workflow_example.hwf',
        project_name='default',
        log_level='Basic',
        params={'DATE':'{{ ds }}'}) # Date in yyyy-mm-dd format

    first_pipe >> second_pipe >> work_test
```

Note that the `workflow` and `pipeline` arguments of their respective operators must be
paths relative to the project's directory.
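
For instance, given the example layout from section 3, the workflow
`example_directory/example_workflow.hwf` inside `example_project` would be referenced as
follows (a sketch reusing the hypothetical names from the directory tree above; `work_example`
is an arbitrary task id):

```python
# The workflow path is relative to the project directory, not to the Hop home.
work_example = HopWorkflowOperator(
    task_id='work_example',
    workflow='example_directory/example_workflow.hwf',
    project_name='example_project',
    log_level='Basic')
```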

## Development

### Deploy Apache Hop Server using Docker

Requirements:

- [docker](https://docs.docker.com/engine/install/)
- [docker-compose](https://docs.docker.com/compose/install/)

If you want to use Docker to run the server, you can use the following Docker Compose
configuration as a template:

```yaml
services:
  apache-hop:
    image: apache/hop:latest
    ports:
      - 8080:8080
    volumes:
      - hop_path:/home/hop
    environment:
      HOP_SERVER_USER: cluster
      HOP_SERVER_PASS: cluster
      HOP_SERVER_PORT: 8080
      HOP_SERVER_HOSTNAME: 0.0.0.0

# Named volumes must be declared at the top level for Compose to create them.
volumes:
  hop_path:
```

Once done, the Hop server can be started with Docker Compose.
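
For example, from the directory containing the compose file:

```
docker compose up -d
```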

## License

```
Copyright 2022 Aneior Studio, SL

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

            
