dbnd-airflow-monitor

-   Name: dbnd-airflow-monitor
-   Version: 1.0.23.2
-   Home page: https://github.com/databand-ai/dbnd
-   Summary: Machine Learning Orchestration
-   Upload time: 2024-05-19 17:41:30
-   Maintainer: Evgeny Shulman
-   Author: Evgeny Shulman
-   Keywords: orchestration, data, machinelearning

# Databand Airflow Monitor

Databand Airflow Monitor is a stand-alone module for the Databand system that loads data from an Airflow server and imports it into Databand.
This Databand-side module is one of two components that together sync your Airflow data into the Databand system.

## Installation with setuptools

```bash
cd modules/dbnd-airflow-monitor
pip install -e .
```
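
To confirm that the editable install is visible to the interpreter, the installed distribution can be looked up by name. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of dbnd.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string after a successful install, otherwise None.
print(installed_version("dbnd-airflow-monitor"))
```

A `None` result usually means the install ran in a different virtual environment than the one being checked.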

## Usage

`dbnd airflow-monitor`

### Important flags

`--sync-history`: By default, the monitor's `since` value is determined by the last time it ran. Use this flag to sync from the beginning instead.
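
The effect of the flag can be pictured as a choice of starting point for the sync. The sketch below is only an illustration of that rule, not the monitor's actual implementation: `resolve_since`, `EPOCH`, and the fallback for a missing last run are all assumptions.

```python
from datetime import datetime, timezone

# Assumed stand-in for "the beginning"; the real monitor may use another value.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def resolve_since(last_run, sync_history=False):
    """Pick the start of the sync: last run by default, the beginning with the flag."""
    if sync_history or last_run is None:
        return EPOCH
    return last_run


last_run = datetime(2024, 5, 19, 17, 0, tzinfo=timezone.utc)
default_since = resolve_since(last_run)                   # resume from the last run
full_since = resolve_since(last_run, sync_history=True)   # start from the beginning
print(default_since.isoformat(), full_since.isoformat())
```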

### Configuration

You can configure the syncing variables in the Databand configuration system:

```cfg
[airflow_monitor]
interval = 10 ; Time in seconds to wait between each fetching cycle
include_logs = True ; Whether or not to include logs (might be heavy)
include_task_args = True ; Whether or not to include task arguments
fetch_quantity = 100 ; Max number of tasks or dag runs to retrieve at each fetch
fetch_period = 60 ; Time in minutes for window fetching size (start: since, end: since + period)
dag_ids = ['ingest_data_dag', 'simple_dag'] ; Limit fetching to these specific dag ids

## DB Fetcher
### Pay attention, when using this system airflow version must be equal to databand's airflow version
sql_alchemy_conn = sqlite:////usr/local/airflow/airflow.db ; When using fetcher=db, use this sql connection string
local_dag_folder =  /usr/local/airflow/dags ; When using fetcher=db, this is the dag folder location
```
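
To make the units concrete, the following sketch reads a few of these options with Python's standard `configparser` and computes one fetch window (start: `since`, end: `since` + `fetch_period` minutes). Databand has its own configuration loader, so this is an illustration only; `load_monitor_settings` and `fetch_window` are hypothetical helpers.

```python
import ast
import configparser
from datetime import datetime, timedelta

SAMPLE_CFG = """
[airflow_monitor]
interval = 10 ; Time in seconds to wait between each fetching cycle
fetch_quantity = 100 ; Max number of tasks or dag runs to retrieve at each fetch
fetch_period = 60 ; Time in minutes for window fetching size
dag_ids = ['ingest_data_dag', 'simple_dag'] ; Limit fetching to these specific dag ids
"""


def load_monitor_settings(text):
    """Parse the [airflow_monitor] section, treating ' ;' as an inline comment."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)
    section = parser["airflow_monitor"]
    return {
        "interval": section.getint("interval"),
        "fetch_quantity": section.getint("fetch_quantity"),
        "fetch_period": section.getint("fetch_period"),
        # The list is written as a Python literal, so parse it safely.
        "dag_ids": ast.literal_eval(section["dag_ids"]),
    }


def fetch_window(since, fetch_period_minutes):
    """Return the (start, end) of one fetch window."""
    return since, since + timedelta(minutes=fetch_period_minutes)


settings = load_monitor_settings(SAMPLE_CFG)
start, end = fetch_window(datetime(2024, 5, 19, 17, 0), settings["fetch_period"])
print(settings["dag_ids"], start.isoformat(), end.isoformat())
```

Note that `interval` is in seconds while `fetch_period` is in minutes, so the two should not be compared directly.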

## Steps for Google Composer

After spinning up a new Google Composer environment, add `dbnd` under PyPI packages and add a `DBND__CORE__DATABAND_URL` environment variable pointing to your dbnd instance. Then copy the plugin file to the plugins folder (go to the dags folder, one level up, and then plugins).

For the monitor to work, you need to set up a service account and add the relevant bindings (taken from https://medium.com/google-cloud/using-airflow-experimental-rest-api-on-google-cloud-platform-cloud-composer-and-iap-9bd0260f095a, see the "Create a Service Account for POST Trigger" section).

Example of creating a new service account:

```bash
export PROJECT=prefab-root-227507
export SERVICE_ACCOUNT_NAME=dbnd-airflow-monitor
gcloud iam service-accounts create $SERVICE_ACCOUNT_NAME --project $PROJECT
# Give service account permissions to create tokens for iap requests.
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$SERVICE_ACCOUNT_NAME@$PROJECT.iam.gserviceaccount.com --role roles/iam.serviceAccountTokenCreator
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$SERVICE_ACCOUNT_NAME@$PROJECT.iam.gserviceaccount.com --role roles/iam.serviceAccountActor
# Service account also needs to be authorized to use Composer.
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$SERVICE_ACCOUNT_NAME@$PROJECT.iam.gserviceaccount.com --role roles/composer.user
# We need a service account key to trigger the dag.
gcloud iam service-accounts keys create ~/$PROJECT-$SERVICE_ACCOUNT_NAME-key.json --iam-account=$SERVICE_ACCOUNT_NAME@$PROJECT.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS=~/$PROJECT-$SERVICE_ACCOUNT_NAME-key.json
```
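
Before starting the monitor, it can help to check that `GOOGLE_APPLICATION_CREDENTIALS` points at a usable service account key. The sketch below relies on the standard fields of a Google service account key file (`type`, `client_email`, `private_key`); the helper names are hypothetical and not part of dbnd or gcloud.

```python
import json
import os

REQUIRED_FIELDS = ("client_email", "private_key")


def validate_service_account_key(key):
    """Check a parsed key file and return the service account email."""
    if key.get("type") != "service_account":
        raise ValueError("key file is not a service account key")
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    return key["client_email"]


def load_service_account_key(path):
    """Read a key file from disk, expanding '~' as the shell would."""
    with open(os.path.expanduser(path)) as handle:
        return json.load(handle)


# Only runs the check when the variable is set and the file exists.
key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
if key_path and os.path.isfile(os.path.expanduser(key_path)):
    print("Key belongs to", validate_service_account_key(load_service_account_key(key_path)))
```

The printed email should match `$SERVICE_ACCOUNT_NAME@$PROJECT.iam.gserviceaccount.com` from the commands above.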

Configure the airflow monitor with the composer fetcher, with the URL pointing to the Composer Airflow instance and the client ID (same article, "Getting Airflow Client ID" section).
Visit the Airflow URL https://YOUR_UNIQUE_ID.appspot.com (which you noted in the last step) in an incognito window and do not log in. The first landing page for IAP auth has the client ID in the URL shown in the address bar:

```
https://accounts.google.com/signin/oauth/identifier?client_id=00000000000-xxxx0x0xx0xx00xxxx0x00xxx0xxxxx.apps.googleusercontent.com&...
```
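
Rather than copying the client ID by hand, it can be pulled out of the pasted address with the standard library. This is a minimal sketch; `extract_client_id` is a hypothetical helper, and the example URL uses the placeholder ID from above without the elided trailing parameters.

```python
from urllib.parse import parse_qs, urlparse


def extract_client_id(redirect_url):
    """Return the client_id query parameter from an IAP sign-in URL."""
    query = parse_qs(urlparse(redirect_url).query)
    values = query.get("client_id")
    if not values:
        raise ValueError("no client_id parameter found in URL")
    return values[0]


example_url = (
    "https://accounts.google.com/signin/oauth/identifier"
    "?client_id=00000000000-xxxx0x0xx0xx00xxxx0x00xxx0xxxxx.apps.googleusercontent.com"
)
client_id = extract_client_id(example_url)
print(client_id)
```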

## Integration Tests

We have 2 tests:

-   databand/integration-tests/airflow_monitor
-   databand/integration-tests/airflow_monitor_stress

To run a test, change to its directory and start the `inttest` container:

```
cd databand/integration-tests/airflow_monitor
docker-compose up inttest
```

            
