# LabCas Workflow

Run workflows for LabCas.

Depending on what you do, there are multiple ways of running a LabCas workflow:

- **Developers:** local run, natively on your OS
- **Integrators:** AWS Managed Apache Airflow (MWAA), run with a local MWAA runner
- **System administrators:** deployed and configured on AWS
- **End users:** the AWS deployment


## Developers

The workflow tasks run independently of Airflow. TODO: integrate with the Airflow Python API.

### Install

With Python 3.11, preferably inside a virtual environment:


    pip install -e '.[dev]'
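
A quick sanity check that the editable install is visible to the interpreter (a minimal sketch; it only prints the installed distribution version):

    from importlib.metadata import version

    # Print the version of the installed labcas.workflow distribution;
    # this raises PackageNotFoundError if the install did not succeed.
    print(version("labcas.workflow"))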

### Set AWS connection

    ./aws-login.darwin.amd64
    export AWS_PROFILE=saml-pub
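
To verify the session from Python, a minimal sketch using boto3 (assuming boto3 is installed in the environment):

    import boto3

    # Ask STS who we are; this fails if the SAML session has expired
    # or AWS_PROFILE does not point to a valid profile.
    identity = boto3.client("sts").get_caller_identity()
    print(identity["Arn"])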

### Run/Test the client

#### Without a dask cluster

    python src/labcas/workflow/manager/main.py


#### With a local dask cluster

Start the scheduler:

    docker network create dask
    docker run --network dask -p 8787:8787 -p 8786:8786 labcas/workflow scheduler

Start one worker (no host port mapping is needed; publishing 8786 again would conflict with the scheduler container):

    docker run --network dask labcas/workflow worker


Start the client as in the previous section, but pass the scheduler address `tcp://localhost:8786` to the Dask client in the `main.py` script (port 8787 serves the dashboard).
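
To illustrate, the client connection in `main.py` might look like this minimal sketch (the surrounding workflow code is omitted, and the exact client setup in `main.py` may differ):

    from dask.distributed import Client

    # Connect to the scheduler container started above; port 8786 is the
    # scheduler RPC port, while port 8787 serves the dashboard.
    client = Client("tcp://localhost:8786")
    print(client.scheduler_info()["workers"])  # should list the started worker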



### Deploy the package on PyPI

Bump the version in `src/labcas/workflow/VERSION.txt`.

Publish the package on PyPI:

    pip install build
    pip install twine
    rm dist/*
    python -m build
    twine upload dist/*


## Integrators

### Build the Dask worker image

Update the labcas.workflow dependency version as needed in `docker/Dockerfile`, then:

    docker build -f docker/Dockerfile . -t labcas/workflow

### Create a managed Airflow Docker image to run locally

Clone the https://github.com/aws/aws-mwaa-local-runner repository, then:

    ./mwaa-local-env build-image

Then from your local labcas_workflow repository:

    cd mwaa

As needed, update requirements in the `requirements` directory and DAGs in the `dags` directory.
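
For orientation, a DAG dropped into `dags/` could look like the minimal sketch below; the real `nebraska.py` may differ, and the task body here is only a placeholder:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def run_workflow():
        # Placeholder body; the real DAG presumably calls into labcas.workflow.
        print("running a LabCas workflow task")


    with DAG(
        dag_id="nebraska",               # illustrative, named after the example DAG file
        start_date=datetime(2025, 1, 1),
        schedule=None,                   # trigger manually from the Airflow console
        catchup=False,
    ):
        PythonOperator(task_id="run_workflow", python_callable=run_workflow)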

### Update the AWS credentials

    aws-login.darwin.amd64
    cp -r ~/.aws .

### Launch the services
 
    docker compose -f docker-compose-local.yml up

Test the server at http://localhost:8080 (login: admin / test).
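
A scripted check is also possible; the sketch below hits Airflow's unauthenticated health endpoint:

    import requests

    # The local MWAA runner exposes the Airflow webserver on port 8080;
    # /health reports the status of the metadatabase and the scheduler.
    print(requests.get("http://localhost:8080/health").json())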

### Stop

Press `Ctrl+C`.

### Stop and re-initialize local volumes

    docker compose  -f ./docker-compose-local.yml down -v

    


### Test the `requirements.txt` file
 
    ./mwaa-local-env test-requirements

### Debug the workflow import

    docker container ls

Pick the container ID of the `amazon/mwaa-local:2_10_3` image, for example `54706271b7fc`, then open a bash shell in the container:

    docker exec -it 54706271b7fc bash

Then, at the bash prompt:

    cd dags
    python3 -c "import nebraska"
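
If the plain import succeeds but Airflow still does not list the DAG, a sketch using Airflow's `DagBag` can surface the import errors Airflow itself records:

    from airflow.models import DagBag

    # Parse the current dags folder the same way the scheduler does and
    # print any import errors (an empty dict means the DAGs load cleanly).
    bag = DagBag(dag_folder=".", include_examples=False)
    print(bag.import_errors)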


## System administrators

The deployment requires:
- one ECS cluster hosting the Dask cluster
- optionally, an EC2 instance acting as a client of the Dask cluster
- one managed Airflow environment (MWAA)

### Dask on ECS

Deploy the image created in the previous section to ECR.

Have an S3 bucket `labcas-infra` for the Terraform state.

Other prerequisites are:
 - a VPC
 - subnets
 - a security group allowing incoming requests to ports 8786 and 8787 from wherever the client runs (at JPL, on EC2, or on Airflow)
 - a task role allowed to write to CloudWatch
 - a task execution role that can pull images from ECR and has the standard ECS task execution role policy "AmazonECSTaskExecutionRolePolicy"
 

Deploy the ECS cluster with the following Terraform commands:

    cd terraform
    terraform init
    terraform apply \
        -var consortium="edrn" \
        -var venue="dev" \
        -var aws_fg_image=<uri of the docker image deployed on ECR> \
        -var aws_fg_subnets=<private subnets of the AWS account> \
        -var aws_fg_vpc=<vpc of the AWS account> \
        -var aws_fg_security_groups=<security group> \
        -var ecs_task_role=<arn of a task role> \
        -var ecs_task_execution_role=<arn of task execution role>
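
Once the apply completes, a quick sanity check from Python (a sketch; it assumes boto3 and valid credentials, and only confirms the cluster exists):

    import boto3

    ecs = boto3.client("ecs")
    # List the ECS clusters in the account and confirm the cluster created
    # by Terraform for Dask appears among them.
    print(ecs.list_clusters()["clusterArns"])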

### Test the dask cluster

#### Connect to an EC2 instance that is a client of the Dask cluster


    ssh {ip of the EC2 instance}
    aws-login
    export AWS_PROFILE=saml-pub
    git clone {this repository}
    cd workflows
    source venv/bin/activate
    python src/labcas/workflow/manager/main.py
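
For a client-side check of the cluster from the EC2 instance, a minimal sketch (the scheduler address is a placeholder, as in the commands above):

    from dask.distributed import Client

    # Substitute the Dask scheduler IP on ECS; port 8786 is the scheduler
    # RPC port opened by the security group described above.
    client = Client("tcp://{dask scheduler ip on ECS}:8786")
    print(client.scheduler_info()["workers"])  # the ECS workers should appear here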


To see the Dask dashboard, open an SSH tunnel:

    ssh -L 8787:{dask scheduler ip on ECS}:8787 {username}@{ec2 instance ip}

Then open http://localhost:8787 in a browser.


### Apache Airflow

An AWS managed Airflow (MWAA) environment is deployed, running version 2.10.3.

The managed Airflow is authorized to read from and write to the data bucket.

The managed Airflow is authorized to access the ECS security group.

It uses the S3 bucket `{labcas_airflow}`.

Upload the `./mwaa/requirements/requirements.txt` file to the bucket at `s3://{labcas_airflow}/requirements/`.

Upload the `./mwaa/dags/nebraska.py` file to the bucket at `s3://{labcas_airflow}/dags/`.
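
The same uploads can be scripted with boto3 (a sketch; the bucket name below is the placeholder used above):

    import boto3

    s3 = boto3.client("s3")
    bucket = "{labcas_airflow}"  # placeholder: substitute the real bucket name

    # Mirror the two manual uploads described above.
    s3.upload_file("mwaa/requirements/requirements.txt", bucket, "requirements/requirements.txt")
    s3.upload_file("mwaa/dags/nebraska.py", bucket, "dags/nebraska.py")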

Update the version of the `requirements.txt` file in the Airflow configuration console.

To test, go to the Airflow web console and trigger the `nebraska` DAG.














            
