nebelung

Name: nebelung
Version: 1.3.0
Home page: https://github.com/broadinstitute/nebelung
Summary: Firecloud API Wrapper
Upload time: 2024-10-25 16:04:14
Author: Devin McCabe
Requires Python: >=3.11
Keywords: terra, firecloud
Nebelung: Python wrapper for the Firecloud API
---

![](https://github.com/broadinstitute/nebelung/blob/main/nebelung.jpg?raw=true)

This package provides a wrapper around the [Firecloud](https://pypi.org/project/firecloud/) package and performs a similar, though cat-themed, function to [dalmatian](https://github.com/getzlab/dalmatian).

# Installation

Nebelung requires Python 3.11 or later.

```shell
poetry add nebelung # or pip install nebelung
```

# Usage

The package has two classes, `TerraWorkspace` and `TerraWorkflow`, and a variety of utility functions that wrap a subset of Firecloud API functionality.

## Workspaces

```python
from nebelung.terra_workspace import TerraWorkspace

terra_workspace = TerraWorkspace(
    workspace_namespace="terra_workspace_namespace",
    workspace_name="terra_workspace_name",
    owners=["user1@example.com", "group@firecloud.org"],
)
```

### Entities

```python
# get a workspace data table as a Pandas data frame
df = terra_workspace.get_entities("sample")

# get a workspace data table as a Pandas data frame typed with Pandera
# (`YourPanderaSchema` should subclass `nebelung.types.CoercedDataFrame`)
df = terra_workspace.get_entities("sample", YourPanderaSchema)   

# upsert a data frame to a workspace data table
terra_workspace.upload_entities(df)  # first column of `df` should be, e.g., `entity:sample_id` 

# create a sample set named, e.g., `sample_2024-08-21T17-24-19_call_cnvs`
sample_set_id = terra_workspace.create_entity_set(
    entity_type="sample",
    entity_ids=["sample_id1", "sample_id2"], 
    suffix="call_cnvs",
)
```
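
The generated set ID combines the entity type, a filesystem-safe timestamp, and the suffix. A sketch of that naming convention (a hypothetical helper for illustration, not nebelung's actual implementation):

```python
import datetime


def make_entity_set_id(entity_type: str, suffix: str) -> str:
    """Build a set ID like `sample_2024-08-21T17-24-19_call_cnvs`.

    Colons in the timestamp are replaced with hyphens so the ID stays
    filesystem- and Terra-friendly.
    """
    stamp = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    return f"{entity_type}_{stamp}_{suffix}"


print(make_entity_set_id("sample", "call_cnvs"))
```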

### Workflow outputs

```python
# collect workflow outputs from successful jobs as a list of `nebelung.types.TaskResult` objects 
outputs = terra_workspace.collect_workflow_outputs() 

# collect workflow outputs from successful jobs submitted in the last week
import datetime
a_week_ago = datetime.datetime.now() - datetime.timedelta(days=7)
outputs = terra_workspace.collect_workflow_outputs(since=a_week_ago)
```

## Workflow

Here, a "workflow" (standard data pipeline terminology) comprises a "method" and "method config" (Terra terminology).

The standard method for making a WDL-based workflow available in a Terra workspace is to configure the git repo to push to [Dockstore](https://dockstore.org/). Although this is the recommended technique for making a workflow publicly available, it has several drawbacks:

- The git repo must be public (for GCP-backed Terra workspaces at least).
- Every change to the method (WDL) or method config (JSON) requires creating and pushing a git commit.
- The workflow isn't updated on Dockstore immediately, since it depends on continuous deployment (CD).
- The Dockstore UI doesn't provide great visibility into CD build failures and their causes.

An alternative to Dockstore is to push the WDL directly to Firecloud. However, [that API endpoint](https://api.firecloud.org/#/Method%20Repository/post_api_methods) doesn't support uploading a WDL script that imports other local WDL scripts, nor a zip file of cross-referenced WDL scripts (like Cromwell does). The endpoint will accept WDL that imports other scripts via URLs, but currently only from the `githubusercontent.com` domain.

### Method persistence with GitHub gists

Thus, Nebelung (ab)uses [GitHub gists](https://gist.github.com/) to persist all the WDL scripts for a workflow as multiple files belonging to a single gist, then uploads the top-level WDL script's code to Firecloud. Any `import "./path/to/included/script.wdl" as other_script` statement is rewritten so that the imported script is persisted in the gist and thus imported from a `https://gist.githubusercontent.com` URL. This happens recursively, so local imports can have their own local imports.
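
The rewriting step can be pictured as a recursive text transformation: each local import path is replaced with the raw gist URL of the persisted file. A minimal sketch of the idea (the helper name, regex, and URL layout are illustrative assumptions, not nebelung's actual code):

```python
import re

# matches the path in statements like: import "./tasks/align.wdl" as align
LOCAL_IMPORT_RE = re.compile(r'import\s+"(\./[^"]+\.wdl)"')


def rewrite_local_imports(wdl_source: str, gist_base_url: str) -> str:
    """Replace local WDL import paths with raw gist URLs.

    `gist_base_url` stands in for the per-gist
    https://gist.githubusercontent.com/... prefix; in nebelung, each
    imported file would first be persisted to the gist (recursively,
    since imported scripts can have local imports of their own).
    """

    def to_url(match: re.Match) -> str:
        filename = match.group(1).rsplit("/", 1)[-1]
        return f'import "{gist_base_url}/{filename}"'

    return LOCAL_IMPORT_RE.sub(to_url, wdl_source)


wdl = 'import "./tasks/align.wdl" as align'
print(rewrite_local_imports(wdl, "https://gist.githubusercontent.com/user/abc123/raw"))
```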

### Method config

To aid in automation and make it easier to submit jobs manually without filling out many fields in the job submission UI, a JSON-formatted method config is also required, e.g.:

```json
{
  "deleted": false,
  "inputs": {
    "call_cnvs.sample_id": "this.sample_id"
  },
  "methodConfigVersion": 1,
  "methodRepoMethod": {
    "methodNamespace": "omics_pipelines",
    "methodName": "call_cnvs",
    "methodVersion": 1
  },
  "namespace": "omics_pipelines",
  "name": "call_cnvs",
  "outputs": {
    "call_cnvs.segs": "this.segments"
  },
  "rootEntityType": "sample"
}
```

- Both methods and method configs have their own namespaces. To simplify things, the above example uses the same sets of values for both. This approach might not be ideal if your methods and their configs are not one-to-one.
- The `TerraWorkspace.update_workflow` method will replace the `methodVersion` with an auto-incrementing version number based on the latest method's "snapshot ID" each time the method is updated. The `methodConfigVersion` should be incremented manually if desired.
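
Since `methodConfigVersion` is bumped by hand, a small helper can do it in place. A sketch under the assumption that the config lives in a JSON file like the one above (this helper is hypothetical, not part of nebelung):

```python
import json
from pathlib import Path


def bump_method_config_version(path: Path) -> int:
    """Increment `methodConfigVersion` in a method config JSON file
    and return the new version number."""
    config = json.loads(path.read_text())
    config["methodConfigVersion"] += 1
    path.write_text(json.dumps(config, indent=2))
    return config["methodConfigVersion"]
```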

### Versioning

Some information about a submitted job's method isn't easily recovered via the Firecloud API later on. Both `update_workflow` and `collect_workflow_outputs` are written to make it easier to connect workflow outputs to method versions for use in object (workflow output files and values) versioning. Include these workflow inputs in the WDL to enable this feature:

```wdl
version 1.0

workflow call_cnvs {
    input {
        String workflow_version = "1.0" # internal version number for your use
        String workflow_source_url # populated automatically with URL of this script
    }
}
```

The `update_workflow` method will automatically include these workflow inputs in the new method config's inputs, with `workflow_source_url` being set dynamically to the URL of the GitHub gist of that WDL script and `workflow_version` available for explicitly versioning the WDL.

Because GitHub gist has its own built-in versioning, a `workflow_source_url` stored in a job submission's inputs will always resolve to the exact WDL script that was used in the job, even if that method is updated later. 

### Validation

To avoid persisting potentially invalid WDL, `update_workflow` also validates all the WDL scripts with [WOMtool](https://cromwell.readthedocs.io/en/stable/WOMtool) first.

### Example

See also the [example module](https://github.com/broadinstitute/nebelung/tree/main/example) in this repo.

```python
import os
from pathlib import Path
from nebelung.terra_workflow import TerraWorkflow

# download the latest WOMtool from https://github.com/broadinstitute/cromwell/releases
os.environ["WOMTOOL_JAR"] = "/path/to/womtool.jar"

# generate a GitHub personal access token (fine-grained) at
# https://github.com/settings/tokens?type=beta
# with the "Read and Write access to gists" permission
os.environ["GITHUB_PAT"] = "github_pat_..."

terra_workflow = TerraWorkflow(
    method_namespace="omics_pipelines", # should match `methodRepoMethod.methodNamespace` from method config
    method_name="call_cnvs", # should match `methodRepoMethod.methodName` from method config
    method_config_namespace="omics_pipelines", # should match `namespace` from method config
    method_config_name="call_cnvs", # should match `name` from method config
    method_synopsis="This method calls CNVs.",
    workflow_wdl_path=Path("/path/to/call_cnvs.wdl").resolve(),
    method_config_json_path=Path("/path/to/call_cnvs.json").resolve(),
    github_pat="github_pat_...", # if not using the GITHUB_PAT environment variable
    womtool_jar="/path/to/womtool.jar", # if not using the WOMTOOL_JAR environment variable
)

# create or update a workflow (i.e. method and method config) directly in Firecloud
terra_workspace.update_workflow(terra_workflow, n_snapshots_to_keep=20)

# submit a job
terra_workspace.submit_workflow_run(
    terra_workflow,
    # any arguments below are passed to `firecloud_api.create_submission`
    entity="sample_2024-08-21T17-24-19_call_cnvs", # from `create_entity_set`
    etype="sample_set", # data type of the `entity` arg
    expression="this.samples", # maps the set to its member samples (the WDL's root entity is a single sample)
    use_callcache=True,
    use_reference_disks=False,
    memory_retry_multiplier=1.2,
)
```

## Call Firecloud API directly

All calls to the Firecloud API made internally by Nebelung are retried automatically (with a backoff function) when a networking-related error occurs. The retrying wrapper also detects other errors returned by the API and parses the JSON response if the call succeeded.

To use this functionality in the cases where Nebelung doesn't provide an endpoint wrapper, import the Firecloud API and the `call_firecloud_api` function:

```python
from firecloud import api as firecloud_api
from nebelung.utils import call_firecloud_api

# get a job submission
result = call_firecloud_api(
    firecloud_api.get_submission,
    namespace="terra_workspace_namespace",
    workspace="terra_workspace_name",
    max_retries=1,
    # kwargs for `get_submission`
    submission_id="<uuid>",
)
```
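
The retry behavior can be approximated with a simple exponential-backoff loop. A generic sketch of the pattern (illustrative only, not nebelung's actual `call_firecloud_api` internals):

```python
import time


def call_with_retries(func, *args, max_retries: int = 3,
                      base_delay: float = 0.5, **kwargs):
    """Call `func`, retrying on networking errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except ConnectionError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)


# demo with a deliberately flaky function that fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return {"status": "Done"}


print(call_with_retries(flaky, max_retries=3, base_delay=0.0))
```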

# Development

Run `pre-commit run --all-files` to automatically format your code with [Ruff](https://docs.astral.sh/ruff/) and check static types with [Pyright](https://microsoft.github.io/pyright).

To update the [package on pypi.org](https://pypi.org/project/nebelung), update the `version` in `pyproject.toml` and run `poetry publish --build`.

            
