firecloud-dalmatian


 * Name: firecloud-dalmatian
 * Version: 0.0.20
 * Summary: Utilities for interacting with Terra via Pandas dataframes
 * Upload time: 2025-01-26 18:34:25
 * Keywords: workflow manager, terra
 * Requirements: No requirements were recorded.
            
## dalmatian

[![Build Status](https://travis-ci.com/broadinstitute/dalmatian.svg?branch=master)](https://travis-ci.com/broadinstitute/dalmatian)

[FISS](https://github.com/broadinstitute/fiss)' faithful companion.

dalmatian is a collection of high-level functions for interacting with FireCloud (now Terra) via pandas DataFrames.

### Install

`pip install firecloud-dalmatian`

### Requirements

FireCloud uses the Google Cloud SDK (https://cloud.google.com/sdk/) to manage authorization. To use dalmatian, you must install the SDK and log in locally with:

```
gcloud auth application-default login
```
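
Before creating a `WorkspaceManager`, it can help to confirm that the login step above left credentials on disk. The following is a minimal sketch, not part of dalmatian: `adc_path` and `has_adc` are hypothetical helper names, and the sketch assumes the default credential locations used by the Google Cloud SDK (the `GOOGLE_APPLICATION_CREDENTIALS` variable if set, otherwise the per-user `gcloud` configuration directory).

```python
import os
from pathlib import Path


def adc_path() -> Path:
    """Return the file that is expected to hold application-default credentials."""
    # An explicitly configured credentials file takes precedence when set.
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    # Otherwise fall back to the per-user gcloud configuration directory.
    if os.name == "nt":
        base = Path(os.environ.get("APPDATA", "")) / "gcloud"
    else:
        base = Path.home() / ".config" / "gcloud"
    return base / "application_default_credentials.json"


def has_adc() -> bool:
    """True if a credentials file exists at the expected location."""
    return adc_path().is_file()


if __name__ == "__main__":
    if not has_adc():
        print("No credentials found; run: gcloud auth application-default login")
```

The check only tests that the file exists; it does not verify that the credentials are still valid.
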
### Examples
Dalmatian provides the WorkspaceManager class for interacting with FireCloud workspaces.
```
import dalmatian
wm = dalmatian.WorkspaceManager("namespace/workspace")
```

#### Creating and managing workspaces
Create the workspace:
```
wm.create_workspace()
```

Upload samples and sample attributes (e.g., BAM paths). The attributes must be provided as a pandas DataFrame, in the following form:
 * the index must be named 'sample_id', and contain the sample IDs
 * the dataframe must contain the column 'participant_id'
 * if a 'sample_set_id' column is provided, the corresponding sample sets will be generated
```
wm.upload_samples(attributes_df, add_participant_samples=True)
```
If `add_participant_samples=True`, all samples of a participant are stored in `participant.samples_`.
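
As an illustration of the required form, the sketch below builds a small `attributes_df` and checks it against the rules listed above before upload. The sample IDs, participant IDs, bucket paths, and the `bam_file` column are placeholders, and `check_sample_attributes` is a hypothetical helper, not a dalmatian function.

```python
import pandas as pd

# Placeholder metadata: three samples from two participants.
attributes_df = pd.DataFrame(
    {
        "participant_id": ["P1", "P1", "P2"],
        "bam_file": [
            "gs://my-bucket/S1.bam",
            "gs://my-bucket/S2.bam",
            "gs://my-bucket/S3.bam",
        ],
        # Optional column: one sample set is generated per distinct value.
        "sample_set_id": ["batch_1", "batch_1", "batch_2"],
    },
    index=pd.Index(["S1", "S2", "S3"], name="sample_id"),
)


def check_sample_attributes(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not match the form described above."""
    if df.index.name != "sample_id":
        raise ValueError("index must be named 'sample_id'")
    if "participant_id" not in df.columns:
        raise ValueError("missing required column 'participant_id'")


check_sample_attributes(attributes_df)
# With a WorkspaceManager `wm` in hand, the upload call from above would follow:
# wm.upload_samples(attributes_df, add_participant_samples=True)
```
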

Add or update workspace attributes:
```
attr = {
    'attribute_name':'gs://attribute_path',
}
wm.update_attributes(attr)
```

Get attributes on samples, sample sets, participants:
```
samples_df = wm.get_samples()
sets_df = wm.get_sample_sets()
participants_df = wm.get_participants()
```

Create or update sets:
```
wm.update_sample_set('all_samples', samples_df.index)
wm.update_participant_set('all_participants', participants_df.index)
```
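
Sets can also be derived from a column of the samples dataframe. The sketch below creates one sample set per distinct value of a column; it relies only on `wm.update_sample_set` as used above. The helper name `update_sets_by_column` and the `tissue` column in the usage comment are assumptions for illustration.

```python
import pandas as pd


def update_sets_by_column(wm, samples_df: pd.DataFrame, column: str) -> list:
    """Create or update one sample set per distinct value in `column`.

    Returns the IDs of the sets that were written, in sorted key order.
    Rows where `column` is missing are skipped (pandas groupby default).
    """
    updated = []
    for value, group in samples_df.groupby(column):
        set_id = f"{column}_{value}"
        # group.index holds the sample IDs that share this value.
        wm.update_sample_set(set_id, group.index)
        updated.append(set_id)
    return updated


# Usage, assuming the workspace has a hypothetical 'tissue' sample attribute:
# samples_df = wm.get_samples()
# update_sets_by_column(wm, samples_df, 'tissue')
```
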

Copy/move data from workspace:
```
samples_df = wm.get_samples()
dalmatian.gs_copy(samples_df[attribute_name], dest_path)
dalmatian.gs_move(samples_df[attribute_name], dest_path)
```

Clone a workspace:
```
wm2 = dalmatian.WorkspaceManager(namespace2, workspace2)
wm2.create_workspace(wm)
```

#### Running jobs
Submit jobs:
```
wm.create_submission("config_namespace/config_name", sample_id, 'sample', use_callcache=True)
wm.create_submission("config_namespace/config_name", sample_set_id, 'sample_set', expression='this.samples', use_callcache=True)
wm.create_submission("config_namespace/config_name", participant_id, 'participant', expression='this.samples_', use_callcache=True)
```

Monitor jobs:
```
wm.get_submission_status()
```

Get runtime statistics (including cost estimates):
```
status_df = wm.get_sample_status(config_name)
workflow_status_df, task_dfs = wm.get_stats(status_df)
```

Re-run failed jobs (for a sample set):
```
status_df = wm.get_sample_set_status(config_name)
print(status_df['status'].value_counts())  # list sample statuses
wm.update_sample_set('reruns', status_df[status_df['status']=='Failed'].index)
wm.create_submission("config_namespace/config_name", 'reruns', 'sample_set', expression='this.samples', use_callcache=True)
```
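
The rerun steps can be wrapped in a small helper that skips submission when nothing failed. This is a sketch under assumptions: `resubmit_failed` is a hypothetical name, the status dataframe is assumed to be indexed by sample ID with a `status` column, and the `create_submission` call follows the argument order of the sample-set example under "Submit jobs".

```python
import pandas as pd


def resubmit_failed(wm, status_df: pd.DataFrame, config: str, set_id: str = "reruns") -> list:
    """Collect failed samples into a sample set and resubmit it.

    Returns the list of failed sample IDs (empty if nothing was resubmitted).
    """
    failed_ids = status_df.index[status_df["status"] == "Failed"]
    if len(failed_ids) == 0:
        return []  # nothing to rerun, so no set is written and no job submitted
    wm.update_sample_set(set_id, failed_ids)
    wm.create_submission(config, set_id, "sample_set",
                         expression="this.samples", use_callcache=True)
    return list(failed_ids)


# Usage with a WorkspaceManager `wm`:
# status_df = wm.get_sample_status(config_name)
# resubmit_failed(wm, status_df, "config_namespace/config_name")
```
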

### Contents

dalmatian also includes the following FireCloud tools:

```
workflow_time
create_workspace
delete_workspace
upload_samples
upload_participants
update_participant_samples
update_attributes
get_submission_status
get_storage
get_stats
publish_config
get_samples
get_sample_sets
update_sample_set
delete_sample_set
update_configuration
check_configuration
get_google_metadata
parse_google_stats
calculate_google_cost
list_methods
get_method
get_method_version
list_configs
get_config
get_config_version
print_methods
print_configs
get_wdl
compare_wdls
compare_wdl
redact_outdated_method_versions
update_method
get_vm_cost
```


### Usage

Some functionality depends on a locally installed `gsutil`.

With Python 3, this can require more than one accessible Python installation, because `gsutil` may need its own interpreter.

To work around this, set the `CLOUDSDK_PYTHON` environment variable to the Python interpreter that `gsutil` should use:

```
# Replace this path with the path to your local Python 2.7 interpreter.
# If you use pyenv, the following should work
# (assuming that 2.7.12 is installed):
export CLOUDSDK_PYTHON=/usr/local/var/pyenv/versions/2.7.12/bin/python
```
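
If changing the shell environment is not convenient, the same variable can be set for a single subprocess from Python. The sketch below is an illustration, not how dalmatian invokes `gsutil` internally; `gsutil_env` and `gsutil_version` are hypothetical helpers, and the interpreter path is the placeholder from the snippet above.

```python
import os
import shutil
import subprocess


def gsutil_env(python_path: str) -> dict:
    """Return a copy of the current environment with CLOUDSDK_PYTHON set."""
    env = dict(os.environ)  # copy, so the parent process is left unchanged
    env["CLOUDSDK_PYTHON"] = python_path
    return env


def gsutil_version(python_path: str) -> str:
    """Run `gsutil version` with the overridden interpreter (requires gsutil)."""
    gsutil = shutil.which("gsutil")
    if gsutil is None:
        raise FileNotFoundError("gsutil not found on PATH")
    result = subprocess.run(
        [gsutil, "version"],
        env=gsutil_env(python_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (placeholder path):
# print(gsutil_version("/usr/local/var/pyenv/versions/2.7.12/bin/python"))
```
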

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "firecloud-dalmatian",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": "Francois Aguet <francois@broadinstitute.org>",
    "keywords": "Workflow manager, Terra",
    "author": null,
    "author_email": "Francois Aguet <francois@broadinstitute.org>",
    "download_url": "https://files.pythonhosted.org/packages/c7/89/d1df55ec6704ac5962431fe92b39ef685e53d6fef28f8bf3d4aeccc2b7de/firecloud_dalmatian-0.0.20.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Utilities for interacting with Terra via Pandas dataframes",
    "version": "0.0.20",
    "project_urls": {
        "Repository": "https://github.com/getzlab/dalmatian.git"
    },
    "split_keywords": [
        "workflow manager",
        " terra"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6f9d929219db33a7fc323eeaea98e4f8deb516bf495b5f8b94b66792a05197a0",
                "md5": "4af8328c2d042a9f683bb08de6aaece7",
                "sha256": "a6b137aaf877627d37e53fe7f06029b8a8fe9868e136ae44f4ef3754f78547c8"
            },
            "downloads": -1,
            "filename": "firecloud_dalmatian-0.0.20-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4af8328c2d042a9f683bb08de6aaece7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 35933,
            "upload_time": "2025-01-26T18:34:24",
            "upload_time_iso_8601": "2025-01-26T18:34:24.908995Z",
            "url": "https://files.pythonhosted.org/packages/6f/9d/929219db33a7fc323eeaea98e4f8deb516bf495b5f8b94b66792a05197a0/firecloud_dalmatian-0.0.20-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c789d1df55ec6704ac5962431fe92b39ef685e53d6fef28f8bf3d4aeccc2b7de",
                "md5": "29b5b1399abaecb74d3b4ee7d04bd3b1",
                "sha256": "da6d0ee70c86185ea3c57bba1f43ff6633a157db9a13df01ba52d40deb879cb0"
            },
            "downloads": -1,
            "filename": "firecloud_dalmatian-0.0.20.tar.gz",
            "has_sig": false,
            "md5_digest": "29b5b1399abaecb74d3b4ee7d04bd3b1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 149067,
            "upload_time": "2025-01-26T18:34:25",
            "upload_time_iso_8601": "2025-01-26T18:34:25.976662Z",
            "url": "https://files.pythonhosted.org/packages/c7/89/d1df55ec6704ac5962431fe92b39ef685e53d6fef28f8bf3d4aeccc2b7de/firecloud_dalmatian-0.0.20.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-26 18:34:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "getzlab",
    "github_project": "dalmatian",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "lcname": "firecloud-dalmatian"
}
        