snakemake-staging


- Name: snakemake-staging
- Version: 0.0.2
- Upload time: 2023-06-29 11:54:21
- Author: Dan Foreman-Mackey
- Requires Python: >=3.9

# "Staging" for Snakemake

This package provides a mechanism for
[Snakemake](https://snakemake.readthedocs.io) workflows to explicitly "stage
out" the output files from certain rules to a public repository like
[Zenodo](https://zenodo.org) to allow faster re-execution of the workflow, using
these previously generated artifacts. This can be especially useful for
workflows with computationally expensive rules that don't need to be frequently
re-run.

`snakemake-staging` is a spin-off of the
[`showyourwork`](https://github.com/showyourwork/showyourwork) project, which
provides a "caching" framework for Snakemake workflows, to transparently avoid
re-execution of rules that have been cached to [Zenodo](https://zenodo.org). The
implementation of this logic in `showyourwork` is, however, somewhat fragile and
unpredictable. In `snakemake-staging`, we take a more explicit approach, where
"staged" rules are always either explicitly executed or restored.

## Installation

To use `snakemake-staging` in your workflow, you can install it using `pip`
(it's probably best to set up your Snakemake installation [following the
Snakemake
docs](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html)
first):

```bash
python -m pip install snakemake-staging
```
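
To confirm that the package landed in the same environment as Snakemake, a quick check along these lines may help. This is only a sketch: `installed_version` is a hypothetical helper, not part of `snakemake-staging`, and it relies solely on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    version = installed_version("snakemake-staging")
    print(version or "snakemake-staging is not installed in this environment")
```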

## Quickstart

### The `Snakefile`

While testing, it's probably best to use the [Zenodo
Sandbox](https://sandbox.zenodo.org), rather than the main site, since any
archive published to Zenodo is permanent. To use the sandbox, you'll need a
personal access token stored in the `SANDBOX_TOKEN` environment variable. You
can generate a new token
[here](https://sandbox.zenodo.org/account/settings/applications/).
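
If you would rather fail early with a clear message when the token is missing, a small guard like the following could be placed near the top of the Snakefile. It is a sketch under assumptions: `require_token` is a hypothetical helper, not something `snakemake-staging` provides, and it only inspects the environment without contacting Zenodo.

```python
import os


def require_token(name="SANDBOX_TOKEN", environ=None):
    """Return the token stored in the environment, or raise if it is missing or blank."""
    # Allow a custom mapping to be passed in so the check is easy to test.
    environ = os.environ if environ is None else environ
    token = environ.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; export it before running the workflow"
        )
    return token
```

Calling `require_token()` before any staging rules run turns a missing token into an immediate, readable error rather than a failure partway through an upload.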

Once you've added this token to your environment, you can edit the Snakefile for
your workflow to use `snakemake-staging` as follows. First, towards the top of
your Snakefile, add:

```python
import snakemake_staging as staging

stage = staging.ZenodoStage(
    "zenodo-stage",
    config.get("restore", False)
)
```

to create a new stage called `zenodo-stage`. Note that here we're extracting a
`restore` flag from the Snakemake config, which will be used to determine
whether to restore files for the stage. This means that you can control the
behavior of this stage from the command line. By passing `--config restore=True`
to the `snakemake` command line interface, all files staged out by the
`zenodo-stage` stage will be restored from the archive rather than generated.
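
The `restore` value may not always arrive as a real boolean, for example if it is written as a quoted string in a config file, and in Python any non-empty string (including `"False"`) is truthy. A defensive coercion step could look like the sketch below. Here `as_bool` is a hypothetical helper, not part of `snakemake-staging`, and whether you need it depends on how your Snakemake version parses config values.

```python
def as_bool(value):
    """Coerce a config value such as True, "true", or "0" to a real bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"1", "true", "yes", "on"}:
            return True
        if text in {"0", "false", "no", "off", ""}:
            return False
        raise ValueError(f"cannot interpret {value!r} as a boolean")
    # Fall back to Python truthiness for numbers, None, and so on.
    return bool(value)


# Inside a Snakefile this might be used as (assumption, not from the package docs):
#     restore = as_bool(config.get("restore", False))
```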

Then, to stage out a rule, you can apply the stage as follows:

```python
rule expensive:
    input:
        ...
    output:
        stage(
            "path/to/output1.txt",
            "path/to/output2.txt",
        )
    shell:
        ...
```

Finally, _after defining all the rules that you want to stage out_, you must
add the following `include`, which defines all the staging rules:

```python
include: staging.snakefile()
```

At this point, here's the full `Snakefile`:

<details>
<summary>Full Snakefile</summary>

```python
import snakemake_staging as staging

stage = staging.ZenodoStage(
    "zenodo-stage",
    config.get("restore", False)
)

rule expensive:
    input:
        ...
    output:
        stage(
            "path/to/output1.txt",
            "path/to/output2.txt",
        )
    shell:
        ...

include: staging.snakefile()
```

</details>

### Usage

With the `Snakefile` defined in the previous section, you can now run your
workflow in three ways:

1. **Normal execution**: Running something like
   `snakemake path/to/output1.txt` (omitting the usual `--cores` and
   `--use-conda` arguments) will execute the workflow as normal, without
   staging out any files.

2. **Stage upload**: If you instead have Snakemake target the `staging__upload`
   rule, the `expensive` rule will be executed, and the outputs will be uploaded
   to Zenodo, saving the record information to `zenodo-stage.zenodo.json` (this
   filename can be changed by passing the `info_file` argument to the
   `ZenodoStage` constructor).

3. **Stage restore**: Finally, after these outputs have been uploaded to Zenodo,
   you can call Snakemake with `--config restore=True` to disable the
   `expensive` rule and force the outputs to be restored from Zenodo.
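
The three modes above can be summarized as command lines. The sketch below builds each one as an argument list. `snakemake_command` is a hypothetical helper, and `--cores 1` is an assumed default that you would adjust for your machine. The `--config` option is placed last so that the target is not read as another config entry.

```python
import shlex


def snakemake_command(mode, target="path/to/output1.txt", cores=1):
    """Build the argument list for one of the three modes described above."""
    base = ["snakemake", "--cores", str(cores)]
    if mode == "normal":
        # Run the workflow as usual, without staging anything out.
        return base + [target]
    if mode == "upload":
        # Target the upload rule so the staged outputs are generated and uploaded.
        return base + ["staging__upload"]
    if mode == "restore":
        # Restore staged outputs from the archive instead of regenerating them.
        return base + [target, "--config", "restore=True"]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("normal", "upload", "restore"):
        print(shlex.join(snakemake_command(mode)))
```

Each list can be passed directly to `subprocess.run` if you want to drive the workflow from a script.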


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "snakemake-staging",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "",
    "author": "Dan Foreman-Mackey",
    "author_email": "foreman.mackey@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/61/ee/aacf9faab85333a05dd29bac95a1c8d7c1003e15faba8fbb1139f6f39d0b/snakemake_staging-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "",
    "version": "0.0.2",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "85951868b907ee8db840697d3a0b2bd1d618ed938e4b1e220c2e8d627a41ec9c",
                "md5": "fb4aac86a5573b57f3a47b9103ebf410",
                "sha256": "8d96323d390c349a75639318a93c137578a4ddfea4627f31d76582071fe53751"
            },
            "downloads": -1,
            "filename": "snakemake_staging-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fb4aac86a5573b57f3a47b9103ebf410",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 17934,
            "upload_time": "2023-06-29T11:54:20",
            "upload_time_iso_8601": "2023-06-29T11:54:20.306108Z",
            "url": "https://files.pythonhosted.org/packages/85/95/1868b907ee8db840697d3a0b2bd1d618ed938e4b1e220c2e8d627a41ec9c/snakemake_staging-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "61eeaacf9faab85333a05dd29bac95a1c8d7c1003e15faba8fbb1139f6f39d0b",
                "md5": "e168d57398b78fa1ce1780cba2549e0d",
                "sha256": "e3e783db2998662e25a6c9009a9ed57abef606c27bee8050d724eb15da164ed9"
            },
            "downloads": -1,
            "filename": "snakemake_staging-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "e168d57398b78fa1ce1780cba2549e0d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 14890,
            "upload_time": "2023-06-29T11:54:21",
            "upload_time_iso_8601": "2023-06-29T11:54:21.783191Z",
            "url": "https://files.pythonhosted.org/packages/61/ee/aacf9faab85333a05dd29bac95a1c8d7c1003e15faba8fbb1139f6f39d0b/snakemake_staging-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-29 11:54:21",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "snakemake-staging"
}
        