gisaid-download


Namegisaid-download JSON
Version 0.3.0 PyPI version JSON
download
home_pagehttps://github.com/enviro-lab/gisaid-download
SummaryAssisted download of selected gisaid metadata or EPI_SET creation
upload_time2023-03-17 21:53:31
maintainerSam Kunkleman
docs_urlNone
authorSam Kunkleman
requires_python>=3.8,<4.0
licenseMIT
keywords gisaid download sequences select epi_set epicov
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # gisaid-download
Purpose: Assisted download of selected samples from GISAID or EPI_SET creation.

`gisaid-download` is a tool for acuiring metadata for selected samples from gisaid. It was produced primarily for use with the EpiCoV database but should work just as well for EpiFlu or EpiPox. Fully-automated download from gisaid's website is tricky, but manual download is a slow, error-prone process requiring renaming and moving around files as you go. With `gisaid-download`, you are guided through the process of downloading desired samples.

## Features
Downloading Samples:
* keeps track of which samples you (or your team-mates) have already downloaded so you only get the new ones
* can download batches of samples from multiple locations
* moves/renames files to a consolidated directory as they're downloaded
* can limit which metadata files you download to any combination of the following:
  * fasta: [Nucleotide sequences]
  * meta: [Dates and Location, Patient status metadata, Sequencing technology metadata]
  * ackno: [Acknowledgement pdfs] (can only do this for 500 samples at a time, so EPI_SETs are better)

Uploading samples for further analysis:
* can upload all that data to an hpc via sftp
* can add a command to run after upload to start your analysis pipeline
    (We might make ours available someday!)

Getting an EPI_SET:
* can walk you through getting an EPI_SET identifier for all of your samples
* this will be emailed to you by GISAID

Some of the above features use sftp or ssh via scripted commands with the help of the package [hpc-interact](https://github.com/enviro-lab/hpc-interact). It has its own config for storing login credentials. If needed and not yet made, your credentials will be gathered over the command line. That file can be specified in [gisaid_config.ini](example/gisaid_config.ini) and only requires two lines:
```
username=myuser
password=mypass
```

## Installation
```console
pip install gisaid-download
```

## Usage
The first time you use `gisaid-download`, you'll need to set up a config file: [gisaid_config.ini](example/gisaid_config.ini) . Download it to your pwd or chosen outdir via:
```console
gisaid_download --example -o gisaid/directory
```

Edit the config file to serve your needs. There are details in there that explain most everything.

Now you're ready to do the download and/or get your EPI_SET. Let's say we only want samples that were collected before 2023. We'll set a date (required for creating unique filenames). To run, do this:
```console
sample_date='2023-01-01'
gisaid_download ${sample_date}
```

## Behavior
The above command triggers up to four steps. Steps 1, 3, and 4 only happen if you're interacting with the hpc cluster. If using the `--no_cluster` (or `-n`) flag, they will be skipped.

### Step 1: Update local list of downloaded sequences
If you're planning on transferring downloaded data to an hpc, the above command will first look for samples that already exist on the hpc at your `cluster_epicov_dir` (from config). This uses sftp via (via [hpc-interact](https://github.com/enviro-lab/hpc-interact)). If no data yet exists on the cluster or you're the only one downloading samples and transferring them to the hpc, you can skip this step by adding the `--skip_local_update` (or `-s`) flag like this:
```console
gisaid_download ${sample_date} --skip_local_update
```
The flag `-n` can also be used to skip this step along with step 3 and 4.

### Step 2: Download sequences
This is an interactive download that by default requires pressing enter after each step. If you don't want to press enter as much (like if you get in the rhythm and can't be bothered to stop...), you can add the `--quick` (or `-q`) flag like this
```console
gisaid_download ${sample_date} --quick
```

### Step 3: Upload sequences to hpc
Using sftp (via [hpc-interact](https://github.com/enviro-lab/hpc-interact)), all the data downloaded in Step 2 will be uploaded to the hpc at your `cluster_epicov_dir`.

The flag `-n` can also be used to skip this step along with step 1 and 4.

### Step 4: Run a followup command on the hpc
If specified in your [gisaid_config.ini](example/gisaid_config.ini), `followup_command` will be run by ssh through [hpc-interact](https://github.com/enviro-lab/hpc-interact). This could be any string, but we recommend setting it to run a script that will begin analyzing the data you just uploaded.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/enviro-lab/gisaid-download",
    "name": "gisaid-download",
    "maintainer": "Sam Kunkleman",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "skunklem@uncc.edu",
    "keywords": "gisaid,download,sequences,select,EPI_SET,epicov",
    "author": "Sam Kunkleman",
    "author_email": "skunklem@uncc.edu",
    "download_url": "https://files.pythonhosted.org/packages/db/6c/0f4108524873f64b28f5763bdc3afba9ac11114a8fb20426cd974e88db28/gisaid_download-0.3.0.tar.gz",
    "platform": null,
    "description": "# gisaid-download\nPurpose: Assisted download of selected samples from GISAID or EPI_SET creation.\n\n`gisaid-download` is a tool for acuiring metadata for selected samples from gisaid. It was produced primarily for use with the EpiCoV database but should work just as well for EpiFlu or EpiPox. Fully-automated download from gisaid's website is tricky, but manual download is a slow, error-prone process requiring renaming and moving around files as you go. With `gisaid-download`, you are guided through the process of downloading desired samples.\n\n## Features\nDownloading Samples:\n* keeps track of which samples you (or your team-mates) have already downloaded so you only get the new ones\n* can download batches of samples from multiple locations\n* moves/renames files to a consolidated directory as they're downloaded\n* can limit which metadata files you download to any combination of the following:\n  * fasta: [Nucleotide sequences]\n  * meta: [Dates and Location, Patient status metadata, Sequencing technology metadata]\n  * ackno: [Acknowledgement pdfs] (can only do this for 500 samples at a time, so EPI_SETs are better)\n\nUploading samples for further analysis:\n* can upload all that data to an hpc via sftp\n* can add a command to run after upload to start your analysis pipeline\n    (We might make ours available someday!)\n\nGetting an EPI_SET:\n* can walk you through getting an EPI_SET identifier for all of your samples\n* this will be emailed to you by GISAID\n\nSome of the above features use sftp or ssh via scripted commands with the help of the package [hpc-interact](https://github.com/enviro-lab/hpc-interact). It has its own config for storing login credentials. If needed and not yet made, your credentials will be gathered over the command line. That file can be specified in [gisaid_config.ini](example/gisaid_config.ini) and only requires two lines:\n```\nusername=myuser\npassword=mypass\n```\n\n## Installation\n```console\npip install gisaid-download\n```\n\n## Usage\nThe first time you use `gisaid-download`, you'll need to set up a config file: [gisaid_config.ini](example/gisaid_config.ini) . Download it to your pwd or chosen outdir via:\n```console\ngisaid_download --example -o gisaid/directory\n```\n\nEdit the config file to serve your needs. There are details in there that explain most everything.\n\nNow you're ready to do the download and/or get your EPI_SET. Let's say we only want samples that were collected before 2023. We'll set a date (required for creating unique filenames). To run, do this:\n```console\nsample_date='2023-01-01'\ngisaid_download ${sample_date}\n```\n\n## Behavior\nThe above command triggers up to four steps. Steps 1, 3, and 4 only happen if you're interacting with the hpc cluster. If using the `--no_cluster` (or `-n`) flag, they will be skipped.\n\n### Step 1: Update local list of downloaded sequences\nIf you're planning on transferring downloaded data to an hpc, the above command will first look for samples that already exist on the hpc at your `cluster_epicov_dir` (from config). This uses sftp via (via [hpc-interact](https://github.com/enviro-lab/hpc-interact)). If no data yet exists on the cluster or you're the only one downloading samples and transferring them to the hpc, you can skip this step by adding the `--skip_local_update` (or `-s`) flag like this:\n```console\ngisaid_download ${sample_date} --skip_local_update\n```\nThe flag `-n` can also be used to skip this step along with step 3 and 4.\n\n### Step 2: Download sequences\nThis is an interactive download that by default requires pressing enter after each step. If you don't want to press enter as much (like if you get in the rhythm and can't be bothered to stop...), you can add the `--quick` (or `-q`) flag like this\n```console\ngisaid_download ${sample_date} --quick\n```\n\n### Step 3: Upload sequences to hpc\nUsing sftp (via [hpc-interact](https://github.com/enviro-lab/hpc-interact)), all the data downloaded in Step 2 will be uploaded to the hpc at your `cluster_epicov_dir`.\n\nThe flag `-n` can also be used to skip this step along with step 1 and 4.\n\n### Step 4: Run a followup command on the hpc\nIf specified in your [gisaid_config.ini](example/gisaid_config.ini), `followup_command` will be run by ssh through [hpc-interact](https://github.com/enviro-lab/hpc-interact). This could be any string, but we recommend setting it to run a script that will begin analyzing the data you just uploaded.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Assisted download of selected gisaid metadata or EPI_SET creation",
    "version": "0.3.0",
    "split_keywords": [
        "gisaid",
        "download",
        "sequences",
        "select",
        "epi_set",
        "epicov"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "45c2cf7e1c338361d1b602426021152a87caf5ae8955aaf5d1fc1fca4d46492f",
                "md5": "1ff8d3557d52c117f494a73971bcdf55",
                "sha256": "912cc02e8d5e3e7b5478d32c12136d1947dd259df917f5199e33cfb203bd0315"
            },
            "downloads": -1,
            "filename": "gisaid_download-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1ff8d3557d52c117f494a73971bcdf55",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 17145,
            "upload_time": "2023-03-17T21:53:29",
            "upload_time_iso_8601": "2023-03-17T21:53:29.231574Z",
            "url": "https://files.pythonhosted.org/packages/45/c2/cf7e1c338361d1b602426021152a87caf5ae8955aaf5d1fc1fca4d46492f/gisaid_download-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "db6c0f4108524873f64b28f5763bdc3afba9ac11114a8fb20426cd974e88db28",
                "md5": "a2c077f2859fef459c2390b2cb6ae31c",
                "sha256": "5ccf9e3a8627b1af161947851cb7a622e1d325f448b7aae3dfd1c9e6295da6f5"
            },
            "downloads": -1,
            "filename": "gisaid_download-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a2c077f2859fef459c2390b2cb6ae31c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 16710,
            "upload_time": "2023-03-17T21:53:31",
            "upload_time_iso_8601": "2023-03-17T21:53:31.536040Z",
            "url": "https://files.pythonhosted.org/packages/db/6c/0f4108524873f64b28f5763bdc3afba9ac11114a8fb20426cd974e88db28/gisaid_download-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-17 21:53:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "enviro-lab",
    "github_project": "gisaid-download",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "gisaid-download"
}
        
Elapsed time: 0.06106s