fetch-tool


- Name: fetch-tool
- Version: 0.9.0
- Summary: Utility to fetch public and private RAW read and assembly files from the ENA
- Upload time: 2023-09-01 12:34:05
- Requires Python: >=3.8
- License: Apache Software License 2.0
- Keywords: bioinformatics, tool, metagenomics

[![Testing](https://github.com/EBI-Metagenomics/fetch_tool/actions/workflows/test.yml/badge.svg)](https://github.com/EBI-Metagenomics/fetch_tool/actions/workflows/test.yml)
[![PyPI version](https://badge.fury.io/py/fetch-tool.svg)](https://badge.fury.io/py/fetch-tool)
[![Docker Repository on Quay](https://quay.io/repository/microbiome-informatics/fetch-tool/status "Docker Repository on Quay")](https://quay.io/repository/microbiome-informatics/fetch-tool)

# Microbiome Informatics ENA fetch tool

A set of tools for fetching raw read and assembly files from the European Nucleotide Archive (ENA).

## How to set up your development environment

We recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) (conda) to manage the environment.

Clone the repo and install the requirements.

```bash
$ git clone git@github.com:EBI-Metagenomics/fetch_tool.git
$ cd fetch_tool
$ # activate your env (conda activate xxx)
$ pip install -r requirements-dev.txt
```

### Pre-commit hooks

Set up the git [pre-commit hook](https://pre-commit.com/):

```bash
pre-commit install
```

*Why?*

pre-commit runs a set of pre-configured tools before allowing you to commit files. You can find the currently configured hooks and their configuration in [.pre-commit-config.yaml](./.pre-commit-config.yaml).

### Tests

This repo uses [pytest](https://docs.pytest.org).

The tests require the Aspera CLI to be installed in the default location (run `install-aspera.sh` with no parameters).

To run the test suite:
```bash
pytest
```

## Install fetch tool

### Using Conda

```bash
$ conda create -q -n fetch_tool python=3.8
$ conda activate fetch_tool
```

Install from PyPI:

```bash
$ pip install fetch-tool
```

Install from the git repo:

```bash
$ pip install git+ssh://git@github.com/EBI-Metagenomics/fetch_tool.git
```

#### Configuration file

Set up the configuration file, using [fetchdata-config-template.json](config/fetchdata-config-template.json) as the template.

The required fields are:
- For Aspera
  - aspera_bin (the path to `ascp`, usually under `/cli/bin` in the Aspera installation)
  - aspera_cert (the path to the certificate provided with `ascp`, usually `/cli/etc/asperaweb_id_dsa.openssh` in the Aspera installation)
- To pull private ENA data
  - ena_api_user
  - ena_api_password
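
As an illustration of the fields listed above, the following is a minimal Python sketch that assembles them into a JSON document. It assumes the configuration file is a flat JSON object with these four top-level keys; the real template may contain additional fields, so treat the template in the repository as authoritative. The helper name `build_config` and the example paths are hypothetical.

```python
import json


def build_config(aspera_bin, aspera_cert, ena_api_user=None, ena_api_password=None):
    """Return a dict holding only the fields this README lists as required.

    Assumption: the config is a flat JSON object; the real template may
    define more keys than the four handled here.
    """
    config = {"aspera_bin": str(aspera_bin), "aspera_cert": str(aspera_cert)}
    # The ENA credentials are only needed for private data, and only make
    # sense as a pair.
    if (ena_api_user is None) != (ena_api_password is None):
        raise ValueError("provide both ena_api_user and ena_api_password, or neither")
    if ena_api_user is not None:
        config["ena_api_user"] = ena_api_user
        config["ena_api_password"] = ena_api_password
    return config


# Hypothetical installation paths, for illustration only.
config = build_config(
    "/opt/aspera-cli/cli/bin/ascp",
    "/opt/aspera-cli/cli/etc/asperaweb_id_dsa.openssh",
)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and passed to the tools with `-c`.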

### Install Aspera

Run the `install-aspera.sh` script from this repository. It takes one optional parameter, the installation folder.

```bash
./install-aspera.sh path/to/installation-i-want
```

If no folder is given, Aspera is installed in `$PWD/aspera-cli`.
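
To connect the installation folder with the `aspera_bin` and `aspera_cert` configuration fields, here is a small Python sketch that derives both paths from an installation folder. It assumes the layout described in the configuration section (`cli/bin` and `cli/etc/asperaweb_id_dsa.openssh` under the installation folder), which the README only describes as the usual case, and that the binary file is named `ascp`. The function names are hypothetical.

```python
from pathlib import Path


def aspera_paths(install_dir=None):
    """Return the expected ascp binary and certificate paths.

    With no argument, the default location from the README is used:
    an "aspera-cli" folder in the current working directory.
    Assumption: the installation follows the usual cli/bin and cli/etc layout.
    """
    base = Path(install_dir) if install_dir is not None else Path.cwd() / "aspera-cli"
    return {
        "aspera_bin": base / "cli" / "bin" / "ascp",
        "aspera_cert": base / "cli" / "etc" / "asperaweb_id_dsa.openssh",
    }


def missing_files(paths):
    """Return the names of the entries whose file does not exist on disk."""
    return [name for name, path in paths.items() if not path.is_file()]


paths = aspera_paths()
for name in missing_files(paths):
    print(f"{name} not found at {paths[name]}")
```

Running this before the test suite or a download gives a quick hint when the installation is not where the configuration expects it.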

## Fetch read files (amplicon and WGS data)

### Usage

```bash
$ fetch-read-tool -h
usage: fetch-read-tool [-h] [-p PROJECTS [PROJECTS ...] | -l PROJECT_LIST] [-d DIR] [-v] [--version] [-f] [--ignore-errors] [--private] [-i] [-c CONFIG_FILE] [--fix-desc-file] [-ru RUNS [RUNS ...]
                       | --run-list RUN_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        Whitespace separated list of project accession(s)
  -l PROJECT_LIST, --project-list PROJECT_LIST
                        File containing line-separated project list
  -d DIR, --dir DIR     Base directory for downloads
  -v, --verbose         Verbose
  --version             Version
  -f, --force           Ignore download errors and force re-download all files
  --ignore-errors       Ignore download errors and continue
  --private             Use when fetching private data
  -i, --interactive     interactive mode - allows you to skip failed downloads.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        Alternative config file
  --fix-desc-file       Fixed runs in project description file
  -ru RUNS [RUNS ...], --runs RUNS [RUNS ...]
                        Run accession(s), whitespace separated. Use to download only certain project runs
  --run-list RUN_LIST   File containing line-separated run accessions
```
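
The `-l` and `--run-list` options take a file with one accession per line. The following Python sketch shows one way to produce and read such a file's contents. How the tool itself treats blank lines, surrounding whitespace, or duplicates is not documented here, so the helpers (hypothetical names) simply avoid emitting them. The accessions are only examples.

```python
def format_accession_list(accessions):
    """Return line-separated text, one accession per line.

    Blank entries and duplicates are dropped and surrounding whitespace
    is stripped, so the output does not depend on how the tool handles them.
    """
    seen = []
    for accession in accessions:
        accession = accession.strip()
        if accession and accession not in seen:
            seen.append(accession)
    return "".join(accession + "\n" for accession in seen)


def parse_accession_list(text):
    """Return the non-empty lines of a line-separated accession list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


text = format_accession_list(["SRP062869", " ERP111288 ", "SRP062869", ""])
print(text, end="")
```

The resulting text can be written to a file and passed with `-l` in place of `-p`.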

### Example

Download an amplicon study:

```bash
$ fetch-read-tool -p SRP062869 -c fetchdata-config-local.json -v -d /home/<user>/temp/
```
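
When the tool is driven from a script, the same command can be assembled as an argument list. This is a minimal Python sketch, not part of fetch-tool: the function name `read_tool_command` is hypothetical, and it only uses flags shown in the help output above. It builds the command without running it.

```python
import shlex


def read_tool_command(projects=None, project_list=None, runs=None,
                      config_file=None, out_dir=None, verbose=False, private=False):
    """Build a fetch-read-tool argument list from the documented flags."""
    # The usage line shows -p and -l as alternatives, so reject both at once.
    if projects and project_list:
        raise ValueError("-p/--projects and -l/--project-list are mutually exclusive")
    cmd = ["fetch-read-tool"]
    if projects:
        cmd += ["-p", *projects]
    if project_list:
        cmd += ["-l", str(project_list)]
    if runs:
        cmd += ["-ru", *runs]
    if config_file:
        cmd += ["-c", str(config_file)]
    if verbose:
        cmd.append("-v")
    if private:
        cmd.append("--private")
    if out_dir:
        cmd += ["-d", str(out_dir)]
    return cmd


cmd = read_tool_command(
    projects=["SRP062869"],
    config_file="fetchdata-config-local.json",
    verbose=True,
    out_dir="/home/user/temp/",  # placeholder directory
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting issues; `shlex.join` is used here only to display the command.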

## Fetch assembly files

### Usage

```bash
$ fetch-assembly-tool -h
usage: fetch-assembly-tool [-h] [-p PROJECTS [PROJECTS ...] | -l PROJECT_LIST] [-d DIR] [-v] [--version] [-f] [--ignore-errors] [--private] [-i] [-c CONFIG_FILE] [--fix-desc-file]
                           [-as ASSEMBLIES [ASSEMBLIES ...]] [--assembly-type {primary metagenome,binned metagenome,metatranscriptome}] [--assembly-list ASSEMBLY_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        Whitespace separated list of project accession(s)
  -l PROJECT_LIST, --project-list PROJECT_LIST
                        File containing line-separated project list
  -d DIR, --dir DIR     Base directory for downloads
  -v, --verbose         Verbose
  --version             Version
  -f, --force           Ignore download errors and force re-download all files
  --ignore-errors       Ignore download errors and continue
  --private             Use when fetching private data
  -i, --interactive     interactive mode - allows you to skip failed downloads.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        Alternative config file
  --fix-desc-file       Fixed runs in project description file
  -as ASSEMBLIES [ASSEMBLIES ...], --assemblies ASSEMBLIES [ASSEMBLIES ...]
                        Assembly ERZ accession(s), whitespace separated. Use to download only certain project assemblies
  --assembly-type {primary metagenome,binned metagenome,metatranscriptome}
                        Assembly type
  --assembly-list ASSEMBLY_LIST
                        File containing line-separated assembly accessions
```

### Example

Download an assembly study:

```bash
$ fetch-assembly-tool -p ERP111288 -c fetchdata-config-local.json -v -d /home/<user>/temp/
```
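
The `--assembly-type` choices contain spaces, so they need quoting on the command line. The Python sketch below (a hypothetical helper, not part of fetch-tool) validates the type against the choices listed in the help output and shows how the value ends up quoted. It builds the command without running it.

```python
import shlex

# Choices copied from the --assembly-type help text above.
ASSEMBLY_TYPES = ("primary metagenome", "binned metagenome", "metatranscriptome")


def assembly_tool_command(projects, assembly_type=None, assemblies=None,
                          config_file=None, out_dir=None):
    """Build a fetch-assembly-tool argument list from the documented flags."""
    if assembly_type is not None and assembly_type not in ASSEMBLY_TYPES:
        raise ValueError(f"assembly_type must be one of {ASSEMBLY_TYPES}")
    cmd = ["fetch-assembly-tool", "-p", *projects]
    if assemblies:
        cmd += ["-as", *assemblies]
    if assembly_type is not None:
        # Kept as a single list element so the space survives as one argument.
        cmd += ["--assembly-type", assembly_type]
    if config_file:
        cmd += ["-c", str(config_file)]
    if out_dir:
        cmd += ["-d", str(out_dir)]
    return cmd


cmd = assembly_tool_command(["ERP111288"], assembly_type="primary metagenome")
print(shlex.join(cmd))
# prints: fetch-assembly-tool -p ERP111288 --assembly-type 'primary metagenome'
```

Keeping the type as one list element means no manual quoting is needed when the list is passed to `subprocess.run`.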

            

        