[![Testing](https://github.com/EBI-Metagenomics/fetch_tool/actions/workflows/test.yml/badge.svg)](https://github.com/EBI-Metagenomics/fetch_tool/actions/workflows/test.yml)
[![PyPI version](https://badge.fury.io/py/fetch-tool.svg)](https://badge.fury.io/py/fetch-tool)
[![Docker Repository on Quay](https://quay.io/repository/microbiome-informatics/fetch-tool/status "Docker Repository on Quay")](https://quay.io/repository/microbiome-informatics/fetch-tool)
# Microbiome Informatics ENA fetch tool
A set of tools for fetching raw read and assembly files from the European Nucleotide Archive (ENA).
## How to set up your development environment
We recommend using [miniconda](https://docs.conda.io/en/latest/miniconda.html) or conda to manage the environment.
Clone the repo and install the requirements.
```
$ git clone git@github.com:EBI-Metagenomics/fetch_tool.git
$ cd fetch_tool
$ # activate env (conda activate xxx)
$ pip install -r requirements-dev.txt
```
### Pre-commit hooks
Set up the git [pre-commit hook](https://pre-commit.com/):
```bash
pre-commit install
```
*Why?*
pre-commit runs a set of pre-configured tools before allowing you to commit files. You can find the currently configured hooks and their settings in [.pre-commit-config.yaml](./.pre-commit-config.yaml).
### Tests
This repo uses [pytest](https://docs.pytest.org).
The tests require the Aspera CLI to be installed in the default location (run `install-aspera.sh` with no parameters).
To run the test suite:
```bash
pytest
```
## Install fetch tool
### Using Conda
```bash
$ conda create -q -n fetch_tool python=3.8
$ conda activate fetch_tool
```
Install from PyPI:
```bash
$ pip install fetch-tool
```
Or install from the git repo:
```bash
$ pip install git+ssh://git@github.com/EBI-Metagenomics/fetch_tool.git
```
#### Configuration file
Set up the configuration file, using [fetchdata-config-template.json](config/fetchdata-config-template.json) as the template.
The required fields are:
- For Aspera
  - `aspera_bin` (the path to ascp, usually in the Aspera installation under /cli/bin)
  - `aspera_cert` (the path to the cert provided with ascp, usually in the Aspera installation under /cli/etc/asperaweb_id_dsa.openssh)
- To pull private ENA data
  - `ena_api_user`
  - `ena_api_password`
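As an illustration, the required fields listed above could be written to a local configuration file as in the sketch below. The file name `fetchdata-config-local.json` matches the examples later in this README. The installation folder, the paths derived from it and the credentials are placeholders. The real template may contain further keys, so copying the template and editing it is the safer route.

```bash
# Assumed Aspera installation folder; adjust to where Aspera was installed.
ASPERA_DIR="$PWD/aspera-cli"

# Write a config containing only the fields listed above.
# All values are placeholders, not working paths or credentials.
cat > fetchdata-config-local.json <<EOF
{
  "aspera_bin": "${ASPERA_DIR}/cli/bin/ascp",
  "aspera_cert": "${ASPERA_DIR}/cli/etc/asperaweb_id_dsa.openssh",
  "ena_api_user": "your-ena-user",
  "ena_api_password": "your-ena-password"
}
EOF

# Restrict access to the file, since it holds a password.
chmod 600 fetchdata-config-local.json
```

The ENA credentials are only needed when fetching private data with `--private`.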
### Install Aspera
Run the `install-aspera.sh` script. It takes one optional parameter, the installation folder.
```bash
./install-aspera.sh path/to/installation-i-want
```
If no folder is given, Aspera is installed in `$PWD/aspera-cli`.
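After installing, it can help to confirm that the two files the configuration refers to are where they are expected. The helper below is a hypothetical sketch, not part of this repository. It assumes the sub-paths given in the configuration section (`cli/bin/ascp` and `cli/etc/asperaweb_id_dsa.openssh`), which may differ between Aspera versions.

```bash
# Hypothetical helper: report whether the expected Aspera files exist.
# Takes the installation folder as its only argument and defaults to
# the default install location, $PWD/aspera-cli.
check_aspera() {
  local aspera_dir="${1:-$PWD/aspera-cli}"
  local status=0
  local f
  for f in "$aspera_dir/cli/bin/ascp" \
           "$aspera_dir/cli/etc/asperaweb_id_dsa.openssh"; do
    if [ -e "$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  # Non-zero exit status if at least one file is missing.
  return "$status"
}

check_aspera "$PWD/aspera-cli" || echo "Aspera files not found; re-run install-aspera.sh or adjust the config paths"
```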
## Fetch read files (amplicon and WGS data)
### Usage
```bash
$ fetch-read-tool -h
usage: fetch-read-tool [-h] [-p PROJECTS [PROJECTS ...] | -l PROJECT_LIST] [-d DIR] [-v] [--version] [-f] [--ignore-errors] [--private] [-i] [-c CONFIG_FILE] [--fix-desc-file] [-ru RUNS [RUNS ...]
                       | --run-list RUN_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        Whitespace separated list of project accession(s)
  -l PROJECT_LIST, --project-list PROJECT_LIST
                        File containing line-separated project list
  -d DIR, --dir DIR     Base directory for downloads
  -v, --verbose         Verbose
  --version             Version
  -f, --force           Ignore download errors and force re-download all files
  --ignore-errors       Ignore download errors and continue
  --private             Use when fetching private data
  -i, --interactive     interactive mode - allows you to skip failed downloads.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        Alternative config file
  --fix-desc-file       Fixed runs in project description file
  -ru RUNS [RUNS ...], --runs RUNS [RUNS ...]
                        Run accession(s), whitespace separated. Use to download only certain project runs
  --run-list RUN_LIST   File containing line-separated run accessions
```
### Example
Download an amplicon study:
```bash
$ fetch-read-tool -p SRP062869 -c fetchdata-config-local.json -v -d /home/<user>/temp/
```
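The usage text also lists `-l PROJECT_LIST`, which takes a file with one project accession per line instead of accessions on the command line. A minimal sketch follows, assuming a list file named `projects.txt` and a download folder named `downloads` (both names are illustrative). The guard only skips the call when the tool is not installed.

```bash
# One project accession per line; SRP062869 is the study used in the example above.
printf '%s\n' SRP062869 > projects.txt

# Illustrative download folder.
mkdir -p downloads

# -l reads the accessions from the file instead of taking them from -p.
if command -v fetch-read-tool >/dev/null 2>&1; then
  fetch-read-tool -l projects.txt -c fetchdata-config-local.json -v -d downloads
else
  echo "fetch-read-tool is not installed; see the install section"
fi
```

According to the usage text, `--run-list` works the same way for a file of run accessions.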
## Fetch assembly files
### Usage
```bash
$ fetch-assembly-tool -h
usage: fetch-assembly-tool [-h] [-p PROJECTS [PROJECTS ...] | -l PROJECT_LIST] [-d DIR] [-v] [--version] [-f] [--ignore-errors] [--private] [-i] [-c CONFIG_FILE] [--fix-desc-file]
                           [-as ASSEMBLIES [ASSEMBLIES ...]] [--assembly-type {primary metagenome,binned metagenome,metatranscriptome}] [--assembly-list ASSEMBLY_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        Whitespace separated list of project accession(s)
  -l PROJECT_LIST, --project-list PROJECT_LIST
                        File containing line-separated project list
  -d DIR, --dir DIR     Base directory for downloads
  -v, --verbose         Verbose
  --version             Version
  -f, --force           Ignore download errors and force re-download all files
  --ignore-errors       Ignore download errors and continue
  --private             Use when fetching private data
  -i, --interactive     interactive mode - allows you to skip failed downloads.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        Alternative config file
  --fix-desc-file       Fixed runs in project description file
  -as ASSEMBLIES [ASSEMBLIES ...], --assemblies ASSEMBLIES [ASSEMBLIES ...]
                        Assembly ERZ accession(s), whitespace separated. Use to download only certain project assemblies
  --assembly-type {primary metagenome,binned metagenome,metatranscriptome}
                        Assembly type
  --assembly-list ASSEMBLY_LIST
                        File containing line-separated assembly accessions
```
### Example
Download an assembly study:
```bash
$ fetch-assembly-tool -p ERP111288 -c fetchdata-config-local.json -v -d /home/<user>/temp/
```
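Two of the `--assembly-type` values listed in the usage text contain a space, so the value has to be quoted to reach the tool as a single argument. The sketch below assumes a download folder named `downloads` (an illustrative name). Whether ERP111288 actually contains assemblies of the chosen type is not stated here, so treat the combination as a placeholder.

```bash
# Quote the value so the shell passes "primary metagenome" as one argument.
ASSEMBLY_TYPE="primary metagenome"

# Illustrative download folder.
mkdir -p downloads

if command -v fetch-assembly-tool >/dev/null 2>&1; then
  fetch-assembly-tool -p ERP111288 --assembly-type "$ASSEMBLY_TYPE" \
    -c fetchdata-config-local.json -v -d downloads
else
  echo "fetch-assembly-tool is not installed; see the install section"
fi
```

Without the quotes, the shell would split the value into two words and the option would receive only `primary`.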