| Field | Value |
| --- | --- |
| Name | assembly-uploader |
| Version | 1.2.0 |
| home_page | None |
| Summary | Python scripts to upload primary metagenome and metatranscriptome assemblies to ENA on a per-study basis. This script generates xmls to register a new study and create manifests necessary for submission with webin-cli. |
| upload_time | 2024-12-12 16:21:46 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | Apache Software License 2.0 |
| keywords | bioinformatics, tool, metagenomics |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# ENA Assembly uploader
Upload of metagenome and metatranscriptome assemblies to the [European Nucleotide Archive (ENA)](https://www.ebi.ac.uk/ena)
Pre-requisites:
- CSV metadata file, one per study. See tests/fixtures/test_metadata for an example
- Compressed assembly fasta files in the locations defined in the metadata file
Set the following environment variables with your Webin account details:
ENA_WEBIN
```
export ENA_WEBIN=Webin-0000
```
ENA_WEBIN_PASSWORD
```
export ENA_WEBIN_PASSWORD=password
```
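A script that wraps these tools may want to fail early when the credentials are missing. The following is a minimal sketch that assumes only the two variable names above; the helper `read_webin_credentials` is hypothetical and is not part of `assembly_uploader`:

```python
import os


def read_webin_credentials(env=os.environ):
    """Return (user, password) from the environment, or raise if either is unset."""
    required = ("ENA_WEBIN", "ENA_WEBIN_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["ENA_WEBIN"], env["ENA_WEBIN_PASSWORD"]
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.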
## Installation
Install the package:
```bash
pip install assembly-uploader
```
## Usage
### From the command line
#### Register study and generate pre-upload files
**If you already have a registered study accession for your assembly files, skip to step 3.**
#### Step 1: generate XML files for a new assembly study submission
This step generates a folder `STUDY_upload` containing a project XML and a submission XML:
```bash
study_xmls
  --study STUDY         raw reads study ID
  --library LIBRARY     metagenome or metatranscriptome
  --center CENTER       center for upload e.g. EMG
  --hold HOLD           hold date (private) if it should be different from the provided study in format dd-mm-yyyy. Will inherit the release date of the raw read study if not provided.
  --tpa                 use this flag if the study is a third party assembly. Default False
  --publication PUBLICATION
                        pubmed ID for connected publication if available
  --private             use flag if your data is private
```
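The `--hold` option expects a date in `dd-mm-yyyy` format. When the date comes from another part of a workflow, it can help to format and validate it before calling `study_xmls`. This is a minimal sketch using only the standard library; the helper names `format_hold_date` and `parse_hold_date` are hypothetical and are not part of `assembly_uploader`:

```python
from datetime import date, datetime

# dd-mm-yyyy, as described in the --hold help text
HOLD_FORMAT = "%d-%m-%Y"


def format_hold_date(release: date) -> str:
    """Format a date as a dd-mm-yyyy string."""
    return release.strftime(HOLD_FORMAT)


def parse_hold_date(text: str) -> date:
    """Parse a dd-mm-yyyy string; raises ValueError for any other format."""
    return datetime.strptime(text, HOLD_FORMAT).date()


print(format_hold_date(date(2026, 1, 31)))  # 31-01-2026
```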
#### Step 2: submit the new assembly study to ENA
This step submits the XML to ENA and generates a new assembly study accession. Keep note of the newly generated study accession:
```bash
submit_study
  --study STUDY         raw reads study ID
  --test                run test submission only
```
#### Step 3: make a manifest file for each assembly
This step generates manifest files in the folder `STUDY_upload` for the runs specified in the metadata file:
```bash
assembly_manifest
  --study STUDY         raw reads study ID
  --data DATA           metadata CSV - run_id, coverage, assembler, version, filepath
  --assembly_study ASSEMBLY_STUDY
                        pre-existing study ID to submit to if available. Must exist in the webin account
  --force               overwrite all existing manifests
  --private             use flag if your data is private
  --tpa                 use this flag if the study is a third party assembly. Default False
```
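The `--data` help text lists the metadata columns as run_id, coverage, assembler, version and filepath. The sketch below writes such a file with Python's `csv` module. Whether the header must match these names exactly is an assumption, so compare the result against the example metadata file in the repository. The row values are illustrative only, and `write_metadata_csv` is a hypothetical helper:

```python
import csv
import io

# Column names copied from the --data help text; the exact header expected
# by assembly_manifest is an assumption.
COLUMNS = ["run_id", "coverage", "assembler", "version", "filepath"]


def write_metadata_csv(rows, handle):
    """Write a header plus one row per assembly to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative values, not taken from a real submission.
example_rows = [
    {
        "run_id": "SRR12240187",
        "coverage": "20.0",
        "assembler": "metaspades",
        "version": "3.15.3",
        "filepath": "/path/to/SRR12240187.fasta.gz",
    }
]

buffer = io.StringIO()
write_metadata_csv(example_rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target file with `newline=""` and pass the file handle in place of `buffer`.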
#### Step 4: upload assemblies
Once the manifest files are generated, use ENA's webin-cli to upload the assemblies.
To test your submission, add the `-test` argument.
An example of a live (non-test) submission, run from within this repo:
```bash
ena-webin-cli \
  -context=genome \
  -manifest=SRR12240187.manifest \
  -userName="$ENA_WEBIN" \
  -password="$ENA_WEBIN_PASSWORD" \
  -submit
```
More information on ENA's webin-cli can be found [in the ENA docs](<https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html>).
### From a Python script
`assembly_uploader` can also be used as a Python library, so that you can integrate the steps into another Python workflow or tool.
```python
from pathlib import Path
from assembly_uploader.study_xmls import StudyXMLGenerator, METAGENOME
from assembly_uploader.submit_study import submit_study
from assembly_uploader.assembly_manifest import AssemblyManifestGenerator

# Generate new assembly study XML files
StudyXMLGenerator(
    study="SRP272267",
    center_name="EMG",
    library=METAGENOME,
    tpa=True,
    output_dir=Path("my-study"),
).write()

# Submit new assembly study to ENA
new_study_accession = submit_study("SRP272267", is_test=True, directory=Path("my-study"))
print(f"My assembly study has the accession {new_study_accession}")

# Create manifest files for the assemblies to be uploaded
# This assumes you have a CSV file detailing the assemblies with their assembler and coverage metadata
# see tests/fixtures/test_metadata for an example
AssemblyManifestGenerator(
    study="SRP272267",
    assembly_study=new_study_accession,
    assemblies_csv=Path("/path/to/my/assemblies.csv"),
    output_dir=Path("my-study"),
).write()
```
The ENA submission requires `webin-cli`, so follow [Step 4](#step-4-upload-assemblies) above.
(You could still call this from Python, e.g. with `subprocess.Popen`.)
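
As a rough illustration of calling webin-cli from Python, the sketch below rebuilds the Step 4 command as an argument list and runs it with `subprocess.run`. It assumes that `ena-webin-cli` is on your `PATH` and that the two environment variables are set. The helpers `build_webin_command` and `upload` are hypothetical, and defaulting to a test submission is a choice made here, not behaviour of `assembly_uploader`:

```python
import os
import subprocess
from pathlib import Path


def build_webin_command(manifest: Path, test: bool = True) -> list:
    """Build the ena-webin-cli argument list used in Step 4."""
    command = [
        "ena-webin-cli",
        "-context=genome",
        f"-manifest={manifest}",
        f"-userName={os.environ['ENA_WEBIN']}",
        f"-password={os.environ['ENA_WEBIN_PASSWORD']}",
        "-submit",
    ]
    if test:
        # -test asks webin-cli for a test submission, as described in Step 4
        command.append("-test")
    return command


def upload(manifest: Path, test: bool = True) -> None:
    """Run ena-webin-cli for one manifest; raises CalledProcessError on failure."""
    subprocess.run(build_webin_command(manifest, test), check=True)
```

Passing the arguments as a list avoids shell quoting problems with passwords that contain special characters. Call `upload(Path("SRR12240187.manifest"), test=False)` only when you intend a live submission.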
## Development setup
Prerequisites: a functioning conda or pixi installation.
To install the assembly uploader codebase in "editable" mode:
```bash
conda env create -f requirements.yml
conda activate assemblyuploader
pip install -e '.[dev,test]'
pre-commit install
```
### Testing
```
pytest
```
Raw data
{
"_id": null,
"home_page": null,
"name": "assembly-uploader",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "bioinformatics, tool, metagenomics",
"author": null,
"author_email": "MGnify team <metagenomics-help@ebi.ac.uk>",
"download_url": "https://files.pythonhosted.org/packages/92/12/4778443f90068245cadd79680bf922d76ff5cdb833e8016b2a0bfc50699b/assembly_uploader-1.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Python scripts to upload primary metagenome and metatranscriptome assemblies to ENA on a per-study basis. This script generates xmls to register a new study and create manifests necessary for submission with webin-cli.",
"version": "1.2.0",
"project_urls": null,
"split_keywords": [
"bioinformatics",
" tool",
" metagenomics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c51e63cf54471d62bad53d096a986c1c41616517e381cc4ca8fe9f2cff55b397",
"md5": "6f36e935a11497c6c7b97599aae1daf9",
"sha256": "7814a3d0594731b21a5b46268da7023aea2e129a89a79824ec1f1715800325ec"
},
"downloads": -1,
"filename": "assembly_uploader-1.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6f36e935a11497c6c7b97599aae1daf9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 18081,
"upload_time": "2024-12-12T16:21:43",
"upload_time_iso_8601": "2024-12-12T16:21:43.507593Z",
"url": "https://files.pythonhosted.org/packages/c5/1e/63cf54471d62bad53d096a986c1c41616517e381cc4ca8fe9f2cff55b397/assembly_uploader-1.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "92124778443f90068245cadd79680bf922d76ff5cdb833e8016b2a0bfc50699b",
"md5": "384d9e2b54b17329dca260fb6c552eb9",
"sha256": "0308ed8006960dcc2d4f8a4aa9677ab6d181d735885d5c2276169cd7564fd90c"
},
"downloads": -1,
"filename": "assembly_uploader-1.2.0.tar.gz",
"has_sig": false,
"md5_digest": "384d9e2b54b17329dca260fb6c552eb9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 15510,
"upload_time": "2024-12-12T16:21:46",
"upload_time_iso_8601": "2024-12-12T16:21:46.032363Z",
"url": "https://files.pythonhosted.org/packages/92/12/4778443f90068245cadd79680bf922d76ff5cdb833e8016b2a0bfc50699b/assembly_uploader-1.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-12 16:21:46",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "assembly-uploader"
}