:Name: get-assemblies
:Version: 0.11.3
:Home page: https://github.com/davised/get_assemblies
:Summary: NCBI E-utilities wrapper for assembly downloads
:Upload time: 2023-05-09 18:38:49
:Maintainer: Edward Davis
:Author: Edward Davis
:License: Custom

get_assemblies
==============

.. contents:: **Table of Contents**
    :backlinks: none

Who might want this software?
-----------------------------

This software is intended to facilitate the use of the `NCBI assembly database
<https://www.ncbi.nlm.nih.gov/assembly>`_.

The intended audience is scientific researchers, computational biologists, and
bioinformaticians who need sequence data for comparative genomics projects.

Anyone using NCBI sequence data (for reference genomes/transcriptomes) can
benefit as well.

PhD/Masters students and undergraduates are especially encouraged to submit
issues if they are having trouble using this software.

This software is written in Python. I'm happy to help those who are having
trouble with software installs.

Windows users, please install WSL to make use of this software. Using a Linux
distribution will make your life as a computational researcher significantly
easier.

Installation
------------

get_assemblies is distributed on `PyPI
<https://pypi.org/project/get-assemblies>`_ as a universal wheel. It runs on
Linux and macOS, supports Python 3.7+, and works on Windows through `WSL
<https://docs.microsoft.com/en-us/windows/wsl/install-win10>`_.

get_assemblies depends on `NCBI Entrez Direct
<https://www.ncbi.nlm.nih.gov/books/NBK179288/>`_, which **NO LONGER** requires
Perl (Perl is installed by default on most \*nix systems in any case). If
edirect is not currently installed, run ``get_assemblies --dledirect`` to
install it.

.. code-block:: bash

    $ python3 -m pip install -U get-assemblies

Dependencies
------------

Python modules:

1. `rich <https://github.com/willmcgugan/rich>`_

See ``requirements.txt`` for more info.

External programs:

1. `NCBI Entrez Direct <https://www.ncbi.nlm.nih.gov/books/NBK179288/>`_

You can install external programs using the ``get_assemblies --dledirect``
command. These will be installed to ``${HOME}/edirect`` unless otherwise
specified.

Just tell me how to run it
--------------------------

.. code-block:: bash

    $ get_assemblies organism 'Pseudomonas fluorescens'

This will find all genomes tagged as 'Pseudomonas fluorescens' in NCBI's
database. By default, this command will only check to see how many genomes
fall into this category.

To download metadata for the genomes:

.. code-block:: bash

    $ get_assemblies organism 'Pseudomonas fluorescens' --function metadata

Check out the ``metadata.tab`` file that is created after running this command.
Generally you will want to select a subset from your search. One way to do this
is to pick the lines that describe the genomes of interest and save their
assembly accessions to a file. You can either delete the lines that you don't
want, or use ``grep`` to pull out the lines that you want to keep. Then pipe
those lines through ``cut -f 15 > accs.txt`` to write the assembly accessions
to a file.
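
The same selection can be scripted. The sketch below is a minimal Python
equivalent of the ``grep`` plus ``cut -f 15`` approach. It assumes that
``metadata.tab`` is tab-delimited with the assembly accession in column 15, as
the ``cut`` command implies; the function names are illustrative and not part
of get_assemblies.

.. code-block:: python

    import csv


    def select_accessions(metadata_path, keyword, column=15):
        """Return the accession field of every row that mentions keyword.

        column is 1-based, matching ``cut -f 15``.
        """
        accessions = []
        with open(metadata_path, newline="") as handle:
            # QUOTE_NONE treats quote characters literally, as cut does.
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                # Like grep: keep the row if any field contains the keyword.
                if len(row) >= column and any(keyword in f for f in row):
                    accessions.append(row[column - 1])
        return accessions


    def write_accessions(accessions, out_path="accs.txt"):
        """Write one accession per line for ``get_assemblies assembly_ids``."""
        with open(out_path, "w") as handle:
            for accession in accessions:
                handle.write(accession + "\n")

For example, ``write_accessions(select_accessions("metadata.tab", "Pf0-1"))``
would keep every row that mentions the (hypothetical) search term ``Pf0-1``.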

.. code-block:: bash

    $ cat accs.txt | get_assemblies assembly_ids - --function genomes -o fna

This will download the nucleotide FASTA files for your genomes of interest.
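
If you drive get_assemblies from a Python script, the pipeline above might be
wrapped as follows. This is a sketch, not part of the package:
``build_command`` and ``download_genomes`` are hypothetical helper names, and
only arguments shown in this README (``assembly_ids``, ``-``, ``--function``,
``-o``) are used.

.. code-block:: python

    import subprocess


    def build_command(function="genomes", outputs=("fna",)):
        """Build the argument list for ``get_assemblies assembly_ids -``."""
        return ["get_assemblies", "assembly_ids", "-",
                "--function", function, "-o", *outputs]


    def download_genomes(acc_file="accs.txt", outputs=("fna",)):
        """Feed the accession file to get_assemblies on stdin."""
        with open(acc_file) as handle:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(build_command(outputs=outputs),
                           stdin=handle, check=True)

Passing the arguments as a list avoids shell quoting issues; nothing is
executed until ``download_genomes`` is called.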

Overview
--------

This tool was written to make accessing genomic data from NCBI easier. The
output files are renamed so that each filename contains the genus, species, and
strain of the assembly, which makes it easy to find the genomes that you're
interested in. You won't have to spend time renaming the files by hand.

This software is effectively a wrapper for the NCBI edirect tools that makes
getting genome files easier. If you are interested in starting a comparative
genomics project, this is the tool for you.

The software supports four types of input:

1. organism input, either taxonomy rank names (e.g. Genus species, Family) or
   taxids
2. assembly ids, either accessions or uids
3. nuccore ids (e.g. individual contig/chromosome names)
4. json input (e.g. the intermediate files - docsums - produced by this script)
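
For assembly ids, the ``--type`` option distinguishes accessions (``acc``) from
uids (``uid``). As a rough illustration, the two can usually be told apart by
shape: assembly accessions look like ``GCA_000269645.2`` (``GCA_`` or ``GCF_``,
nine digits, then a version), while uids are plain integers. The helper below
is hypothetical and is not how get_assemblies itself decides.

.. code-block:: python

    import re

    # GCA_/GCF_ prefix, nine digits, optional ".version" suffix.
    ACCESSION_RE = re.compile(r"GC[AF]_\d{9}(\.\d+)?")


    def guess_id_type(identifier):
        """Return 'acc', 'uid', or None for an unrecognised identifier."""
        identifier = identifier.strip()
        if ACCESSION_RE.fullmatch(identifier):
            return "acc"
        if identifier.isdigit():
            return "uid"
        return None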

Five file type outputs are supported:

1. Nucleotide genome sequence (fna)
2. Nucleotide coding sequence (ffn)
3. Amino acid coding sequence (faa)
4. General feature format (i.e. tab-delimited features) (gff)
5. GenBank format (gbk)
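
Some of these outputs only exist for annotated assemblies. According to the
``--annotation`` help text under Usage, annotation is required automatically
when ``gbk``, ``faa``, or ``ffn`` is requested. A minimal sketch of that rule,
using a hypothetical function name and making no claim about how ``all`` or
``gff`` are treated internally:

.. code-block:: python

    # Output types named in the --annotation help text.
    ANNOTATION_OUTPUTS = {"gbk", "faa", "ffn"}


    def requires_annotation(outputs, annotation_flag=False):
        """True if --annotation was given or gbk/faa/ffn was requested."""
        return annotation_flag or bool(ANNOTATION_OUTPUTS & set(outputs))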

The program will attempt to find a unique prefix per genome assembly. This
prefix will be in the resulting filename. A metadata file that contains much
of the relevant information per genome will also be included. This file can
be included as a supplementary table for a manuscript in a comparative genomics
project.

If you need to make phylogenetic trees with these data, check out my other
Python package, `automlsa2 <https://pypi.org/project/automlsa2/>`_.

More Examples
-------------

.. code-block:: bash

    $ get_assemblies organism 'Mycobacterium'
    2020-10-15 22:49:53,257 - INFO - Found 7522 genomes to download.
    2020-10-15 22:49:53,257 - INFO - Expect 37610MB to 52654MB of data.
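
The size range in this log works out to 5 MB to 7 MB per genome
(37610 / 7522 = 5 and 52654 / 7522 = 7). Assuming that per-genome range, which
is inferred from the log lines in this README rather than from the source
code, a quick estimate for any genome count is:

.. code-block:: python

    def expected_size_mb(n_genomes, low_mb=5, high_mb=7):
        """Return the (low, high) download estimate in MB for n_genomes."""
        return n_genomes * low_mb, n_genomes * high_mb


    print(expected_size_mb(7522))  # (37610, 52654)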

.. code-block:: bash

    $ get_assemblies organism --type ID 167539 --function genomes -o gbk
    2020-10-15 23:10:13,822 - INFO - Found 1 genomes to download.
    2020-10-15 23:10:13,822 - INFO - Expect 5MB to 7MB of data pending the chosen file types for download.
    chunk: 1it [00:01,  1.21s/it]
    docsums: 100%|██████████████████████████████| 1/1 [00:00<00:00, 5146.39it/s]
    2020-10-15 23:10:16,262 - INFO - Downloading 1 files.
    100% [##################################################]           1M / 1M]
    2020-10-15 23:10:18,044 - INFO - P_marinus_CCMP1375_SS120.gbk successfully downloaded.
    download: 100%|███████████████████████████████| 1/1 [00:01<00:00,  1.78s/it]
    $ ls
    docsums0.json       metadata.tab
    get_assemblies.log  P_marinus_CCMP1375_SS120.gbk

.. code-block:: bash

    $ echo GCA_000269645.2 | get_assemblies assembly_ids -
    2020-10-15 23:18:04,107 - INFO - Found 1 genomes to download.
    2020-10-15 23:18:04,107 - INFO - Expect 5MB to 7MB of data pending the chosen file types for download.
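
Log lines in this format are easy to post-process, for example to record how
many genomes a query matched. The regular expression below is written against
the lines shown above; it is an assumption that ``get_assemblies.log`` uses the
same layout, and ``genomes_found`` is a hypothetical helper.

.. code-block:: python

    import re

    FOUND_RE = re.compile(r"- INFO - Found (\d+) genomes to download\.")


    def genomes_found(log_text):
        """Return the count from the first 'Found N genomes' line, or None."""
        match = FOUND_RE.search(log_text)
        return int(match.group(1)) if match else None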

Usage
-----

.. code-block:: bash

    $ get_assemblies -h
    usage: get_assemblies [-h] [--debug] [--version] [--dledirect [DLEDIRECT]] {organism,assembly_ids,nuccore_ids,json_input} ...

    Downloads assemblies & annotations from NCBI.

    positional arguments:
      {organism,assembly_ids,nuccore_ids,json_input}
                            Choose from this list of input types.
        organism            Valid NCBI organism or taxids.
        assembly_ids        Valid NCBI assembly IDs.
        nuccore_ids         Valid NCBI nucleotide accessions.
        json_input          Valid NCBI JSON docsums.

    optional arguments:
      -h, --help            show this help message and exit
      --debug               Turn on debugging messages.
      --version             show program's version number and exit
      --dledirect [DLEDIRECT]
                            Download edirect to given location. [~/edirect]

Organism
^^^^^^^^

.. code-block:: bash

    $ get_assemblies organism -h
    usage: get_assemblies organism [-h] [--type {text,ID}] [--function {check,metadata,genomes} [{check,metadata,genomes} ...]]
                                   [--annotation] [--metadata_append] [--typestrain] [--keepmulti] [--force]
                                   [-f {abbr,full,strain}] [-o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]]
                                   [--edirect EDIRECT] [--debug]
                                   query

    positional arguments:
      query                 Valid NCBI organism text term or ID

    optional arguments:
      -h, --help            show this help message and exit
      --type {text,ID}      Input is text term (default) or ID
      --function {check,metadata,genomes} [{check,metadata,genomes} ...]
                            check counts, download metadata, or genomes. [check]
      --annotation          Require annotation? False by default, True if gbk/faa/ffn requested
      --metadata_append     Append to metadata, not overwrite.
      --typestrain          Only download type strains.
      --keepmulti           By default, genomes from large multi-isolatestudies are removed.
      --force               Force download attempt of low-quality genomes.
      -f {abbr,full,strain}, --outformat {abbr,full,strain}
                            Output file prefix. [full]
      -o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]
                            Output file types.
      --edirect EDIRECT     Path to edirect directory.
      --debug               Turn on debugging messages.

Assembly IDs
^^^^^^^^^^^^

.. code-block:: bash

    $ get_assemblies assembly_ids -h
    usage: get_assemblies assembly_ids [-h] [--type {acc,uid}]
                                       [--function {check,metadata,genomes} [{check,metadata,genomes} ...]] [--annotation]
                                       [--metadata_append] [--typestrain] [--keepmulti] [--force] [-f {abbr,full,strain}]
                                       [-o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]] [--edirect EDIRECT]
                                       [--debug]
                                       infile

    positional arguments:
      infile                Input file with NCBI assembly IDs; "-" for stdin

    optional arguments:
      -h, --help            show this help message and exit
      --type {acc,uid}      Input is Accession (default) or ID
      --function {check,metadata,genomes} [{check,metadata,genomes} ...]
                            check counts, download metadata, or genomes. [check]
      --annotation          Require annotation? False by default, True if gbk/faa/ffn requested
      --metadata_append     Append to metadata, not overwrite.
      --typestrain          Only download type strains.
      --keepmulti           By default, genomes from large multi-isolatestudies are removed.
      --force               Force download attempt of low-quality genomes.
      -f {abbr,full,strain}, --outformat {abbr,full,strain}
                            Output file prefix. [full]
      -o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]
                            Output file types.
      --edirect EDIRECT     Path to edirect directory.
      --debug               Turn on debugging messages.

Nucleotide IDs
^^^^^^^^^^^^^^

.. code-block:: bash

    $ get_assemblies nuccore_ids -h
    usage: get_assemblies nuccore_ids [-h] [--function {check,metadata,genomes} [{check,metadata,genomes} ...]] [--annotation]
                                      [--metadata_append] [--typestrain] [--keepmulti] [--force] [-f {abbr,full,strain}]
                                      [-o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]] [--edirect EDIRECT] [--debug]
                                      infile

    positional arguments:
      infile                Input file with NCBI nuccore IDs; "-" for stdin

    optional arguments:
      -h, --help            show this help message and exit
      --function {check,metadata,genomes} [{check,metadata,genomes} ...]
                            check counts, download metadata, or genomes. [check]
      --annotation          Require annotation? False by default, True if gbk/faa/ffn requested
      --metadata_append     Append to metadata, not overwrite.
      --typestrain          Only download type strains.
      --keepmulti           By default, genomes from large multi-isolatestudies are removed.
      --force               Force download attempt of low-quality genomes.
      -f {abbr,full,strain}, --outformat {abbr,full,strain}
                            Output file prefix. [full]
      -o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]
                            Output file types.
      --edirect EDIRECT     Path to edirect directory.
      --debug               Turn on debugging messages.

NCBI JSON Docsum input
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    $ get_assemblies json_input -h
    usage: get_assemblies json_input [-h] [--function {metadata,genomes} [{metadata,genomes} ...]] [--annotation]
                                     [--metadata_append] [--typestrain] [--keepmulti] [--force] [-f {abbr,full,strain}]
                                     [-o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]] [--edirect EDIRECT] [--debug]
                                     jsonfile [jsonfile ...]

    positional arguments:
      jsonfile              Input JSON file with docsums; "-" for stdin

    optional arguments:
      -h, --help            show this help message and exit
      --function {metadata,genomes} [{metadata,genomes} ...]
                            Download metadata and/or genomes. [metadata]
      --annotation          Require annotation? False by default, True if gbk/faa/ffn requested
      --metadata_append     Append to metadata, not overwrite.
      --typestrain          Only download type strains.
      --keepmulti           By default, genomes from large multi-isolatestudies are removed.
      --force               Force download attempt of low-quality genomes.
      -f {abbr,full,strain}, --outformat {abbr,full,strain}
                            Output file prefix. [full]
      -o {fna,ffn,gff,gbk,faa,all} [{fna,ffn,gff,gbk,faa,all} ...]
                            Output file types.
      --edirect EDIRECT     Path to edirect directory.
      --debug               Turn on debugging messages.


Bugs
----

Viruses are currently not handled well, if at all. Look elsewhere to download
those.

Contributing
------------

Feel free to submit bug reports or pull requests so we can improve this
software. Undoubtedly there will be some erroneous prefixes generated out
there, and I'd like to fix them.

Author Contact
--------------

`Ed Davis <mailto:ed@cgrb.oregonstate.edu>`_

Acknowledgments
----------------

Special thanks for helping me test the software and get the Python code packaged:

* `Alex Weisberg <https://github.com/alexweisberg>`_
* `Shawn O'Neil <https://github.com/oneilsh>`_

Also, thanks to these groups for supporting me through my scientific career:

* `OSU Chang Lab <https://github.com/osuchanglab>`_
* `Center for Genome Research and Biocomputing @ OSU <https://cgrb.oregonstate.edu>`_

License
-------

get_assemblies is distributed under the terms listed in the ``LICENSE`` file.
The software is free for non-commercial use.

Copyrights
----------

Copyright (c) 2020 Oregon State University

All Rights Reserved.
