metapyrodigal


Name: metapyrodigal
Version: 1.4.1
Summary: Pyrodigal CLI optimized for metagenomic data
Upload time: 2025-01-24 21:44:16
Requires Python: >=3.10
License: MIT License

Copyright (c) 2023 Cody Martin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# README

## Introduction

This library is a simple wrapper around [pyrodigal](https://github.com/althonos/pyrodigal), which is a cythonized implementation of [prodigal](https://github.com/hyattpd/Prodigal/) that is orders of magnitude faster.

Pyrodigal is mostly written for single genomes or FASTA files, so this tool was created to batch process metagenomic-scale datasets. Metagenomic data usually consists of a large number of genome files, one per metagenome-assembled genome (MAG). Additionally, viral metagenomic datasets tend to store all single-scaffold viruses in a single file, which tends to be much larger than a typical single-genome FASTA file.

This tool uses different load balancing strategies to parallelize pyrodigal over either a large number of files (MAGs) or FASTA files that contain a large number of scaffolds (viruses).
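The two input shapes described above suggest two units of parallel work: one task per file when there are many genomes, and one task per scaffold when there is a single large file. The following is a minimal sketch of that idea using only the standard library. It is not metapyrodigal's actual implementation: `parse_fasta`, `count_start_codons`, `process_genome`, and `run` are hypothetical names, and counting `ATG` occurrences is only a stand-in for real gene prediction.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (scaffold name, sequence) pairs."""
    records: list[tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def count_start_codons(seq: str) -> int:
    """Hypothetical stand-in for gene prediction: count ATG occurrences."""
    return seq.upper().count("ATG")


def process_genome(text: str) -> int:
    """File-level unit of work: handle every scaffold of one genome."""
    return sum(count_start_codons(seq) for _, seq in parse_fasta(text))


def run(genomes: dict[str, str], max_cpus: int = 1) -> dict[str, int]:
    """Choose the unit of parallel work from the shape of the input."""
    if not genomes:
        return {}
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        if len(genomes) > 1:
            # Many genomes (MAGs): one task per file.
            results = pool.map(process_genome, genomes.values())
            return dict(zip(genomes, results))
        # One large file (viruses): one task per scaffold.
        (text,) = genomes.values()
        records = parse_fasta(text)
        counts = pool.map(count_start_codons, [seq for _, seq in records])
        return {name: n for (name, _), n in zip(records, counts)}
```

With many small files the per-file tasks keep every worker busy, whereas a single file would leave all but one worker idle unless the work is split by scaffold.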

## Installation

### Install versioned releases

```bash
pip install metapyrodigal
```

### Install from source

```bash
git clone https://github.com/cody-mar10/metapyrodigal.git
cd metapyrodigal
pip install .
```

## Usage

Installing this tool overwrites the `pyrodigal` binary, so running `pyrodigal` afterwards invokes the metagenome-focused command-line interface provided by this package.

The help page from `pyrodigal -h` looks like this:

```txt
usage: pyrodigal [-h] (-i FILE [FILE ...] | -d DIR) [-o DIR] [-c INT] [--genes] [--virus-mode] [-x STR]
                 [--allow-unordered]

Find ORFs from query genomes using pyrodigal v3.5.2, the cythonized prodigal API

options:
  -h, --help            show this help message and exit
  -i FILE [FILE ...], --input FILE [FILE ...]
                        fasta file(s) of query genomes (can use unix wildcards)
  -d DIR, --input-dir DIR
                        directory of fasta files to process
  -o DIR, --outdir DIR  output directory (default: /storage2/scratch/ccmartin6/software/metapyrodigal)
  -c INT, --max-cpus INT
                        maximum number of threads to use (default: 1)
  --genes               use to also output the nucleotide genes .ffn file
  --virus-mode          use pyrodigal-gv to activate the virus models (default: False)
  -x STR, --extension STR
                        genome FASTA file extension if using -d/--input-dir (default: fna)
  --allow-unordered     for a single file input, this allows the protein ORFs to be written per scaffold as
                        available. All protein ORFs for each scaffold will be in order, but the scaffolds will not
                        necessarily be in the same order as in the input nucleotide file. **This is useful if you
                        are extremely memory limited,** since the default strategy can lead to the ORFs being
                        stored in memory for awhile before writing to file as the original scaffold order is
                        maintained. NOTE: This is about 20 percent faster, so it is recommended to use this if the
                        order of scaffolds does not matter.
```
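The trade-off behind `--allow-unordered` can be illustrated with the standard library. This is a sketch under stated assumptions, not the package's code: `translate_stub`, `ordered`, and `unordered` are hypothetical names, and the per-scaffold task only returns the sequence length.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def translate_stub(scaffold: tuple[str, str]) -> tuple[str, int]:
    """Hypothetical per-scaffold task: returns (name, sequence length)."""
    name, seq = scaffold
    return name, len(seq)


def ordered(scaffolds: list[tuple[str, str]], max_cpus: int = 2) -> list[tuple[str, int]]:
    # Executor.map yields results in input order, so a finished result has to
    # wait in memory until every earlier scaffold has been yielded.
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        return list(pool.map(translate_stub, scaffolds))


def unordered(scaffolds: list[tuple[str, str]], max_cpus: int = 2) -> list[tuple[str, int]]:
    # as_completed yields each result as soon as its task finishes, so it can
    # be written out immediately, at the cost of losing the input order.
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        futures = [pool.submit(translate_stub, s) for s in scaffolds]
        return [f.result() for f in as_completed(futures)]
```

Both functions return the same set of results; only the order in which they become available differs.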

`-i` and `-d` are mutually exclusive, but one of them must be provided.
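The `(-i FILE [FILE ...] | -d DIR)` form in the usage line is what `argparse` prints for a required mutually exclusive group. A minimal sketch of such a parser follows. It is an assumption about how the rule could be expressed, not a copy of the package's parser, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyrodigal")
    # required=True rejects a command line with neither option; the group
    # itself rejects a command line that supplies both.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-i", "--input", nargs="+", metavar="FILE")
    source.add_argument("-d", "--input-dir", metavar="DIR")
    return parser
```

Passing both options, or neither, makes `parse_args` print an error and exit.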

The output files have the same basename as the input file. Protein FASTA files will have the extension `.faa`, and nucleotide gene FASTA files will have the extension `.ffn`. For example:

```bash
pyrodigal -i GENOME.fna
```

will output `GENOME.faa`. Adding `--genes` also writes `GENOME.ffn`.
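The naming rule can be sketched with `pathlib`. This assumes that "basename" means the input file name with its last extension removed; `output_paths` is a hypothetical helper and is not part of the package.

```python
from pathlib import Path


def output_paths(fasta: str, outdir: str = ".", genes: bool = False) -> list[Path]:
    """Return the expected output files for one input FASTA file."""
    stem = Path(fasta).stem  # "GENOME.fna" -> "GENOME"
    paths = [Path(outdir) / f"{stem}.faa"]  # protein FASTA, always written
    if genes:
        # Nucleotide gene FASTA, written only when --genes is given.
        paths.append(Path(outdir) / f"{stem}.ffn")
    return paths
```

For example, `output_paths("data/GENOME.fna", "out", genes=True)` yields `out/GENOME.faa` and `out/GENOME.ffn`.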

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "metapyrodigal",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Cody Martin <codycmar10@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/70/5a/f161386bfd5f469b039b1ced5e94b2a5090b4e1f5fd4b343e1140c24ecd2/metapyrodigal-1.4.1.tar.gz",
    "platform": null,
    "description": "# README\n\n## Introduction\n\nThis library is a simple wrapper of [pyrodigal](https://github.com/althonos/pyrodigal), which is a cythonized implementation of [prodigal](https://github.com/hyattpd/Prodigal/) that is orders of magnitudes faster.\n\nPyrodigal is mostly written for single genomes or FASTA files, so this tool was created to batch process metagenomic-scale datasets. Metagenomic data usually consists of large number of genome files for MAGs. Additionally, viral metagenomic datasets tend to store all single-scaffold viruses in a single file, which tends to be much larger than a typical single-genome FASTA file.\n\nThis tool uses different load balancing strategies to parallelize pyrodigal over large amounts of files (MAGs) or FASTA files that have a large number of scaffolds (viruses).\n\n## Installation\n\n## Install versioned releases\n\n```bash\npip install metapyrodigal\n```\n\n## Install from source\n\n```bash\ngit clone https://github.com/cody-mar10/metapyrodigal.git\ncd metapyrodigal\npip install .\n```\n\n## Usage\n\nThis tool will overwrite the `pyrodigal` binary, so you can use the metagenome-focused binary that I created.\n\nThe help page from `pyrodigal -h` looks like this:\n\n```txt\nusage: pyrodigal [-h] (-i FILE [FILE ...] 
| -d DIR) [-o DIR] [-c INT] [--genes] [--virus-mode] [-x STR]\n                 [--allow-unordered]\n\nFind ORFs from query genomes using pyrodigal v3.5.2, the cythonized prodigal API\n\noptions:\n  -h, --help            show this help message and exit\n  -i FILE [FILE ...], --input FILE [FILE ...]\n                        fasta file(s) of query genomes (can use unix wildcards)\n  -d DIR, --input-dir DIR\n                        directory of fasta files to process\n  -o DIR, --outdir DIR  output directory (default: /storage2/scratch/ccmartin6/software/metapyrodigal)\n  -c INT, --max-cpus INT\n                        maximum number of threads to use (default: 1)\n  --genes               use to also output the nucleotide genes .ffn file\n  --virus-mode          use pyrodigal-gv to activate the virus models (default: False)\n  -x STR, --extension STR\n                        genome FASTA file extension if using -d/--input-dir (default: fna)\n  --allow-unordered     for a single file input, this allows the protein ORFs to be written per scaffold as\n                        available. All protein ORFs for each scaffold will be in order, but the scaffolds will not\n                        necessarily be in the same order as in the input nucleotide file. **This is useful if you\n                        are extremely memory limited,** since the default strategy can lead to the ORFs being\n                        stored in memory for awhile before writing to file as the original scaffold order is\n                        maintained. NOTE: This is about 20 percent faster, so it is recommended to use this if the\n                        order of scaffolds does not matter.\n```\n\n`-i` and `-d` are mutually exclusive but one of them must be provided.\n\nThe output files have the same basename as the input file. Protein FASTA files will have the extension `.faa`, and nucleotide gene FASTA files will have the extension `.ffn`. 
For example:\n\n```bash\npyrodigal -i GENOME.fna\n```\n\nwill output `GENOME.faa`\n",
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2023 Cody Martin\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.",
    "summary": "Pyrodigal cli optimized for metagenomic data",
    "version": "1.4.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/cody-mar10/metapyrodigal/issues",
        "Homepage": "https://github.com/cody-mar10/metapyrodigal"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c31d4a8a19c6076d0814f2af065d3afb3ef12ba4fa667fc5fb8903b248998fc0",
                "md5": "3427a2f8500cc034e8b302c5eca2f5a4",
                "sha256": "345ea712ce57fc622c3d9208f4a30340b5e8d26e2a8a7cdadc3dd237d82d0b2d"
            },
            "downloads": -1,
            "filename": "metapyrodigal-1.4.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3427a2f8500cc034e8b302c5eca2f5a4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 9998,
            "upload_time": "2025-01-24T21:44:14",
            "upload_time_iso_8601": "2025-01-24T21:44:14.891308Z",
            "url": "https://files.pythonhosted.org/packages/c3/1d/4a8a19c6076d0814f2af065d3afb3ef12ba4fa667fc5fb8903b248998fc0/metapyrodigal-1.4.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "705af161386bfd5f469b039b1ced5e94b2a5090b4e1f5fd4b343e1140c24ecd2",
                "md5": "c1eee2cd48977efd8513cb3a485526d7",
                "sha256": "a28ae83ae79003079490e797ed0176deace622fa594f8abcbe9552f805de3918"
            },
            "downloads": -1,
            "filename": "metapyrodigal-1.4.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c1eee2cd48977efd8513cb3a485526d7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 9705,
            "upload_time": "2025-01-24T21:44:16",
            "upload_time_iso_8601": "2025-01-24T21:44:16.033660Z",
            "url": "https://files.pythonhosted.org/packages/70/5a/f161386bfd5f469b039b1ced5e94b2a5090b4e1f5fd4b343e1140c24ecd2/metapyrodigal-1.4.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-24 21:44:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cody-mar10",
    "github_project": "metapyrodigal",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "metapyrodigal"
}
        