genotate


* Name: genotate
* Version: 0.15
* Home page: https://github.com/deprekate/genotate
* Summary: A tool to annotate microbial genomes
* Upload time: 2024-02-01 00:46:37
* Author: Katelyn McNair
* Requires Python: >3.5.2
* Requirements: none recorded

Introduction
------------

Genotate is a tool to annotate prokaryotic<sup>*</sup> and phage genomes. It uses sliding amino-acid
windows in all six frames to distinguish windows that belong to protein-coding gene regions
from those that belong to noncoding regions, in order to determine the coding frame
at every position along the genome.
\*(The bacteria/archaea model is still being trained.)
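
To make "windows in all six frames" concrete, here is a minimal Python sketch, assuming the standard genetic code and an arbitrary window length. It only generates the amino-acid windows; how Genotate encodes and classifies them is not described here, and the helper names (`translate`, `reverse_complement`, `six_frame_windows`) are hypothetical.

```python
# Standard genetic code (translation tables 1 and 11 share these amino-acid
# assignments), indexed by codon in TCAG order: first*16 + second*4 + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3].upper(), "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_windows(seq: str, window: int = 30):
    """Yield (frame, aa_start, peptide) for every sliding amino-acid window.

    Frames +1..+3 read the given strand from offsets 0..2; frames -1..-3
    read the reverse complement. The default window length is an arbitrary
    placeholder, not the value Genotate uses.
    """
    strands = ((1, seq), (-1, reverse_complement(seq)))
    for sign, strand in strands:
        for offset in range(3):
            protein = translate(strand[offset:])
            for aa_start in range(len(protein) - window + 1):
                yield sign * (offset + 1), aa_start, protein[aa_start:aa_start + window]
```

Each window would then be scored as coding or noncoding by Genotate's trained model, which this sketch does not attempt to reproduce.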

To install `Genotate`:
```sh
pip install genotate
```

To run `Genotate`, you only need to specify a FASTA-formatted genome file.
The command to run the phage models on the provided phiX174 genome is:
```sh
genotate.py test/phiX174.fasta -o predictions.gb
```
To use the partially trained bacterial/archaeal models instead, add the `--bacteria` flag. Instead of
a FASTA file, you can also provide a GenBank-formatted file (here gzip-compressed), and Genotate will use only its genomic sequence.
```sh
genotate.py test/mycoplasma.gbff.gz -o predictions.gb --bacteria
```
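
Using "only the genomic sequence" means any existing annotation in the GenBank file is ignored. A minimal sketch of that idea with the Python standard library follows; `read_genbank_sequences` is a hypothetical helper, not part of Genotate, and it assumes well-formed records that each contain an `ORIGIN` section.

```python
import gzip
from pathlib import Path


def read_genbank_sequences(path):
    """Return one uppercase sequence string per record in a GenBank file.

    Only the lines between 'ORIGIN' and the '//' record terminator are read;
    all annotation (FEATURES, qualifiers, etc.) is ignored. Paths ending in
    '.gz' are decompressed on the fly.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    records, chunks, in_origin = [], [], False
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("ORIGIN"):
                in_origin, chunks = True, []
            elif line.startswith("//"):
                if in_origin:
                    records.append("".join(chunks).upper())
                in_origin = False
            elif in_origin:
                # Sequence lines look like "        1 gagttttatc gcttccatga ...":
                # keep the letters, drop the position number and spaces.
                chunks.append("".join(ch for ch in line if ch.isalpha()))
    return records
```

Returning one string per record keeps chromosomes and plasmids in a multi-record `.gbff` file separate rather than concatenating them.
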
**It is recommended to run Genotate on a GPU, since it can take a long time on prokaryotic
genomes.** Genotate automatically tries to use a GPU; if none is found, it falls back to the CPU.

---

The output of `Genotate` is a set of 'coding region' predictions in GenBank format. They *should*
match the true coding gene regions, but they are not genes per se, since they are not based
on start and stop codons. However, they are all trimmed to a stop codon after Genotate
determines which translation table the genome uses (i.e., whether it performs stop codon readthrough).

There are three main phases to the Genotate workflow:
1. window classification 
2. change-point detection
3. refinement
   * analyze stop codons
   * merge adjacent regions
   * split regions on internal stop codons
   * adjust ends to a stop codon
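
Phases 1 and 2 can be pictured as turning per-position frame calls into contiguous regions. The sketch below is a deliberately simple stand-in, not Genotate's change-point method: it smooths hypothetical per-position labels (+1..+3 and -1..-3 for the six frames, 0 for noncoding) with a sliding majority vote and then collapses runs of identical labels into regions. The function names and label encoding are assumptions.

```python
from collections import Counter
from itertools import groupby

NONCODING = 0  # hypothetical label; frames are +1..+3 and -1..-3


def smooth_labels(labels, radius=2):
    """Replace each per-position label with the majority label within +/- radius."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius):i + radius + 1]
        # Ties go to the label seen first in the window (most_common keeps
        # first-encountered order for equal counts).
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


def segment_regions(labels, noncoding=NONCODING):
    """Collapse runs of identical labels into (start, end, frame) tuples, end exclusive."""
    regions, pos = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label != noncoding:
            regions.append((pos, pos + length, label))
        pos += length
    return regions
```

A real change-point detector would model the noise in the classifier output explicitly; majority voting is used here only because it is short and easy to follow.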

Genotate determines the translation table by analyzing the initial coding gene region
predictions. A stop codon that is read through shows up in one of two ways: either it
appears in the middle of a coding gene region, or the region is broken into two pieces at
that stop codon. If one of the three canonical stop codons (TAA, TAG, TGA) is significantly
overrepresented both inside AND between predicted gene regions, that stop codon is assumed to be read through.
With the stop codon usage now known, adjacent same-frame coding regions are merged if there is
no stop codon between them. The regions are then split on any internal stop codons, and their
ends are adjusted to the nearest stop codon.
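
The readthrough test can be sketched as counting each stop codon inside predicted regions and in the short gaps between same-frame neighbours. This is a hypothetical illustration, not Genotate's code: it handles the forward strand only, assumes regions do not yet include their terminal stop codon, pools the two counts, and replaces the significance test with placeholder thresholds (`max_gap`, `min_count`, `min_share`).

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq, start, end):
    """Count stop codons read in frame from `start` up to `end` (end exclusive)."""
    counts = Counter()
    for i in range(start, end - 2, 3):
        codon = seq[i:i + 3].upper()
        if codon in STOP_CODONS:
            counts[codon] += 1
    return counts


def detect_readthrough(seq, regions, max_gap=30, min_count=3, min_share=0.8):
    """Return the stop codon that looks read through, or None.

    regions: forward-strand (start, end) pairs, 0-based, end exclusive, with
    (end - start) divisible by 3. Stops are counted inside each region and in
    short gaps (<= max_gap nt) between consecutive regions in the same frame.
    The thresholds are hypothetical placeholders, not a real significance test.
    """
    counts = Counter()
    last_end = {}  # frame (start % 3) -> end of the previous region in that frame
    for start, end in sorted(regions):
        frame = start % 3
        counts.update(in_frame_stops(seq, start, end))
        prev_end = last_end.get(frame)
        if prev_end is not None and 0 <= start - prev_end <= max_gap:
            counts.update(in_frame_stops(seq, prev_end, start))
        last_end[frame] = end
    total = sum(counts.values())
    if total == 0:
        return None
    codon, n = counts.most_common(1)[0]
    return codon if n >= min_count and n / total >= min_share else None
```

For a Mycoplasma-like genome, where TGA is decoded as tryptophan (translation table 4), TGA would be expected to dominate these counts.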

**Note:** the opposite (start) end is not adjusted to a valid start codon, since Genotate does not yet have a translation
initiation site detection method, so the beginning of a gene call may be off by a few codons.
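
The merge, split and end-adjustment steps can be sketched for forward-strand regions as follows. This is an illustration under stated assumptions, not Genotate's implementation: coordinates are 0-based and end-exclusive, "nearest stop" is read as the next in-frame stop downstream, and, in line with the note above, the 5' start is left untouched. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_stop(seq, start, end, stops):
    """True if an in-frame codon in [start, end) is one of `stops`."""
    return any(seq[i:i + 3].upper() in stops for i in range(start, end - 2, 3))


def merge_adjacent(seq, regions, stops):
    """Merge consecutive same-frame regions that have no stop codon between them."""
    merged = []
    for start, end in sorted(regions):
        if merged:
            prev_start, prev_end = merged[-1]
            same_frame = (start - prev_start) % 3 == 0
            if same_frame and start >= prev_end and not has_stop(seq, prev_end, start, stops):
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged


def split_and_trim(seq, start, end, stops):
    """Split one region on internal stops and extend its 3' end to a stop codon.

    Each returned piece ends just after a stop codon (the stop is included),
    except when the sequence runs out first. The 5' start is never moved.
    """
    pieces, piece_start = [], start
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3].upper() in stops:
            pieces.append((piece_start, i + 3))
            if i + 3 >= end:  # reached or passed the predicted end: done
                return pieces
            piece_start = i + 3
    if piece_start < end:  # no stop before the sequence ended
        pieces.append((piece_start, end))
    return pieces


def refine(seq, regions, readthrough=None):
    """Merge, split and trim forward-strand regions given an optional readthrough codon."""
    stops = STOP_CODONS - {readthrough}
    refined = []
    for start, end in merge_adjacent(seq, regions, stops):
        refined.extend(split_and_trim(seq, start, end, stops))
    return refined
```

With `readthrough="TGA"`, two regions separated only by an in-frame TGA are merged into one and extended to the next TAA or TAG; without it, they stay separate and each ends at its own stop.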


Currently the best way to visualize the predictions is in a genome viewer application such
as Artemis from the Sanger Institute. Loading the example phiX174.gb GenBank file into Artemis shows the
gene layout:

![phiX174 gene layout in Artemis](https://github.com/deprekate/genotate/blob/main/src/genes.png)

The predictions.gb file can then be loaded using the 'File>Read An Entry' menu, and the
predictions will be overlaid as grey 'coding regions' in the gene layout window:

![Genotate predictions overlaid in Artemis](https://github.com/deprekate/genotate/blob/main/src/predictions.png)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/deprekate/genotate",
    "name": "genotate",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">3.5.2",
    "maintainer_email": "",
    "keywords": "",
    "author": "Katelyn McNair",
    "author_email": "deprekate@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/61/85/ebd72c3048869316fcd2a43485c7c5eec5d8c3a8c610435818fa800bb9e3/genotate-0.15.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A tool to annotate microbial genomes",
    "version": "0.15",
    "project_urls": {
        "Homepage": "https://github.com/deprekate/genotate"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6460ee5666644ec059d7c13bcb28772ddb2af39be5e5a3d43ca8cbf00719ad39",
                "md5": "3703722e36a927474a70800c3643cd7c",
                "sha256": "5e8c454944af66c5fc439de56537fae52b838e98e4a3edd67fc294471c3e58f1"
            },
            "downloads": -1,
            "filename": "genotate-0.15-cp39-cp39-macosx_12_0_x86_64.whl",
            "has_sig": false,
            "md5_digest": "3703722e36a927474a70800c3643cd7c",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">3.5.2",
            "size": 57542436,
            "upload_time": "2024-02-01T00:46:26",
            "upload_time_iso_8601": "2024-02-01T00:46:26.684407Z",
            "url": "https://files.pythonhosted.org/packages/64/60/ee5666644ec059d7c13bcb28772ddb2af39be5e5a3d43ca8cbf00719ad39/genotate-0.15-cp39-cp39-macosx_12_0_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6185ebd72c3048869316fcd2a43485c7c5eec5d8c3a8c610435818fa800bb9e3",
                "md5": "f0942059d91a0866420530f051a7ea7a",
                "sha256": "cbb90cd85d90bbb4ebf260361b99e12fa65534b6cf90779db4c1563628bf7eb7"
            },
            "downloads": -1,
            "filename": "genotate-0.15.tar.gz",
            "has_sig": false,
            "md5_digest": "f0942059d91a0866420530f051a7ea7a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">3.5.2",
            "size": 57543061,
            "upload_time": "2024-02-01T00:46:37",
            "upload_time_iso_8601": "2024-02-01T00:46:37.399933Z",
            "url": "https://files.pythonhosted.org/packages/61/85/ebd72c3048869316fcd2a43485c7c5eec5d8c3a8c610435818fa800bb9e3/genotate-0.15.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-01 00:46:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "deprekate",
    "github_project": "genotate",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "genotate"
}
        