Introduction
------------
Genotate is a tool to annotate prokaryotic<sup>*</sup> and phage genomes. It uses sliding amino-acid
windows in all six frames to distinguish windows that belong to protein-coding gene regions
from those that belong to noncoding regions, in order to determine the coding frame at every
position along the genome.

\*(the bacteria/archaea model is still being trained)
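
As a rough illustration of the window idea, the sketch below translates a genome in all six frames and slides a fixed-size amino-acid window along each translation. It is not Genotate's code. The frame labels (1 to 3 forward, -1 to -3 reverse), the default window size of 39 residues, and the use of the standard genetic code (NCBI table 11 has the same amino-acid assignments) are assumptions. The trained classifier that would score each window is not shown.

```python
from itertools import product

# Standard genetic code in TCAG order (same amino acids as NCBI table 11).
BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(CODONS, AMINO_ACIDS))


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(genome):
    """Map hypothetical frame labels 1, 2, 3, -1, -2, -3 to protein strings."""
    genome = genome.upper()
    rc = revcomp(genome)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(genome[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def windows(protein, size=39, step=1):
    """Yield (start, window) pairs of a sliding amino-acid window (size is hypothetical)."""
    for start in range(0, len(protein) - size + 1, step):
        yield start, protein[start:start + size]


print(six_frames("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")[1])  # MAIVMGR*KGAR*
```

Each window from each frame would then be classified as coding or noncoding; stop codons appear as `*` inside the windows, which is part of the signal a model can learn from.
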

To install `Genotate`:
```sh
pip install genotate
```

To run `Genotate`, you only need to specify the FASTA-formatted genome file.
The command to run the phage models on the provided phiX174 genome is:
```sh
genotate.py test/phiX174.fasta -o predictions.gb
```
To run the partially trained bacterial/archaeal models, add the `--bacteria` flag. Instead of
a FASTA-formatted file, you can also provide a GenBank-formatted file, and Genotate will use only the genomic sequence.
```sh
genotate.py test/mycoplasma.gbff.gz -o predictions.gb --bacteria
```
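
Genotate parses its input itself; purely to illustrate what "use only the genomic sequence" means, here is a standard-library sketch that pulls the nucleotide sequence out of a FASTA or GenBank file, gzip-compressed or not. The function names are hypothetical, and the GenBank branch assumes the usual flat-file layout in which the sequence sits between the `ORIGIN` line and the `//` terminator, with position numbers and spaces that must be stripped.

```python
import gzip


def open_text(path):
    """Open plain or gzip-compressed (*.gz) files in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_sequences(path):
    """Return one uppercase nucleotide string per record (FASTA or GenBank)."""
    with open_text(path) as handle:
        lines = handle.read().splitlines()
    is_fasta = bool(lines) and lines[0].startswith(">")
    records, current, in_origin = [], None, False
    for line in lines:
        if is_fasta:
            if line.startswith(">"):          # a header starts a new record
                current = []
                records.append(current)
            elif current is not None:
                current.append(line.strip())
        elif line.startswith("ORIGIN"):       # a GenBank sequence block begins
            current, in_origin = [], True
            records.append(current)
        elif line.startswith("//"):           # end of a GenBank record
            in_origin = False
        elif in_origin:
            # drop the leading position numbers and the spaces between blocks
            current.append("".join(ch for ch in line if ch.isalpha()))
    return ["".join(parts).upper() for parts in records]
```

For example, `read_sequences("test/mycoplasma.gbff.gz")` would return the sequence of every record in the file and ignore the annotations, which mirrors the behaviour described above.
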
**Using a GPU to run Genotate is recommended, since it can take a long time to run on prokaryotic
genomes.** Genotate will automatically try to run on a GPU; if one isn't found, it will run on a CPU.

---

The output of `Genotate` is a set of 'coding region' predictions in GenBank format. They *should*
match the true coding gene regions, but they are not genes per se, since they are not based
on start and stop codons. They have, however, all been trimmed to a stop codon after Genotate
determines which translation table the genome uses (i.e. whether it performs stop codon readthrough).

There are three main phases to the Genotate workflow:
1. window classification
2. change-point detection
3. refinement
   * analyze stop codons
   * merge adjacent regions
   * split regions on stop codons
   * adjust ends to a stop codon

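
The README does not say which change-point method Genotate uses. As a generic stand-in, the sketch below segments a sequence of per-window labels (assumed here to be 0 for noncoding and ±1 to ±3 for a coding frame) by minimizing the number of disagreements with the observed labels plus a fixed penalty per change point, solved with a Viterbi-style dynamic programme. The function name, the label encoding and the penalty value are illustrative assumptions.

```python
def segment(labels, states, penalty=3.0):
    """Penalized change-point segmentation of a per-window label sequence.

    Chooses a piecewise-constant path over `states` that minimizes
    (# windows whose observed label differs from the path)
    + penalty * (# change points), then returns (start, end, state)
    runs with `end` exclusive.
    """
    if not labels:
        return []
    # cost[s]: best total cost of a path ending in state s at the current window
    cost = {s: (0.0 if labels[0] == s else 1.0) for s in states}
    back = []  # back[i][s]: previous state on the best path into s at window i + 1
    for observed in labels[1:]:
        best_prev = min(cost, key=cost.get)
        new_cost, pointers = {}, {}
        for s in states:
            stay, switch = cost[s], cost[best_prev] + penalty
            pointers[s] = s if stay <= switch else best_prev
            new_cost[s] = min(stay, switch) + (0.0 if observed == s else 1.0)
        cost = new_cost
        back.append(pointers)
    # trace the optimal path backwards from the cheapest final state
    state = min(cost, key=cost.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    # collapse the path into runs of identical states
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i, path[start]))
            start = i
    return runs


noisy = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0]
print(segment(noisy, states=[0, 1, 2, 3, -1, -2, -3], penalty=2.0))
# [(0, 3, 0), (3, 10, 1), (10, 13, 0)]
```

A larger penalty absorbs more isolated misclassified windows (like the single 0 at index 5 above) at the cost of possibly merging short genuine regions.
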

Genotate determines the translation table by analyzing the initial coding gene region
predictions. A stop codon that is read through leads to one of two outcomes: either the stop
codon appears in the middle of a predicted coding region, or the region is broken into two pieces at
the stop codon. If one of the three known stop codons is significantly overrepresented both in the
middle of predicted gene regions AND between them, that stop codon can be assumed to be read through.
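
A minimal sketch of that stop codon analysis, for forward-strand regions only. It counts in-frame stop codons inside the predicted regions and in the short gaps between consecutive same-frame regions, then flags any stop codon that dominates both counts. The function names, the `max_gap`, `min_share` and `min_count` thresholds, and the use of a simple majority-share rule instead of a formal significance test are all assumptions made for illustration.

```python
from collections import Counter

STOPS = ("TAA", "TAG", "TGA")


def stop_usage(genome, regions, max_gap=30):
    """Count in-frame stop codons inside regions and between same-frame neighbours.

    `regions` are forward-strand (start, end) pairs, 0-based and end-exclusive,
    whose lengths are multiples of 3.
    """
    inside, between = Counter(), Counter()
    for start, end in regions:
        for i in range(start, end - 3, 3):       # every codon except the last one
            if genome[i:i + 3] in STOPS:
                inside[genome[i:i + 3]] += 1
    # order by frame, then position, so same-frame neighbours are consecutive
    ordered = sorted(regions, key=lambda r: (r[0] % 3, r[0]))
    for (s1, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s1 % 3 == s2 % 3 and 0 <= s2 - e1 <= max_gap:
            for i in range(e1 - 3, s2, 3):       # last codon of the first region + the gap
                if genome[i:i + 3] in STOPS:
                    between[genome[i:i + 3]] += 1
    return inside, between


def readthrough_candidates(inside, between, min_share=0.6, min_count=3):
    """Return stop codons that dominate BOTH the inside and the between counts."""
    flagged = []
    for codon in STOPS:
        if all(
            sum(c.values()) > 0
            and c[codon] >= min_count
            and c[codon] / sum(c.values()) >= min_share
            for c in (inside, between)
        ):
            flagged.append(codon)
    return flagged


print(readthrough_candidates(Counter({"TGA": 8, "TAA": 1}), Counter({"TGA": 5, "TAG": 1})))
# ['TGA']
```
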

With the stop codon usage now known, adjacent same-frame coding regions are merged if no
stop codon lies between them. The regions are then split at any internal stop codons, and their
ends are adjusted to the nearest stop codon.
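
The three refinement steps can be sketched as follows, again for forward-strand regions only, with `stops` holding whichever stop codons remain after any read-through codon has been removed. Where the README says ends are adjusted to the "nearest" stop, this sketch simply extends each region downstream to the next in-frame stop codon; that choice, the function name and the region representation are assumptions, not Genotate's implementation.

```python
def refine(genome, regions, stops=("TAA", "TAG", "TGA")):
    """Merge, split and stop-adjust forward-strand (start, end) regions.

    Regions are 0-based, end-exclusive, with lengths that are multiples of 3.
    """
    def stops_in(start, end):
        # start positions of in-frame stop codons fully inside [start, end)
        return [i for i in range(start, end - 2, 3) if genome[i:i + 3] in stops]

    # 1. merge same-frame neighbours when no in-frame stop lies between them
    merged = []
    for start, end in sorted(regions, key=lambda r: (r[0] % 3, r[0])):
        if merged and merged[-1][0] % 3 == start % 3 and not stops_in(merged[-1][1], start):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # 2. split on internal stops; each piece keeps the stop codon that ends it
    pieces = []
    for start, end in merged:
        for stop in stops_in(start, end):
            if stop > start:
                pieces.append((start, stop + 3))
            start = stop + 3
        if start < end:
            pieces.append((start, end))

    # 3. extend pieces that do not already end in a stop to the next in-frame stop
    adjusted = []
    for start, end in pieces:
        while genome[end - 3:end] not in stops and end + 3 <= len(genome):
            end += 3
        adjusted.append((start, end))
    return adjusted


print(refine("ATGTGAAAACCCTAG", [(0, 15)]))                        # [(0, 6), (6, 15)]
print(refine("ATGTGAAAACCCTAG", [(0, 15)], stops=("TAA", "TAG")))  # [(0, 15)]
```

The second call shows the effect of treating TGA as read through: the internal TGA no longer splits the region.
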

\*\* The opposite (start) end is not adjusted to a valid start codon, since Genotate does not yet have a
translation initiation site detection method, so the beginning of a gene call may be off by a few codons.

Currently, the best way to visualize the predictions is in a genome viewer application, such
as Artemis from the Sanger Institute. The example phiX174.gb GenBank file loaded into Artemis shows the
gene layout:

The predictions.gb file can then be loaded using the 'File>Read An Entry' menu, and the
predictions will be overlaid as grey 'coding regions' in the gene layout window:

Raw data
{
"_id": null,
"home_page": "https://github.com/deprekate/genotate",
"name": "genotate",
"maintainer": "",
"docs_url": null,
"requires_python": ">3.5.2",
"maintainer_email": "",
"keywords": "",
"author": "Katelyn McNair",
"author_email": "deprekate@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/61/85/ebd72c3048869316fcd2a43485c7c5eec5d8c3a8c610435818fa800bb9e3/genotate-0.15.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "A tool to annotate microbial genomes",
"version": "0.15",
"project_urls": {
"Homepage": "https://github.com/deprekate/genotate"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6460ee5666644ec059d7c13bcb28772ddb2af39be5e5a3d43ca8cbf00719ad39",
"md5": "3703722e36a927474a70800c3643cd7c",
"sha256": "5e8c454944af66c5fc439de56537fae52b838e98e4a3edd67fc294471c3e58f1"
},
"downloads": -1,
"filename": "genotate-0.15-cp39-cp39-macosx_12_0_x86_64.whl",
"has_sig": false,
"md5_digest": "3703722e36a927474a70800c3643cd7c",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">3.5.2",
"size": 57542436,
"upload_time": "2024-02-01T00:46:26",
"upload_time_iso_8601": "2024-02-01T00:46:26.684407Z",
"url": "https://files.pythonhosted.org/packages/64/60/ee5666644ec059d7c13bcb28772ddb2af39be5e5a3d43ca8cbf00719ad39/genotate-0.15-cp39-cp39-macosx_12_0_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6185ebd72c3048869316fcd2a43485c7c5eec5d8c3a8c610435818fa800bb9e3",
"md5": "f0942059d91a0866420530f051a7ea7a",
"sha256": "cbb90cd85d90bbb4ebf260361b99e12fa65534b6cf90779db4c1563628bf7eb7"
},
"downloads": -1,
"filename": "genotate-0.15.tar.gz",
"has_sig": false,
"md5_digest": "f0942059d91a0866420530f051a7ea7a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">3.5.2",
"size": 57543061,
"upload_time": "2024-02-01T00:46:37",
"upload_time_iso_8601": "2024-02-01T00:46:37.399933Z",
"url": "https://files.pythonhosted.org/packages/61/85/ebd72c3048869316fcd2a43485c7c5eec5d8c3a8c610435818fa800bb9e3/genotate-0.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-01 00:46:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "deprekate",
"github_project": "genotate",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "genotate"
}