| Name | prfect |
| Version | 0.41 |
| download | |
| home_page | https://github.com/deprekate/prfect |
| Summary | A tool to predict programmed ribosomal frameshifts |
| upload_time | 2024-03-14 01:21:30 |
| maintainer | |
| docs_url | None |
| author | Katelyn McNair |
| requires_python | >3.5.2 |
| license | |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# prfect
PRFect is a tool to predict programmed ribosomal frameshifting in eukaryotic, prokaryotic, and viral genomes.
The published manuscript is available at:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-024-05701-0
<br>
PRFect takes as input a genome and its annotated coding sequences (CDS) in a GenBank file.<br>
* *If you only have a FASTA file, we recommend our new gene caller [Genotate](https://github.com/deprekate/genotate), which is the only gene caller that can call gene fragments.*
PRFect searches through a GenBank file looking for eight slippery site motifs associated with
backward (-1) frameshifts and two motifs associated with forward (+1) frameshifts. When
a motif is encountered, various cellular properties and factors are assessed and
a prediction is made as to whether the site is involved in programmed ribosomal frameshifting.
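
As a rough illustration of what scanning for a slippery site motif can look like, the sketch below finds one hypothetical pattern: three identical bases, followed by three identical bases, followed by any base. The pattern and the function name `find_threethree` are assumptions made for illustration, loosely inferred from the `threethree` sites reported in the examples of this README (`tttaaac`, `gggaaag`); PRFect's own motif definitions may differ.

```python
import re

# Hypothetical "three-three" heptamer: XXXYYYZ (an assumption, not PRFect's definition).
# The lookahead lets overlapping candidate sites be reported as well.
THREETHREE = re.compile(r"(?=(([acgt])\2\2([acgt])\3\3[acgt]))")


def find_threethree(seq):
    """Return (0-based offset, heptamer) for every XXXYYYZ match in seq."""
    seq = seq.lower()
    return [(m.start(), m.group(1)) for m in THREETHREE.finditer(seq)]


print(find_threethree("ccTTTAAACgg"))
```

A real predictor would go on to score each candidate site, as described above; this sketch only locates candidates.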
<br>
To install:
```
pip install prfect
```
To run:
```
prfect.py input.gbk
```
An example SARS-CoV-2 genome is provided in the test folder. It contains 12 genes, the first of which is a known PRF gene and is annotated as such with the `join` keyword. Any gene already annotated with the `join` keyword is split into its two parts, predicted anew, and then tagged with /label=1 to indicate a TruePositive. When this genome is run through PRFect, the known PRF gene is correctly predicted to use programmed ribosomal frameshifting.
```
$ prfect.py test/covid19.gbk
CDS join(266..13468,13468..21555)
/ribosomal_slippage
/direction=-1
/motif=is_threethree
/slippery_sequence=tttaaac
/label=1
/locus=NC_045512
/product="ORF1ab polyprotein"
/product="ORF1ab polyprotein"
```
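
The relationship between a two-part `join` location and the reported `/direction` can be checked with a little arithmetic. The following is a minimal sketch, assuming 1-based inclusive coordinates as in GenBank; the helper names `parse_join` and `shift_direction` are hypothetical and are not part of PRFect.

```python
import re


def parse_join(location):
    """Parse 'join(a..b,c..d)' into [(a, b), (c, d)] using 1-based inclusive coordinates."""
    match = re.fullmatch(r"join\((.+)\)", location.strip())
    if match is None:
        raise ValueError(f"not a join location: {location!r}")
    parts = []
    for part in match.group(1).split(","):
        start, end = part.split("..")
        parts.append((int(start), int(end)))
    return parts


def shift_direction(location):
    """Offset of the second part relative to simply continuing after the first part."""
    (_, first_end), (second_start, _) = parse_join(location)
    return second_start - first_end - 1


print(shift_direction("join(266..13468,13468..21555)"))
```

In the example above the second part starts on the last base of the first part (13468), so the reading frame steps back by one base, which agrees with `/direction=-1`.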
Another example is bacteriophage lambda, whose *geneG*/*geneGT* tail assembly chaperone gene is known to frameshift. The current GenBank annotation file (NC_001416) does not denote the gene with the `join` keyword, so the two pieces appear as separate CDS features. When the genome is run through PRFect, the gene is correctly identified as a single PRF gene and tagged with /label=0 to indicate an UnknownPositive.
```
$ prfect.py test/lambda.gbk
CDS join(9711..10115,10115..10549)
/ribosomal_slippage
/direction=-1
/motif=is_threethree
/bases=gggaaag
/label=0
/locus=NC_001416
/product="minor tail protein G"
/product="tail assembly protein T"
```
You can list every slippery site that PRFect checked, to confirm that it evaluated a given site and to see whether there were any near hits.
The `--dump` flag shows the calculated cellular properties at each potential slippery site:
```
$ prfect.py test/lambda.gbk --dump | head
LOCUS SLIPSITE LOC LABEL N DIR RBS1 RBS2 A0 A1 LF50 HK50 LF100 HK100 PRED PROB MOTIF
NC_001416 gcaaaacgc 4278 0 159 1 13 1.8 0.015 0.025 -0.24 -0.236 -0.523 -0.306 0 1.0 three
NC_001416 ggaaagtgt 10115 0 18 -1 2 0 0.004 0.024 -0.313 -0.287 -0.668 -0.404 -1 0.88 threethree
NC_001416 gcgaaagca 31034 0 30 1 2 1.0 0.029 0.032 -0.282 -0.243 -0.477 -0.326 0 1.0 three
NC_001416 tggaaacgc 33370 0 72 1 1 0 0.015 0.028 -0.124 -0.118 -0.482 -0.36 0 1.0 three
NC_001416 cgtaaatta 33388 0 90 1 0 0 0.009 0.012 -0.15 -0.138 -0.291 -0.237 0 1.0 three
NC_001416 gcagggtgg 33442 0 144 1 0 0 0.017 0.021 -0.092 -0.039 -0.388 -0.274 0 1.0 three
NC_001416 gaaaaggag 42081 0 42 -1 0 0 0.027 0.013 -0.246 -0.149 -0.176 -0.105 0 1.0 twofour
NC_001416 aaaaccttc 42206 0 66 -1 0 0 0.015 0.014 -0.403 -0.266 -0.367 -0.249 0 1.0 fivetwo
NC_001416 cgaaaaaat 43240 0 6 1 2 0 0.019 0.023 -0.513 -0.245 -0.395 -0.294 0 0.98 four
```
The columns are:
```
LOCUS id of the sequence
SLIPSITE bases of the slippery site
LOC location within the bases of the slippery site
LABEL whether the slippery site is already annotated: 0 not a joined gene, 1 a joined gene, -1 a joined gene but is >10bp away
N distance of the slippery site from the in-frame stop codon
DIR direction of the shift
RBS1 Prodigal like ribosomal binding site interference score
RBS2 RAST like ribosomal binding site interference score
A0 frequency of the A-site codon usage in all genes
A1 frequency of the +1 A-site codon usage in all genes
LF50 normalized LinearFold minimum free energy calculation of the downstream 50bp window
HK50 normalized HotKnots minimum free energy calculation of the downstream 50bp window
LF100 normalized LinearFold minimum free energy calculation of the downstream 100bp window
HK100 normalized HotKnots minimum free energy calculation of the downstream 100bp window
PRED type of shift predicted by PRFect to occur: -1 backwards, 0 no shift, +1 forwards
PROB how sure PRFect was for the predicted (PRED) type
MOTIF slippery sequence motif
```
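
If you want to post-process the dump, a small parser is enough. This is a minimal sketch, assuming the output is whitespace-separated with a single header row, as it appears in the example; the function names `parse_dump` and `predicted_shifts` are hypothetical, and the sample rows are copied from the lambda dump shown earlier.

```python
SAMPLE_DUMP = """\
LOCUS SLIPSITE LOC LABEL N DIR RBS1 RBS2 A0 A1 LF50 HK50 LF100 HK100 PRED PROB MOTIF
NC_001416 gcaaaacgc 4278 0 159 1 13 1.8 0.015 0.025 -0.24 -0.236 -0.523 -0.306 0 1.0 three
NC_001416 ggaaagtgt 10115 0 18 -1 2 0 0.004 0.024 -0.313 -0.287 -0.668 -0.404 -1 0.88 threethree
"""


def parse_dump(text):
    """Turn dump text into a list of dicts keyed by the header names (values stay strings)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def predicted_shifts(rows):
    """Keep only the sites where PRED is not 0, i.e. a frameshift was predicted."""
    return [row for row in rows if int(row["PRED"]) != 0]


for row in predicted_shifts(parse_dump(SAMPLE_DUMP)):
    print(row["LOCUS"], row["LOC"], row["PRED"], row["PROB"])
```

In practice the text would come from running `prfect.py` with `--dump` and capturing its standard output rather than from an embedded string.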
You can also use the `-s` flag to scale the minimum free energy (MFE) calculations to account for extreme GC content, temperature, or salinity:
```
$ prfect.py test/lambda.gbk -s 1.5 --dump | head -n 3
LOCUS SLIPSITE LOC LABEL N DIR RBS1 RBS2 A0 A1 LF50 HK50 LF100 HK100 PRED PROB MOTIF
NC_001416 gcaaaacgc 4278 0 159 1 13 1.8 0.015 0.025 -0.36 -0.354 -0.785 -0.459 0 1.0 three
NC_001416 ggaaagtgt 10115 0 18 -1 2 0 0.004 0.024 -0.47 -0.431 -1.002 -0.606 -1 0.999 threethree
```
Note that the MFE values are multiplied by 1.5 (a 50% increase in magnitude) compared with the dump above, which also makes the trained model more confident in the backward (-1) prediction (PRED) at location (LOC) 10115: PROB rises from 0.88 to 0.999.
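
The effect of `-s 1.5` can be verified by hand from the two dumps. The sketch below multiplies the unscaled MFE columns for the site at location 10115 by the scale factor and compares them with the scaled values, allowing for the rounding in the printed output. The helper name `matches_scale` and the tolerance are assumptions for illustration only.

```python
SCALE = 1.5

# MFE columns for the site at LOC 10115, copied from the two dumps above.
unscaled = {"LF50": -0.313, "HK50": -0.287, "LF100": -0.668, "HK100": -0.404}
scaled = {"LF50": -0.47, "HK50": -0.431, "LF100": -1.002, "HK100": -0.606}


def matches_scale(before, after, factor, tol=1e-3):
    """True if every value in `after` equals the `before` value times `factor`, within `tol`."""
    return all(abs(before[key] * factor - after[key]) <= tol for key in before)


print(matches_scale(unscaled, scaled, SCALE))
```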
Raw data
{
"_id": null,
"home_page": "https://github.com/deprekate/prfect",
"name": "prfect",
"maintainer": "",
"docs_url": null,
"requires_python": ">3.5.2",
"maintainer_email": "",
"keywords": "",
"author": "Katelyn McNair",
"author_email": "deprekate@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a3/eb/066c16f97906eb440f422e9b5feedcda7a8ffe8e8a17ed6968b844a113aa/prfect-0.41.tar.gz",
"platform": null,
"description": "# prfect\nPRFect is a tool to predict programmed ribosomal frameshifting in eukaryotic, prokaryotic, and viral genomes\n\n\nThe published manuscript is available at:\nhttps://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-024-05701-0\n\n\n<br>\n\nPRFect takes as input the genome and its annotated CoDing Sequences (CDS) as a GenBank file.<br>\n * *If you only have a fasta file we recommend our brand new gene caller [Genotate](https://github.com/deprekate/genotate) that is* <br>\n *the only gene caller that can call gene fragments*\n\nPRFect searches through a GenBank file looking for 8 different slippery site motifs associated with\nbackwards (-1) frameshifts and two motifs associated with forward (+1) frameshifts. When\na motif is encountered, various cellular properties and factors are assessed and\na prediction is made whether the site is involved in programmed ribosomal frameshifting.\n\n<br>\n\nTo install:\n```\npip install prfect\n```\nTo run:\n```\nprfect.py input.gbk\n```\nAn example genome for SARS-Cov2 is provided in the test folder. The SARS-Cov2 genome contains 12 genes the first of which happens to be a PRF gene and is denoted as such through the use of the `join` keyword. Any genes already present that use the `join` keyword are split into their two parts and subsequently predicted anew and then tagged with the /label=1 feature tag to indicate a TruePositive. When the genome is run through PRFect the known PRF gene is correctly predicted to utilize programmed ribosomal frameshifting.\n\n```\n$ prfect.py test/covid19.gbk \n\n CDS join(266..13468,13468..21555)\n /ribosomal_slippage\n /direction=-1\n /motif=is_threethree\n /slippery_sequence=tttaaac\n /label=1\n /locus=NC_045512\n /product=\"ORF1ab polyprotein\"\n /product=\"ORF1ab polyprotein\"\n\n```\n\nAnother example is bacteriophage lambda, which has the *geneG* and *geneGT* tail assembly chaperone gene that is known to frameshift. 
The current genbank annotation file (NC_001416) does not have the gene properly denoted with the `join` keyword and so both pieces are in two separate CDS features. When the genome is run through PRFect the gene is correctly identified as being a single PRF gene with the /label=0 to indicate that it is an UnknownPositive.\n\n```\n$ prfect.py test/lambda.gbk\n\n CDS join(9711..10115,10115..10549)\n /ribosomal_slippage\n /direction=-1\n /motif=is_threethree\n /bases=gggaaag\n /label=0\n /locus=NC_001416\n /product=\"minor tail protein G\"\n /product=\"tail assembly protein T\"\n```\n\n\nYou can show all the slippery sites that PRFect checked to make sure it evaluated a given site and to see if there were any near hits.\nUsing the `--dump` flag will show the calculated cellular properites at each potential slippery site:\n```\n$ prfect.py test/lambda.gbk --dump | head\nLOCUS SLIPSITE LOC LABEL N DIR RBS1 RBS2 A0 A1 LF50 HK50 LF100 HK100 PRED PROB MOTIF\nNC_001416 gcaaaacgc 4278 0 159 1 13 1.8 0.015 0.025 -0.24 -0.236 -0.523 -0.306 0 1.0 three\nNC_001416 ggaaagtgt 10115 0 18 -1 2 0 0.004 0.024 -0.313 -0.287 -0.668 -0.404 -1 0.88 threethree \nNC_001416 gcgaaagca 31034 0 30 1 2 1.0 0.029 0.032 -0.282 -0.243 -0.477 -0.326 0 1.0 three\nNC_001416 tggaaacgc 33370 0 72 1 1 0 0.015 0.028 -0.124 -0.118 -0.482 -0.36 0 1.0 three\nNC_001416 cgtaaatta 33388 0 90 1 0 0 0.009 0.012 -0.15 -0.138 -0.291 -0.237 0 1.0 three\nNC_001416 gcagggtgg 33442 0 144 1 0 0 0.017 0.021 -0.092 -0.039 -0.388 -0.274 0 1.0 three\nNC_001416 gaaaaggag 42081 0 42 -1 0 0 0.027 0.013 -0.246 -0.149 -0.176 -0.105 0 1.0 twofour\nNC_001416 aaaaccttc 42206 0 66 -1 0 0 0.015 0.014 -0.403 -0.266 -0.367 -0.249 0 1.0 fivetwo\nNC_001416 cgaaaaaat 43240 0 6 1 2 0 0.019 0.023 -0.513 -0.245 -0.395 -0.294 0 0.98 four\n```\n\n\n\n\nThe columns are:\n```\nLOCUS id of the sequence\nSLIPSITE bases of the slippery site\nLOC location within the bases of the slippery site\nLABEL whether the slippery site is already annotated: 0 
not a joined gene, 1 a joined gene, -1 a joined gene but is >10bp away \nN distance of the slippery site from the in-frame stop codon\nDIR direction of the shift\nRBS1 Prodigal like ribosomal binding site interference score\nRBS2 RAST like ribosomal binding site interference score\nA0 frequency of the A-site codon usage in all genes\nA1 frequency of the +1 A-site codon usage in all genes\nLF50 normalized LinearFold minimum free energy calculation of the downstream 50bp window\nLF100 normalized LinearFold minimum free energy calculation of the downstream 100bp window\nHK50 normalized HotKnots minimum free energy calculation of the downstream 50bp window\nHK100 normalized HotKnots minimum free energy calculation of the downstream 100bp window\nPRED type of shift predicted by PRFect to occur: -1 backwards, 0 no shift, +1 forwards\nPROB how sure PRFect was for the predicted (PRED) type\nMOTIF slippery sequence motif\n```\n\n\nYou can even use the flag `-s` to scale the MFE calculations to account for extreme GCcontent/temp/salinity:\n```\n$ prfect.py test/lambda.gbk -s 1.5 --dump | head -n 2\nLOCUS SLIPSITE LOC LABEL N DIR RBS1 RBS2 A0 A1 LF50 HK50 LF100 HK100 PRED PROB MOTIF\nNC_001416 gcaaaacgc 4278 0 159 1 13 1.8 0.015 0.025 -0.36 -0.354 -0.785 -0.459 0 1.0 three\nNC_001416 ggaaagtgt 10115 0 18 -1 2 0 0.004 0.024 -0.47 -0.431 -1.002 -0.606 -1 0.999 threethree \n```\nyou will notice that the MFE values were scaled by 50% when compared to the above dump, which also caused the trained model to be more confident in the backward -1 PREDiction at LOCation 10115\n\n\n\n\n",
"bugtrack_url": null,
"license": "",
"summary": "A tool to predict programmed ribosomal frameshifts",
"version": "0.41",
"project_urls": {
"Homepage": "https://github.com/deprekate/prfect"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b21d067d4b28afdcb5ef13ef62272654bdd04801c9bb712bf0f84f50e0e9fbf2",
"md5": "e85f81c3ba8b502be834c865da0fc265",
"sha256": "dde67293c5a9b0c5dda4887761f3aab11d85b1bbe8551726d54095353fb72d35"
},
"downloads": -1,
"filename": "prfect-0.41-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e85f81c3ba8b502be834c865da0fc265",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">3.5.2",
"size": 910902,
"upload_time": "2024-03-14T01:21:28",
"upload_time_iso_8601": "2024-03-14T01:21:28.130925Z",
"url": "https://files.pythonhosted.org/packages/b2/1d/067d4b28afdcb5ef13ef62272654bdd04801c9bb712bf0f84f50e0e9fbf2/prfect-0.41-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a3eb066c16f97906eb440f422e9b5feedcda7a8ffe8e8a17ed6968b844a113aa",
"md5": "f62658ed2be0b3e64315c7fc18653298",
"sha256": "6a9d6c10759b5e9a5e77c205049d44c4e2aed8a11e23ebc5640955797e8a620e"
},
"downloads": -1,
"filename": "prfect-0.41.tar.gz",
"has_sig": false,
"md5_digest": "f62658ed2be0b3e64315c7fc18653298",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">3.5.2",
"size": 1012360,
"upload_time": "2024-03-14T01:21:30",
"upload_time_iso_8601": "2024-03-14T01:21:30.835876Z",
"url": "https://files.pythonhosted.org/packages/a3/eb/066c16f97906eb440f422e9b5feedcda7a8ffe8e8a17ed6968b844a113aa/prfect-0.41.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-14 01:21:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "deprekate",
"github_project": "prfect",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "prfect"
}