Name | StORF-Reporter |
Version | 1.4.2 |
home_page | https://github.com/NickJD/StORF-Reporter |
Summary | StORF-Reporter - a tool that takes an annotated genome and returns missing CDS genes (Stop-to-Stop) from unannotated regions. |
upload_time | 2024-10-23 18:27:13 |
maintainer | None |
docs_url | None |
author | Nicholas Dimonaco |
requires_python | >=3.6 |
license | None |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# StORF-Reporter has now been published in NAR: https://doi.org/10.1093/nar/gkad814
### StORF-Reporter, a toolkit that returns missed CDS genes from the Unannotated Regions (URs) of prokaryotic genomes.
# Please use `pip3 install StORF-Reporter` to install StORF-Reporter.
### This will also install the Python library numpy (>=1.22.0,<1.24.0), Pyrodigal (https://github.com/althonos/pyrodigal) and ORForise (https://github.com/NickJD/ORForise).
### Consider using '--no-cache-dir' with pip to ensure the download of the newest version of StORF-Reporter.
## Please Note: To report Con-StORFs (pseudogenes and genes that have alternative use of stop codons), use "-con_storfs True". To report only Con-StORFs and disable the reporting of standard StORFs, use "-con_only True".
### The directory "Test_Datasets" is provided to confirm functionality of StORF-Reporter.
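As background for the sections below, the Stop-to-Stop idea can be illustrated with a minimal Python sketch. This is not StORF-Reporter's implementation: the function name `find_stop_to_stop`, the forward-strand-only scan, and the way length is measured (from the base after the opening stop codon to the end of the closing stop codon) are assumptions made for illustration. Only the default stop codons and the 99 nt / 60 kb size limits are taken from the help menu.

```python
# Illustrative sketch only; not the StORF-Reporter implementation.
STOP_CODONS = {"TAG", "TGA", "TAA"}  # same default set as the -codons option


def find_stop_to_stop(seq, min_len=99, max_len=60000):
    """Return (start, end, frame) tuples for stop-to-stop regions (forward strand).

    start: 0-based index of the first base after the opening stop codon.
    end:   0-based exclusive index just past the closing stop codon.
    Assumption: length is measured as end - start.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        prev_stop = None  # index of the most recent in-frame stop codon
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if prev_stop is not None:
                    start, end = prev_stop + 3, i + 3
                    if min_len <= end - start <= max_len:
                        hits.append((start, end, frame))
                prev_stop = i
    return sorted(hits)


# A toy sequence: stop, ten GCA codons, stop.
toy = "TAA" + "GCA" * 10 + "TAG"
print(find_stop_to_stop(toy, min_len=30))  # [(3, 36, 0)]
```

With the default `min_len=99` the toy region (33 nt) is too short and nothing is returned.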
#############################################################
# StORF-Reporter:
<img src="./Visual_Abstract.jpg" width="75%" height="50%"/>
## Most common use cases -
### Supplement a current annotation from a tool such as Prokka or Bakta. A new GFF file will be created compatible with downstream pangenome analysis tools such as Roary and Panaroo.
#### For use on a single Prokka/Bakta output directory - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.
```console
StORF-Reporter -anno Prokka Out_Dir -p .../Test_Datasets/Prokka_E-coli/
```
#### For use on multiple Prokka/Bakta output directories - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.
```console
StORF-Reporter -anno Prokka Multiple_Out_Dirs -p ../Test_Datasets/Multi_Prokka_Outs
```
#### For use on a directory containing multiple Prokka/Bakta output GFFs - Only produces new GFF files.
```console
StORF-Reporter -anno Prokka Multiple_GFFs -p .../Test_Datasets/Prokka_Outputs/
```
#### For use on a GFF file from a CDS prediction tool such as Prodigal - Provide a GFF file and StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name).
```console
StORF-Reporter -anno Feature_Types Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/Myco.gff
```
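The "same name" requirement can be checked before a run with a short Python sketch. The helper `find_matching_fasta` is hypothetical and only mirrors the rule stated above (same file stem, extension `.fna`, `.fa` or `.fasta`); the order in which StORF-Reporter itself tries the extensions is not documented here and is an assumption.

```python
from pathlib import Path

# Extensions named in the README; the search order is an assumption.
FASTA_EXTENSIONS = (".fna", ".fa", ".fasta")


def find_matching_fasta(gff_path):
    """Return the FASTA file sharing the GFF's name, or None if none exists."""
    gff = Path(gff_path)
    for ext in FASTA_EXTENSIONS:
        candidate = gff.with_suffix(ext)  # e.g. Myco.gff -> Myco.fna
        if candidate.is_file():
            return candidate
    return None
```

A `None` result before launching StORF-Reporter suggests the GFF and FASTA file names do not match.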
#### For use on a directory containing multiple GFF files from a CDS prediction tool such as Prodigal - StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name).
```console
StORF-Reporter -anno Feature_Types Multiple_Genomes -p .../Test_Datasets/Matching_GFF_FASTA/
```
#### For use on a directory containing multiple GFF files with embedded FASTA.
```console
StORF-Reporter -anno Feature_Types Multiple_Combined_GFFs -p .../Test_Datasets/Combined_GFFs/
```
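A GFF with embedded FASTA follows the GFF3 convention of a `##FASTA` directive separating annotation lines from sequence records. The sketch below shows one way to split such a file in Python; `split_combined_gff` is a hypothetical helper for inspecting inputs, not part of StORF-Reporter.

```python
def split_combined_gff(text):
    """Split GFF3 text into (annotation_lines, fasta_lines) at the ##FASTA directive."""
    annotation_lines, fasta_lines = [], []
    in_fasta = False
    for line in text.splitlines():
        if line.strip() == "##FASTA":
            in_fasta = True  # everything after this line is FASTA
            continue
        (fasta_lines if in_fasta else annotation_lines).append(line)
    return annotation_lines, fasta_lines
```

If `fasta_lines` comes back empty, the file has no embedded sequence and the `Single_Genome` or `Multiple_Genomes` input options are likely the better fit.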
#### To perform a fresh end-to-end annotation of an unannotated genome - StORF-Reporter will use Pyrodigal to predict CDS genes and then supplement them with StORFs.
```console
StORF-Reporter -anno Pyrodigal Single_FASTA -p .../Test_Datasets/Pyrodigal/E-coli.fa
```
### Menu - (StORF-Reporter -h):
```console
StORF-Reporter -anno Ensembl Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff
```
```python
usage: StORF_Reporter.py [-h]
[-anno [{Prokka,Bakta,Out_Dir,Multiple_Out_Dirs,Single_GFF,Multiple_GFFs,Ensembl,Feature_Types,Single_Genome,Multiple_Genomes,Single_Combined_GFF,Multiple_Combined_GFFs,Pyrodigal,Single_FASTA,Multiple_FASTA} ...]]
[-p PATH] [-af ALT_FILENAME] [-oname O_NAME] [-odir O_DIR] [-sout {True,False}] [-lw {True,False}] [-aa {True,False}] [-gz {True,False}]
[-py_train [{longest,individual,meta}]] [-py_fasta {True,False}] [-py_unstorfed {True,False}] [-gene_ident GENE_IDENT] [-min_len MINLEN]
[-max_len MAXLEN] [-ex_len EXLEN] [-spos {True,False}] [-rs {True,False}] [-con_storfs {True,False}] [-con_only {True,False}]
[-ps {True,False}] [-wc {True,False}] [-short_storfs {False,Nolap,Olap}] [-short_storfs_only {True,False}] [-minorf MIN_ORF]
[-maxorf MAX_ORF] [-codons STOP_CODONS] [-olap_filt [{none,single-strand,both-strand}]] [-start_filt {True,False}]
[-so [{start_pos,strand}]] [-f_type [{StORF,CDS,ORF}]] [-non_standard NON_STANDARD] [-olap OVERLAP_NT] [-ao ALLOWED_OVERLAP]
[-overwrite {True,False}] [-verbose {True,False}] [-v]
StORF-Reporter v1.4.2: StORF-Reporter Run Parameters.
Required Options:
-anno [{Prokka,Bakta,Out_Dir,Multiple_Out_Dirs,Single_GFF,Multiple_GFFs,Ensembl,Feature_Types,Single_Genome,Multiple_Genomes,Single_Combined_GFF,Multiple_Combined_GFFs,Pyrodigal,Single_FASTA,Multiple_FASTA} ...]
Select Annotation and Input options for one of the 3 options listed below
### Prokka/Bakta Annotation Option 1:
Prokka = Report StORFs for a Prokka annotation;
Bakta = Report StORFs for a Bakta annotation;
--- Prokka/Bakta Input Options:
Out_Dir = To provide the output directory of either a Prokka or Bakta run (will produce a new GFF and FASTA file containing original and extended annotations);
Multiple_Out_Dirs = To provide a directory containing multiple Prokka/Bakta standard output directories - Will run on each sequentially;
Single_GFF = To provide a single Prokka or Bakta GFF - searches for accompanying ".fna" file (will provide a new extended GFF);
Multiple_GFFs = To provide a directory containing multiple Prokka or Bakta GFF files - searches for accompanying ".fna" files (will provide a new extended GFF);
### Standard GFF Annotation Option 2:
Ensembl = Report StORFs for an Ensembl Bacteria annotation (ID=gene);
Feature_Types = Used in conjunction with -gene_ident to define features such as CDS,rRNA,tRNA for UR extraction (default CDS);
--- Standard GFF Input Options:
Single_Genome = To provide a single Genome - accompanying FASTA must share same name as given gff file (can be .fna, .fa or .fasta);
Multiple_Genomes = To provide a directory containing multiple accompanying GFF and FASTA files - files must share the same name (fasta can be .fna, .fa or .fasta);
Single_Combined_GFF = To provide a GFF file with embedded FASTA at the bottom;
Multiple_Combined_GFFs = To provide a directory containing multiple GFF files with embedded FASTA at the bottom;
### Complete Annotation Option 3:
Pyrodigal = Run Pyrodigal then Report StORFs (provide path to single FASTA or directory of multiple FASTA files ;
--- Complete Annotation Input Options:
Single_FASTA = To provide a single FASTA file;
Multiple_FASTA = To provide a directory containing multiple FASTA files (will detect .fna,.fa,.fasta);
-p PATH Provide input file or directory path
StORF-Reporter Options:
-af ALT_FILENAME Default - Prokka/Bakta output directory share the same prefix with their gff/fna files - Use this option when Prokka/Bakta output
directory name is different from the gff/fna files within and StORF-Reporter will search for the gff/fna with the given prefix
(MyProkkaDir/"altname".gff) - Does not work with "Multiple_Out_Dirs" option
-oname O_NAME Default - Appends '_StORF-Reporter_Extended' to end of input filename - Takes the directory name of Prokka/Bakta output if given as input
or the input for -af if given - Multiple_* runs will be numbered
-odir O_DIR Default - Same directory as input
-sout {True,False} Default - False: Print out StORF sequences separately from Prokka/Bakta annotations
-lw {True,False} Default - True: Line wrap FASTA sequence output at 60 chars
-aa {True,False} Default - False: Report StORFs as amino acid sequences
-gz {True,False} Default - False: Output as .gz
Pyrodigal Options:
-py_train [{longest,individual,meta}]
Default - longest: Type of model training to be done for Pyrodigal CDS prediction: Options: longest = Trains on longest contig;
individual = Trains on each contig separately - runs in meta mode if contig is < 20KB; meta = Runs in meta mode for all sequences
-py_fasta {True,False}
Default - False: Output Pyrodigal+StORF predictions in FASTA format
-py_unstorfed {True,False}
Default - False: Provide GFF containing original Pyrodigal predictions
UR-Extractor Options:
-gene_ident GENE_IDENT
Identifier used for extraction of Unannotated Regions such as
"misc_RNA,gene,mRNA,CDS,rRNA,tRNA,tmRNA,CRISPR,ncRNA,regulatory_region,oriC,pseudo" - To be used with "-anno Feature_Types" -
"-gene_ident Prokka" will select features present in Prokka annotations
-min_len MINLEN Default - 30: Minimum UR Length
-max_len MAXLEN Default - 100,000: Maximum UR Length
-ex_len EXLEN Default - 50: UR Extension Length
StORF-Finder Options:
-spos {True,False} Default - False: Output StORF sequences and GFF positions inclusive of first stop codon -This can break some downstream tools if changed
to True.
-rs {True,False} Default - True: Remove stop "*" from StORF amino acid sequences
-con_storfs {True,False}
Default - False: Output Consecutive StORFs
-con_only {True,False}
Default - False: Only output Consecutive StORFs
-ps {True,False} Default - False: Partial StORFs reported
-wc {True,False} Default - False: StORFs reported across entire sequence
-short_storfs {False,Nolap,Olap}
Default - False: Run StORF-Finder in "Short-StORF" mode. Will only return StORFs between 30 and 120 nt that do not overlap longer StORFs
- Only works with StORFs for now. "Nolap" will filter Short-StORFs which areoverlapped by StORFs and Olap will report Short-StORFs which
do overlap StORFs. Overlap is defined by "-olap".
-short_storfs_only {True,False}
Default - True. Only report Short-StORFs?
-minorf MIN_ORF Default - 99: Minimum StORF size in nt
-maxorf MAX_ORF Default - 60kb: Maximum StORF size in nt
-codons STOP_CODONS Default - ('TAG,TGA,TAA'): List Stop Codons to use
-olap_filt [{none,single-strand,both-strand}]
Default - "both-strand": Filtering level "none" is not recommended, "single-strand" for single strand filtering and both-strand for both-
strand longest-first tiling
-start_filt {True,False}
Default - False: Filter out StORFs without at least one of the 3 common start codons (best used for short-storfs).
-so [{start_pos,strand}]
Default - Start Position: How should StORFs be ordered when >1 reported in a single UR.
-f_type [{StORF,CDS,ORF}]
Default - "CDS": Which GFF feature type for StORFs to be reported as in GFF - "CDS" is probably needed for use in tools such as Roary and
Panaroo
-non_standard NON_STANDARD
Default - 0.20: Reject StORFs with >=20% non-standard nucleotides (A,T,G,C) - Provide % as decimal
-olap OVERLAP_NT Default - 50: Maximum number of nt of a StORF which can overlap another StORF.
-ao ALLOWED_OVERLAP Default - 50 nt: Maximum overlap between a StORF and an original gene.
Misc:
-overwrite {True,False}
Default - False: Overwrite StORF-Reporter output if already present
-verbose {True,False}
Default - False: Print out runtime messages
-v Print out version number and exit
```
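The `-non_standard` option in the menu above rejects StORFs in which 20% or more of the nucleotides are not A, T, G or C. A minimal Python sketch of that rule follows; `passes_non_standard_filter` is a hypothetical name, and treating an empty sequence as rejected is an assumption.

```python
def passes_non_standard_filter(seq, threshold=0.20):
    """Return True if the fraction of non-A/T/G/C bases is below the threshold."""
    if not seq:
        return False  # assumption: an empty sequence is rejected
    non_standard = sum(1 for base in seq.upper() if base not in "ATGC")
    # Rejection is for fractions >= threshold, so acceptance is strictly below it.
    return non_standard / len(seq) < threshold
```

For example, a 5 nt sequence with one `N` sits exactly at 20% and is rejected, while a 9 nt sequence with one `N` passes.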
###################################
# UR-Extractor:
### Subpackage to extract Unannotated Regions from DNA sequences using FASTA and GFF files as input.
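As a rough illustration of what extracting Unannotated Regions involves, the Python sketch below computes the gaps between annotated features and extends them into the neighbouring genes. It is not UR-Extractor's code: the function name `unannotated_regions`, the choice to apply the length filter before extension, and the clamping at the contig ends are assumptions. The defaults mirror `-min_len 30`, `-max_len 100,000` and `-ex_len 50` from the StORF-Reporter menu.

```python
def unannotated_regions(seq_len, features, min_len=30, max_len=100_000, ex_len=50):
    """Return extended URs as (start, end), 1-based inclusive like GFF.

    features: iterable of (start, end) annotated intervals, 1-based inclusive.
    """
    gaps = []
    cursor = 1  # first position not yet covered by any feature
    for start, end in sorted(features):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)  # handles overlapping or nested features
    if cursor <= seq_len:
        gaps.append((cursor, seq_len))

    extended = []
    for start, end in gaps:
        # Assumption: the length filter is applied before extension.
        if min_len <= end - start + 1 <= max_len:
            extended.append((max(1, start - ex_len), min(seq_len, end + ex_len)))
    return extended


print(unannotated_regions(1000, [(101, 400), (450, 900)]))
# [(1, 150), (351, 499), (851, 1000)]
```

The extension matters because a missed gene may start or end inside the flanking annotated features rather than strictly within the gap.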
### Menu - (UR-Extractor -h):
```console
UR-Extractor -f .../Test_Datasets/Matching_GFF_FASTA/E-coli.fa -gff .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff
```
```python
usage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-aa {True,False}]
[-lw {True,False}] [-stop_ident {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]
[-verbose {True,False}] [-v]
Single_Genome v1.4.1: StORF-Extractor Run Parameters.
Required Arguments:
-storf_input {Combined,Separate}
Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?
-p PATH Provide input files or directory path
Output:
-gff_out {True,False}
Default - False: Output StORFs in GFF format
-aa {True,False} Default - False: Report StORFs as amino acid sequences
-lw {True,False} Default - True: Line wrap FASTA sequence output at 60 chars
-stop_ident {True,False}
Default - True: Identify Stop Codon positions with '*'
-oname O_NAME Default - Appends '_Extracted_StORFs' to end of input GFF filename
-odir O_DIR Default - Same directory as input FASTA
-gz {True,False} Default - False: Output as .gz
Misc:
-verbose {True,False}
Default - False: Print out runtime messages
-v Default - False: Print out version number and exit
```
## StORF-Finder:
### Subpackage to extract StORFs from FASTA sequences - Works directly with the output of UR-Extractor.
### Menu - (StORF-Finder -h):
```console
StORF-Finder -f .../Test_Datasets/Matching_GFF_FASTA/E-coli_UR.fa
```
```python
usage: StORF_Finder.py [-h] -f FASTA [-ua {True,False}] [-wc {True,False}]
[-ps {True,False}]
[-olap_filt [{none,single-strand,both-strand}]]
[-start_filt {True,False}] [-con_storfs {True,False}]
[-con_only {True,False}]
[-short_storfs {False,Nolap,Olap}]
[-short_storfs_only {True,False}]
[-f_type [{StORF,CDS,ORF}]] [-minorf MIN_ORF]
[-maxorf MAX_ORF] [-codons STOP_CODONS]
[-non_standard NON_STANDARD] [-olap OVERLAP_NT]
[-s SUFFIX] [-so [{start_pos,strand}]] [-oname O_NAME]
[-odir O_DIR] [-gff {True,False}] [-aa {True,False}]
[-aa_only {True,False}] [-lw {True,False}]
[-spos {True,False}] [-stop_ident {True,False}]
[-gff_fasta {True,False}] [-gz {True,False}]
[-verbose {True,False}] [-v]
Single_Genome v1.4.2: StORF-Finder Run Parameters.
Required Arguments:
-f FASTA Input FASTA File - (UR_Extractor output)
Optional Arguments:
-ua {True,False} Default - Treat input as Unannotated: Use "-ua False"
for standard fasta
-wc {True,False} Default - False: StORFs reported across entire
sequence
-ps {True,False} Default - False: Partial StORFs reported
-olap_filt [{none,single-strand,both-strand}]
Default - "both-strand": Filtering level "none" is not
recommended, "single-strand" for single strand
filtering and both-strand for both-strand longest-
first tiling
-start_filt {True,False}
Default - False: Filter out StORFs without at least
one of the 3 common start codons (best used for short-
storfs).
-con_storfs {True,False}
Default - False: Output Consecutive StORFs
-con_only {True,False}
Default - False: Only output Consecutive StORFs
-short_storfs {False,Nolap,Olap}
Default - False: Run StORF-Finder in "Short-StORF"
mode. Will only return StORFs between 30 and 120 nt
that do not overlap longer StORFs - Only works with
StORFs for now. "Nolap" will filter Short-StORFs which
areoverlapped by StORFs and Olap will report Short-
StORFs which do overlap StORFs. Overlap is defined by
"-olap".
-short_storfs_only {True,False}
Default - True. Only report Short-StORFs?
-f_type [{StORF,CDS,ORF}]
Default - "StORF": Which GFF feature type for StORFs
to be reported as in GFF
-minorf MIN_ORF Default - 99: Minimum StORF size in nt
-maxorf MAX_ORF Default - 60kb: Maximum StORF size in nt
-codons STOP_CODONS Default - ('TAG,TGA,TAA'): List Stop Codons to use
-non_standard NON_STANDARD
Default - 0.20: Reject StORFs with >=20% non-standard
nucleotides (A,T,G,C) - Provide % as decimal
-olap OVERLAP_NT Default - 50: Maximum number of nt of a StORF which
can overlap another StORF.
-s SUFFIX Default - Do not append suffix to genome ID
-so [{start_pos,strand}]
Default - Start Position: How should StORFs be ordered
when >1 reported in a single UR.
Output:
-oname O_NAME Default - Appends '_StORF-Finder' to end of input
FASTA filename
-odir O_DIR Default - Same directory as input FASTA
-gff {True,False} Default - True: Output a GFF file
-aa {True,False} Default - False: Report StORFs as amino acid sequences
-aa_only {True,False}
Default - False: Only output Amino Acid Fasta
-lw {True,False} Default - True: Line wrap FASTA sequence output at 60
chars
-spos {True,False} Default - False: Output StORF sequences and GFF
positions inclusive of first stop codon -This can
break some downstream tools if changed to True.
-stop_ident {True,False}
Default - True: Identify Stop Codon positions with '*'
-gff_fasta {True,False}
Default - False: Report all gene sequences (nt) at the
bottom of GFF files in Prokka output mode
-gz {True,False} Default - False: Output as .gz
Misc:
-verbose {True,False}
Default - False: Print out runtime messages
-v Default - False: Print out version number and exit
```
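The "longest-first tiling" mentioned under `-olap_filt`, combined with the `-olap` limit of 50 nt, can be pictured with the following Python sketch. It is a simplified interpretation rather than StORF-Finder's algorithm: strand is ignored, and the function name `tile_longest_first` and the tie-breaking by start position are assumptions.

```python
def tile_longest_first(storfs, max_overlap=50):
    """Greedily keep the longest StORFs, dropping any that overlap a kept one too much.

    storfs: list of (start, end) intervals, 1-based inclusive.
    max_overlap: maximum overlap in nt allowed between two kept StORFs.
    """
    kept = []
    # Longest first; ties broken by start position (an assumption).
    by_length = sorted(storfs, key=lambda s: (-(s[1] - s[0] + 1), s[0]))
    for start, end in by_length:
        clashes = any(
            min(end, k_end) - max(start, k_start) + 1 > max_overlap
            for k_start, k_end in kept
        )
        if not clashes:
            kept.append((start, end))
    return sorted(kept)


print(tile_longest_first([(1, 300), (200, 400), (280, 500)]))
# [(1, 300), (280, 500)]
```

In the example, `(200, 400)` overlaps the longer `(1, 300)` by 101 nt and is dropped, whereas `(280, 500)` overlaps it by only 21 nt and is kept.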
## StORF-Extractor
Subpackage to extract sequences reported by StORF-Reporter from a genome annotation.
### Menu - (StORF-Extractor -h):
```console
StORF-Extractor -storf_input Combined -p .../Test_Datasets/Combined_GFFs/E-coli_Combined_StORF-Reporter_Extended.gff
```
```python
usage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}] [-verbose {True,False}] [-v]
StORF-Reporter v1.4.1: StORF-Extractor Run Parameters.
Required Arguments:
-storf_input {Combined,Separate}
Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?
-p PATH Provide input file or directory path
Output:
-gff_out {True,False}
Default - False: Output StORFs in GFF format
-oname O_NAME Default - Appends '_Extracted_StORFs' to end of input GFF filename
-odir O_DIR Default - Same directory as input FASTA
-gz {True,False} Default - False: Output as .gz
Misc:
-verbose {True,False}
Default - False: Print out runtime messages
-v Default - False: Print out version number and exit
```
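Extracting a sequence from a genome annotation comes down to slicing the contig by the GFF coordinates (1-based, inclusive) and reverse-complementing features on the minus strand. The Python sketch below shows that step in isolation; `extract_feature` is a hypothetical helper and says nothing about how StORF-Extractor identifies StORF entries in a GFF.

```python
# Translation table for complementing DNA; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_feature(seq, start, end, strand):
    """Return the feature sequence for GFF-style coordinates (1-based, inclusive)."""
    sub = seq[start - 1:end]
    if strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]  # reverse complement
    return sub


print(extract_feature("AAACCCGGGTTT", 1, 4, "+"))  # AAAC
print(extract_feature("AAACCCGGGTTT", 1, 4, "-"))  # GTTT
```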
## StORF-Remover
Subpackage to remove sequences reported by StORF-Reporter without a BLAST/DIAMOND hit (any alignment in BLAST 6 format).
### Menu - (StORF-Remover -h):
```console
StORF-Remover -gff .../Test_Datasets/StORF_Extractor_And_Remover/Myco_UR_StORF-R.gff -blast .../Test_Datasets/StORF_Extractor_And_Remover/Myco_URs_StORFs_aa_Swiss.tab
```
```python
usage: StORF_Remover.py [-h] [-gff GFF] [-blast BLAST] [-min_score MINSCORE] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]
[-verbose {True,False}] [-v]
StORF-Reporter v1.4.1: UR-Remover Run Parameters.
Required Arguments:
-gff GFF GFF annotation file for the FASTA
-blast BLAST BLAST format 6 annotation file
Optional Arguments:
-min_score MINSCORE Minimum BitScore to keep StORF: Default 30
Output:
-oname O_NAME Default - Appends '_UR' to end of input GFF filename
-odir O_DIR Default - Same directory as input GFF
-gz {True,False} Default - False: Output as .gz
Misc:
-verbose {True,False}
Default - False: Print out runtime messages
-v Default - False: Print out version number and exit
```
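To preview which StORFs would survive the `-min_score` threshold, the BLAST tabular file can be scanned directly. The Python sketch below assumes the default 12-column BLAST format 6 layout, in which the query ID is the first column and the bit score is the twelfth. The function name `queries_with_hits` is hypothetical, and treating the threshold as inclusive is an assumption based on the wording "Minimum BitScore to keep StORF".

```python
def queries_with_hits(blast_lines, min_score=30.0):
    """Return the set of query IDs with at least one hit at or above min_score.

    blast_lines: iterable of lines in default BLAST format 6 (tab-separated,
    12 columns: qseqid ... evalue bitscore).
    """
    keep = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default-layout record
        if float(fields[11]) >= min_score:
            keep.add(fields[0])
    return keep
```

Any StORF whose ID is absent from the returned set has no alignment meeting the threshold under these assumptions.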
## Test Datasets:
### The directory 'Test_Datasets' contains GFF and FASTA files to test the installation and use of StORF-Reporter - Example output files are also provided for comparison.
Raw data
{
"_id": null,
"home_page": "https://github.com/NickJD/StORF-Reporter",
"name": "StORF-Reporter",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "Nicholas Dimonaco",
"author_email": "nicholas@dimonaco.co.uk",
"download_url": "https://files.pythonhosted.org/packages/f9/ee/31b49d7f8af683fd12d487dc746f639c83e0ecad8a3a2cf1f4dcd7b28a83/storf_reporter-1.4.2.tar.gz",
"platform": null,
"description": "# StORF-Reporter has now been published in NAR: https://doi.org/10.1093/nar/gkad814\n\n### StORF-Reporter, a toolkit that returns missed CDS genes from the Unannotated Regions (URs) of prokaryotic genomes.\n\n# Please use `pip3 install StORF-Reporter' to install StORF-Reporter.\n### This will also install the python-standard library numpy (>=1.22.0,<1.24.0), Pyrodigal - (https://github.com/althonos/pyrodigal) and ORForise (https://github.com/NickJD/ORForise). \n\n### Consider using '--no-cache-dir' with pip to ensure the download of the newest version of StORF-Reporter.\n\n## Please Note: To report Con-StORFs (Pseudogenes and genes that have alternative use of stop codons), use \"-con_storfs True\". To disable the reporting of StORFs use \"-con_only\". \n\n### The directory \"Test_Datasets\" is provided to confirm functionality of StORF-Reporter.\n\n#############################################################\n# StORF-Reporter:\n<img src=\"./Visual_Abstract.jpg\" width=\"75%\" height=\"50%\"/>\n\n## Most common use cases - \n### Supplement a current annotation from a tool such as Prokka or Bakta. A new GFF file will be created compatible with downstream pangenome analysis tools such as Roary and Panaroo.\n\n#### For use on a single Prokka/Bakta output directory - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.\n```console\nStORF-Reporter -anno Prokka Out_Dir -p .../Test_Datasets/Prokka_E-coli/\n```\n#### For use on multiple Prokka/Bakta output directies - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.\n```console\nStORF-Reporter -anno Prokka Multiple_Out_Dirs -p ../Test_Datasets/Multi_Prokka_Outs\n```\n#### For use on a directory containing multiple Prokka/Bakta output gffs - Only produces new GFF files. 
\n```console\nStORF-Reporter -anno Prokka Multiple_GFFs -p .../Test_Datasets/Prokka_Outputs/\n```\n\n#### For use on a GFF file from a CDS prediction tool such as Prodigal - Provide a GFF file and StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name). \n```console\nStORF-Reporter -anno Feature_Types Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/Myco.gff\n```\n\n#### For use on a directory containing multiple GFF files from a CDS prediction tool such as Prodigal - StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name). \n```console\nStORF-Reporter -anno Feature_Types Multiple_Genomes -p .../Test_Datasets/Matching_GFF_FASTA/\n```\n\n#### For use on a directory containing multiple GFF files with embedded FASTA. \n```console\nStORF-Reporter -anno Feature_Types Multiple_Combined_GFFs -p .../Test_Datasets/Combined_GFFs/\n```\n\n#### To perform a fresh end-to-end annotation of a genome without an annotation, StORF-Reporter will use Pyrodigal to predict CDS genes and then supplement with StORFs. 
\n```console\nStORF-Reporter -anno Pyrodigal Single_FASTA -p .../Test_Datasets/Pyrodigal/E-coli.fa\n```\n\n### Menu - (StORF-Reporter -h):\n```console\nStORF-Reporter -anno Ensembl Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff\n```\n```python\nusage: StORF_Reporter.py [-h]\n [-anno [{Prokka,Bakta,Out_Dir,Multiple_Out_Dirs,Single_GFF,Multiple_GFFs,Ensembl,Feature_Types,Single_Genome,Multiple_Genomes,Single_Combined_GFF,Multiple_Combined_GFFs,Pyrodigal,Single_FASTA,Multiple_FASTA} ...]]\n [-p PATH] [-af ALT_FILENAME] [-oname O_NAME] [-odir O_DIR] [-sout {True,False}] [-lw {True,False}] [-aa {True,False}] [-gz {True,False}]\n [-py_train [{longest,individual,meta}]] [-py_fasta {True,False}] [-py_unstorfed {True,False}] [-gene_ident GENE_IDENT] [-min_len MINLEN]\n [-max_len MAXLEN] [-ex_len EXLEN] [-spos {True,False}] [-rs {True,False}] [-con_storfs {True,False}] [-con_only {True,False}]\n [-ps {True,False}] [-wc {True,False}] [-short_storfs {False,Nolap,Olap}] [-short_storfs_only {True,False}] [-minorf MIN_ORF]\n [-maxorf MAX_ORF] [-codons STOP_CODONS] [-olap_filt [{none,single-strand,both-strand}]] [-start_filt {True,False}]\n [-so [{start_pos,strand}]] [-f_type [{StORF,CDS,ORF}]] [-non_standard NON_STANDARD] [-olap OVERLAP_NT] [-ao ALLOWED_OVERLAP]\n [-overwrite {True,False}] [-verbose {True,False}] [-v]\n\nStORF-Reporter v1.4.2: StORF-Reporter Run Parameters.\n\nRequired Options:\n -anno [{Prokka,Bakta,Out_Dir,Multiple_Out_Dirs,Single_GFF,Multiple_GFFs,Ensembl,Feature_Types,Single_Genome,Multiple_Genomes,Single_Combined_GFF,Multiple_Combined_GFFs,Pyrodigal,Single_FASTA,Multiple_FASTA} ...]\n Select Annotation and Input options for one of the 3 options listed below\n ### Prokka/Bakta Annotation Option 1: \n \tProkka = Report StORFs for a Prokka annotation; \n \tBakta = Report StORFs for a Bakta annotation; \n --- Prokka/Bakta Input Options: \n \tOut_Dir = To provide the output directory of either a Prokka or Bakta run (will produce a new GFF and 
FASTA file containing original and extended annotations); \n \tMultiple_Out_Dirs = To provide a directory containing multiple Prokka/Bakta standard output directories - Will run on each sequentially; \n \tSingle_GFF = To provide a single Prokka or Bakta GFF - searches for accompanying \".fna\" file (will provide a new extended GFF); \n \tMultiple_GFFs = To provide a directory containing multiple Prokka or Bakta GFF files - searches for accompanying \".fna\" files (will provide a new extended GFF); \n \n ### Standard GFF Annotation Option 2: \n \tEnsembl = Report StORFs for an Ensembl Bacteria annotation (ID=gene); \n \tFeature_Types = Used in conjunction with -gene_ident to define features such as CDS,rRNA,tRNA for UR extraction (default CDS); \n --- Standard GFF Input Options: \n \tSingle_Genome = To provide a single Genome - accompanying FASTA must share same name as given gff file (can be .fna, .fa or .fasta); \n \tMultiple_Genomes = To provide a directory containing multiple accompanying GFF and FASTA files - files must share the same name (fasta can be .fna, .fa or .fasta); \n \tSingle_Combined_GFF = To provide a GFF file with embedded FASTA at the bottom; \n \tMultiple_Combined_GFFs = To provide a directory containing multiple GFF files with embedded FASTA at the bottom; \n \n ### Complete Annotation Option 3: \n \tPyrodigal = Run Pyrodigal then Report StORFs (provide path to single FASTA or directory of multiple FASTA files ;\n --- Complete Annotation Input Options: \n \tSingle_FASTA = To provide a single FASTA file; \n \tMultiple_FASTA = To provide a directory containing multiple FASTA files (will detect .fna,.fa,.fasta); \n \n -p PATH Provide input file or directory path\n\nStORF-Reporter Options:\n -af ALT_FILENAME Default - Prokka/Bakta output directory share the same prefix with their gff/fna files - Use this option when Prokka/Bakta output\n directory name is different from the gff/fna files within and StORF-Reporter will search for the gff/fna with 
the given prefix\n (MyProkkaDir/\"altname\".gff) - Does not work with \"Multiple_Out_Dirs\" option\n -oname O_NAME Default - Appends '_StORF-Reporter_Extended' to end of input filename - Takes the directory name of Prokka/Bakta output if given as input\n or the input for -af if given - Multiple_* runs will be numbered\n -odir O_DIR Default - Same directory as input\n -sout {True,False} Default - False: Print out StORF sequences separately from Prokka/Bakta annotations\n -lw {True,False} Default - True: Line wrap FASTA sequence output at 60 chars\n -aa {True,False} Default - False: Report StORFs as amino acid sequences\n -gz {True,False} Default - False: Output as .gz\n\nPyrodigal Options:\n -py_train [{longest,individual,meta}]\n Default - longest: Type of model training to be done for Pyrodigal CDS prediction: Options: longest = Trains on longest contig;\n individual = Trains on each contig separately - runs in meta mode if contig is < 20KB; meta = Runs in meta mode for all sequences\n -py_fasta {True,False}\n Default - False: Output Pyrodigal+StORF predictions in FASTA format\n -py_unstorfed {True,False}\n Default - False: Provide GFF containing original Pyrodigal predictions\n\nUR-Extractor Options:\n -gene_ident GENE_IDENT\n Identifier used for extraction of Unannotated Regions such as\n \"misc_RNA,gene,mRNA,CDS,rRNA,tRNA,tmRNA,CRISPR,ncRNA,regulatory_region,oriC,pseudo\" - To be used with \"-anno Feature_Types\" -\n \"-gene_ident Prokka\" will select features present in Prokka annotations\n -min_len MINLEN Default - 30: Minimum UR Length\n -max_len MAXLEN Default - 100,000: Maximum UR Length\n -ex_len EXLEN Default - 50: UR Extension Length\n\nStORF-Finder Options:\n -spos {True,False} Default - False: Output StORF sequences and GFF positions inclusive of first stop codon -This can break some downstream tools if changed\n to True.\n -rs {True,False} Default - True: Remove stop \"*\" from StORF amino acid sequences\n -con_storfs {True,False}\n Default - False: 
Output Consecutive StORFs\n -con_only {True,False}\n Default - False: Only output Consecutive StORFs\n -ps {True,False} Default - False: Partial StORFs reported\n -wc {True,False} Default - False: StORFs reported across entire sequence\n -short_storfs {False,Nolap,Olap}\n Default - False: Run StORF-Finder in \"Short-StORF\" mode. Will only return StORFs between 30 and 120 nt that do not overlap longer StORFs\n - Only works with StORFs for now. \"Nolap\" will filter Short-StORFs which areoverlapped by StORFs and Olap will report Short-StORFs which\n do overlap StORFs. Overlap is defined by \"-olap\".\n -short_storfs_only {True,False}\n Default - True. Only report Short-StORFs?\n -minorf MIN_ORF Default - 99: Minimum StORF size in nt\n -maxorf MAX_ORF Default - 60kb: Maximum StORF size in nt\n -codons STOP_CODONS Default - ('TAG,TGA,TAA'): List Stop Codons to use\n -olap_filt [{none,single-strand,both-strand}]\n Default - \"both-strand\": Filtering level \"none\" is not recommended, \"single-strand\" for single strand filtering and both-strand for both-\n strand longest-first tiling\n -start_filt {True,False}\n Default - False: Filter out StORFs without at least one of the 3 common start codons (best used for short-storfs).\n -so [{start_pos,strand}]\n Default - Start Position: How should StORFs be ordered when >1 reported in a single UR.\n -f_type [{StORF,CDS,ORF}]\n Default - \"CDS\": Which GFF feature type for StORFs to be reported as in GFF - \"CDS\" is probably needed for use in tools such as Roary and\n Panaroo\n -non_standard NON_STANDARD\n Default - 0.20: Reject StORFs with >=20% non-standard nucleotides (A,T,G,C) - Provide % as decimal\n -olap OVERLAP_NT Default - 50: Maximum number of nt of a StORF which can overlap another StORF.\n -ao ALLOWED_OVERLAP Default - 50 nt: Maximum overlap between a StORF and an original gene.\n\nMisc:\n -overwrite {True,False}\n Default - False: Overwrite StORF-Reporter output if already present\n -verbose {True,False}\n 
Default - False: Print out runtime messages\n -v Print out version number and exit\n\n```\n\n###################################\n\n# UR-Extractor:\n### Subpackage to extract Unannotated Regions from DNA sequences using FASTA and GFF files as input.\n\n### Menu - (UR-Extractor -h): \n```console\nUR-Extractor -f .../Test_Datasets/Matching_GFF_FASTA/E-coli.fa -gff .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff\n```\n\n```python\nusage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-aa {True,False}]\n [-lw {True,False}] [-stop_ident {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]\n [-verbose {True,False}] [-v]\n\nSingle_Genome v1.4.1: StORF-Extractor Run Parameters.\n\nRequired Arguments:\n -storf_input {Combined,Separate}\n Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?\n -p PATH Provide input files or directory path\n\nOutput:\n -gff_out {True,False}\n Default - False: Output StORFs in GFF format\n -aa {True,False} Default - False: Report StORFs as amino acid sequences\n -lw {True,False} Default - True: Line wrap FASTA sequence output at 60 chars\n -stop_ident {True,False}\n Default - True: Identify Stop Codon positions with '*'\n -oname O_NAME Default - Appends '_Extracted_StORFs' to end of input GFF filename\n -odir O_DIR Default - Same directory as input FASTA\n -gz {True,False} Default - False: Output as .gz\n\nMisc:\n -verbose {True,False}\n Default - False: Print out runtime messages\n -v Default - False: Print out version number and exit\n```\n## StORF-Finder:\n### Subpackage to extract StORFs from Fasta sequences - Works directly with the output of UR-Extractor. 
\n\n### Menu - (StORF-Finder -h): \n```console\nStORF-Finder -f .../Test_Datasets/Matching_GFF_FASTA/E-coli_UR.fa \n```\n\n```python\nusage: StORF_Finder.py [-h] -f FASTA [-ua {True,False}] [-wc {True,False}]\n [-ps {True,False}]\n [-olap_filt [{none,single-strand,both-strand}]]\n [-start_filt {True,False}] [-con_storfs {True,False}]\n [-con_only {True,False}]\n [-short_storfs {False,Nolap,Olap}]\n [-short_storfs_only {True,False}]\n [-f_type [{StORF,CDS,ORF}]] [-minorf MIN_ORF]\n [-maxorf MAX_ORF] [-codons STOP_CODONS]\n [-non_standard NON_STANDARD] [-olap OVERLAP_NT]\n [-s SUFFIX] [-so [{start_pos,strand}]] [-oname O_NAME]\n [-odir O_DIR] [-gff {True,False}] [-aa {True,False}]\n [-aa_only {True,False}] [-lw {True,False}]\n [-spos {True,False}] [-stop_ident {True,False}]\n [-gff_fasta {True,False}] [-gz {True,False}]\n [-verbose {True,False}] [-v]\n\nSingle_Genome v1.4.2: StORF-Finder Run Parameters.\n\nRequired Arguments:\n -f FASTA Input FASTA File - (UR_Extractor output)\n\nOptional Arguments:\n -ua {True,False} Default - Treat input as Unannotated: Use \"-ua False\"\n for standard fasta\n -wc {True,False} Default - False: StORFs reported across entire\n sequence\n -ps {True,False} Default - False: Partial StORFs reported\n -olap_filt [{none,single-strand,both-strand}]\n Default - \"both-strand\": Filtering level \"none\" is not\n recommended, \"single-strand\" for single strand\n filtering and both-strand for both-strand longest-\n first tiling\n -start_filt {True,False}\n Default - False: Filter out StORFs without at least\n one of the 3 common start codons (best used for short-\n storfs).\n -con_storfs {True,False}\n Default - False: Output Consecutive StORFs\n -con_only {True,False}\n Default - False: Only output Consecutive StORFs\n -short_storfs {False,Nolap,Olap}\n Default - False: Run StORF-Finder in \"Short-StORF\"\n mode. Will only return StORFs between 30 and 120 nt\n that do not overlap longer StORFs - Only works with\n StORFs for now. 
\"Nolap\" will filter Short-StORFs which\n are overlapped by StORFs and Olap will report Short-\n StORFs which do overlap StORFs. Overlap is defined by\n \"-olap\".\n -short_storfs_only {True,False}\n Default - True. Only report Short-StORFs?\n -f_type [{StORF,CDS,ORF}]\n Default - \"StORF\": Which GFF feature type for StORFs\n to be reported as in GFF\n -minorf MIN_ORF Default - 99: Minimum StORF size in nt\n -maxorf MAX_ORF Default - 60kb: Maximum StORF size in nt\n -codons STOP_CODONS Default - ('TAG,TGA,TAA'): List Stop Codons to use\n -non_standard NON_STANDARD\n Default - 0.20: Reject StORFs with >=20% non-standard\n nucleotides (A,T,G,C) - Provide % as decimal\n -olap OVERLAP_NT Default - 50: Maximum number of nt of a StORF which\n can overlap another StORF.\n -s SUFFIX Default - Do not append suffix to genome ID\n -so [{start_pos,strand}]\n Default - Start Position: How should StORFs be ordered\n when >1 reported in a single UR.\n\nOutput:\n -oname O_NAME Default - Appends '_StORF-Finder' to end of input\n FASTA filename\n -odir O_DIR Default - Same directory as input FASTA\n -gff {True,False} Default - True: Output a GFF file\n -aa {True,False} Default - False: Report StORFs as amino acid sequences\n -aa_only {True,False}\n Default - False: Only output Amino Acid Fasta\n -lw {True,False} Default - True: Line wrap FASTA sequence output at 60\n chars\n -spos {True,False} Default - False: Output StORF sequences and GFF\n positions inclusive of first stop codon - This can\n break some downstream tools if changed to True.\n -stop_ident {True,False}\n Default - True: Identify Stop Codon positions with '*'\n -gff_fasta {True,False}\n Default - False: Report all gene sequences (nt) at the\n bottom of GFF files in Prokka output mode\n -gz {True,False} Default - False: Output as .gz\n\nMisc:\n -verbose {True,False}\n Default - False: Print out runtime messages\n -v Default - False: Print out version number and exit\n\n```\n## StORF-Extractor\nSubpackage to extract 
sequences reported by StORF-Reporter from a genome annotation.\n\n### Menu - (StORF-Extractor -h): \n```console\nStORF-Extractor -storf_input Combined -p .../Test_Datasets/Combined_GFFs/E-coli_Combined_StORF-Reporter_Extended.gff \n```\n\n```python\nusage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}] [-verbose {True,False}] [-v]\n\nStORF-Reporter v1.4.1: StORF-Extractor Run Parameters.\n\nRequired Arguments:\n -storf_input {Combined,Separate}\n Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?\n -p PATH Provide input file or directory path\n\nOutput:\n -gff_out {True,False}\n Default - False: Output StORFs in GFF format\n -oname O_NAME Default - Appends '_Extracted_StORFs' to end of input GFF filename\n -odir O_DIR Default - Same directory as input FASTA\n -gz {True,False} Default - False: Output as .gz\n\nMisc:\n -verbose {True,False}\n Default - False: Print out runtime messages\n -v Default - False: Print out version number and exit\n\n```\n\n## StORF-Remover\nSubpackage to remove sequences reported by StORF-Reporter without a Blast/Diamond hit (any alignment in BLAST 6 format).\n\n### Menu - (StORF-Remover -h): \n```console\nStORF-Remover -gff .../Test_Datasets/StORF_Extractor_And_Remover/Myco_UR_StORF-R.gff -blast .../Test_Datasets/StORF_Extractor_And_Remover/Myco_URs_StORFs_aa_Swiss.tab \n```\n\n```python\nusage: StORF_Remover.py [-h] [-gff GFF] [-blast BLAST] [-min_score MINSCORE] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]\n [-verbose {True,False}] [-v]\n\nStORF-Reporter v1.4.1: UR-Remover Run Parameters.\n\nRequired Arguments:\n -gff GFF GFF annotation file for the FASTA\n -blast BLAST BLAST format 6 annotation file\n\nOptional Arguments:\n -min_score MINSCORE Minimum BitScore to keep StORF: Default 30\n\nOutput:\n -oname O_NAME Default - Appends '_UR' to end of input GFF filename\n -odir O_DIR Default - Same directory as 
input GFF\n -gz {True,False} Default - False: Output as .gz\n\nMisc:\n -verbose {True,False}\n Default - False: Print out runtime messages\n -v Default - False: Print out version number and exit\n```\n\n\n\n## Test Datasets: \n### The directory 'Test_Datasets' contains GFF and FASTA files to test the installation and use of StORF-Reporter - Example output files are also provided for comparison. \n",
"bugtrack_url": null,
"license": null,
"summary": "StORF-Reporter - A tool that takes an annotated genome and returns missing CDS genes (Stop-to-Stop) from unannotated regions.",
"version": "1.4.2",
"project_urls": {
"Bug Tracker": "https://github.com/NickJD/StORF-Reporter/issues",
"Homepage": "https://github.com/NickJD/StORF-Reporter"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "edeb9dcb12c85090c055631d21f465a53d9c64896eed7ac393d3e696fb791d3d",
"md5": "e40fff69ecd2e115a1c5eb92414c1a6c",
"sha256": "32230990cdca9113c3e24ffb81590c9d961a13caf5a45f088aff9a4386ca865f"
},
"downloads": -1,
"filename": "StORF_Reporter-1.4.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e40fff69ecd2e115a1c5eb92414c1a6c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 56516,
"upload_time": "2024-10-23T18:27:11",
"upload_time_iso_8601": "2024-10-23T18:27:11.279642Z",
"url": "https://files.pythonhosted.org/packages/ed/eb/9dcb12c85090c055631d21f465a53d9c64896eed7ac393d3e696fb791d3d/StORF_Reporter-1.4.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f9ee31b49d7f8af683fd12d487dc746f639c83e0ecad8a3a2cf1f4dcd7b28a83",
"md5": "1c25b04537049e8a160db84ef7a75e74",
"sha256": "99720688cfead0335173b049c1ea6205e51b22b24fffd1f49388167bd44abcc6"
},
"downloads": -1,
"filename": "storf_reporter-1.4.2.tar.gz",
"has_sig": false,
"md5_digest": "1c25b04537049e8a160db84ef7a75e74",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 55293,
"upload_time": "2024-10-23T18:27:13",
"upload_time_iso_8601": "2024-10-23T18:27:13.458459Z",
"url": "https://files.pythonhosted.org/packages/f9/ee/31b49d7f8af683fd12d487dc746f639c83e0ecad8a3a2cf1f4dcd7b28a83/storf_reporter-1.4.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-23 18:27:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "NickJD",
"github_project": "StORF-Reporter",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "storf-reporter"
}