StORF-Reporter


Name: StORF-Reporter
Version: 1.3.4
Home page: https://github.com/NickJD/StORF-Reporter
Summary: StORF-Reporter - A tool that takes an annotated genome and returns missing CDS genes (Stop-to-Stop) from unannotated regions.
Upload time: 2024-02-26 19:23:31
Author: Nicholas Dimonaco
Requires Python: >=3.6

# StORF-Reporter has now been published in NAR: https://doi.org/10.1093/nar/gkad814

### StORF-Reporter, a toolkit that returns missed CDS genes from the Unannotated Regions (URs) of prokaryotic genomes.

# Please use `pip3 install StORF-Reporter` to install StORF-Reporter.
### This will also install the Python libraries NumPy (>=1.22.0,<1.24.0), Pyrodigal (https://github.com/althonos/pyrodigal) and ORForise (https://github.com/NickJD/ORForise).

### Consider using '--no-cache-dir' with pip to ensure the download of the newest version of StORF-Reporter.

## Please Note: To report Con-StORFs (pseudogenes and genes that have alternative use of stop codons), use "-con_storfs True". To report only Con-StORFs and disable the reporting of standard StORFs, use "-con_only True".

### The directory "Test_Datasets" is provided to confirm functionality of StORF-Reporter.

#############################################################
# StORF-Reporter:
<img src="./Visual_Abstract.jpg"  width="75%" height="50%"/>

## Most common use cases - 
### Supplement a current annotation from a tool such as Prokka or Bakta. A new GFF file will be created compatible with downstream pangenome analysis tools such as Roary and Panaroo.

#### For use on a single Prokka/Bakta output directory - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.
```console
StORF-Reporter -anno Prokka Out_Dir -p .../Test_Datasets/Prokka_E-coli/
```
#### For use on multiple Prokka/Bakta output directories - Will also create a new fasta file with Prokka/Bakta genes and StORF sequences.
```console
StORF-Reporter -anno Prokka Multiple_Out_Dirs -p ../Test_Datasets/Multi_Prokka_Outs
```
#### For use on a directory containing multiple Prokka/Bakta output gffs - Only produces new GFF files. 
```console
StORF-Reporter -anno Prokka Multiple_GFFs -p .../Test_Datasets/Prokka_Outputs/
```

#### For use on a GFF file from a CDS prediction tool such as Prodigal - Provide a GFF file and StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name). 
```console
StORF-Reporter -anno Feature_Types Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/Myco.gff
```
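
The "same name" rule for the accompanying FASTA can be pictured with a short sketch. This is only an illustration of the matching described above, not StORF-Reporter's own code: the function name `find_matching_fasta` and the order in which extensions are tried are assumptions.

```python
from pathlib import Path
from typing import Optional

# Extensions the README lists for the accompanying FASTA file.
# The order in which they are tried is an assumption of this sketch.
FASTA_EXTENSIONS = (".fna", ".fa", ".fasta")


def find_matching_fasta(gff_path: str) -> Optional[Path]:
    """Return the FASTA file that shares the GFF's base name, or None.

    Hypothetical helper: it only mirrors the naming rule from the README.
    """
    gff = Path(gff_path)
    for ext in FASTA_EXTENSIONS:
        candidate = gff.with_suffix(ext)  # e.g. Myco.gff -> Myco.fna
        if candidate.is_file():
            return candidate
    return None
```

Checking for the match before a long batch run is a cheap way to catch a GFF whose FASTA was renamed or left in another directory.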

#### For use on a directory containing multiple GFF files from a CDS prediction tool such as Prodigal - StORF-Reporter will find the matching .fa/.fasta/.fna (must have the same name). 
```console
StORF-Reporter -anno Feature_Types Multiple_Genomes -p .../Test_Datasets/Matching_GFF_FASTA/
```

#### For use on a directory containing multiple GFF files with embedded FASTA. 
```console
StORF-Reporter -anno Feature_Types Multiple_Combined_GFFs -p .../Test_Datasets/Combined_GFFs/
```

#### To perform a fresh end-to-end annotation of a genome without an annotation, StORF-Reporter will use Pyrodigal to predict CDS genes and then supplement with StORFs. 
```console
StORF-Reporter -anno Pyrodigal Single_FASTA -p .../Test_Datasets/Pyrodigal/E-coli.fa
```
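
When many genomes need processing, the commands above can be assembled from Python. The sketch below is a thin, hypothetical wrapper (the names `build_command` and `run_storf_reporter` are not part of StORF-Reporter). It only uses the `-anno` and `-p` options shown in this README and checks the annotation/input pairing against the groups listed in the `-anno` help text.

```python
import subprocess
from typing import List

# Annotation types and the input modes grouped with them in the -anno help text.
_PROKKA_BAKTA = {"Out_Dir", "Multiple_Out_Dirs", "Single_GFF", "Multiple_GFFs"}
_STANDARD_GFF = {"Single_Genome", "Multiple_Genomes",
                 "Single_Combined_GFF", "Multiple_Combined_GFFs"}
INPUT_MODES = {
    "Prokka": _PROKKA_BAKTA,
    "Bakta": _PROKKA_BAKTA,
    "Ensembl": _STANDARD_GFF,
    "Feature_Types": _STANDARD_GFF,
    "Pyrodigal": {"Single_FASTA", "Multiple_FASTA"},
}


def build_command(annotation: str, input_mode: str, path: str) -> List[str]:
    """Build the argument list for one StORF-Reporter run (hypothetical helper)."""
    if annotation not in INPUT_MODES:
        raise ValueError(f"unknown annotation type: {annotation}")
    if input_mode not in INPUT_MODES[annotation]:
        raise ValueError(f"{input_mode} is not listed for {annotation}")
    return ["StORF-Reporter", "-anno", annotation, input_mode, "-p", path]


def run_storf_reporter(annotation: str, input_mode: str, path: str) -> None:
    """Run StORF-Reporter; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(annotation, input_mode, path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.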

### Menu - (StORF-Reporter -h):
```console
StORF-Reporter -anno Ensembl Single_Genome -p .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff
```
```python
usage: StORF_Reporter.py [-h]
                         [-anno [{Prokka,Bakta,Out_Dir,Multiple_Out_Dirs,Single_GFF,Multiple_GFFs,Ensembl,Feature_Types,Single_Genome,Multiple_Genomes,Single_Combined_GFF,Multiple_Combined_GFFs,Pyrodigal,Single_FASTA,Multiple_FASTA} ...]]
                         [-p PATH] [-af ALT_FILENAME] [-oname O_NAME] [-odir O_DIR] [-sout {True,False}] [-lw {True,False}] [-aa {True,False}] [-gz {True,False}] [-py_train [{longest,individual,meta}]] [-py_fasta {True,False}]
                         [-py_unstorfed {True,False}] [-gene_ident GENE_IDENT] [-min_len MINLEN] [-max_len MAXLEN] [-ex_len EXLEN] [-spos {True,False}] [-rs {True,False}] [-con_storfs {True,False}] [-con_only {True,False}]
                         [-ps {True,False}] [-wc {True,False}] [-short_storfs {False,Nolap,Olap}] [-short_storfs_only {True,False}] [-minorf MIN_ORF] [-maxorf MAX_ORF] [-codons STOP_CODONS]
                         [-olap_filt [{none,single-strand,both-strand}]] [-start_filt {True,False}] [-so [{start_pos,strand}]] [-f_type [{StORF,CDS,ORF}]] [-olap OVERLAP_NT] [-ao ALLOWED_OVERLAP] [-overwrite {True,False}]
                         [-verbose {True,False}] [-v]

StORF-Reporter v1.3.4: StORF-Reporter Run Parameters.

Required Options:
  -anno [{Prokka,Bakta,Out_Dir,Multiple_Out_Dirs,Single_GFF,Multiple_GFFs,Ensembl,Feature_Types,Single_Genome,Multiple_Genomes,Single_Combined_GFF,Multiple_Combined_GFFs,Pyrodigal,Single_FASTA,Multiple_FASTA} ...]
                        Select Annotation and Input options for one of the 3 options listed below
                        ### Prokka/Bakta Annotation Option 1: 
                                Prokka = Report StORFs for a Prokka annotation; 
                                Bakta = Report StORFs for a Bakta annotation; 
                        --- Prokka/Bakta Input Options: 
                                Out_Dir = To provide the output directory of either a Prokka or Bakta run (will produce a new GFF and FASTA file containing original and extended annotations); 
                                Multiple_Out_Dirs = To provide a directory containing multiple Prokka/Bakta standard output directories - Will run on each sequentially; 
                                Single_GFF = To provide a single Prokka or Bakta GFF - searches for accompanying ".fna" file (will provide a new extended GFF); 
                                Multiple_GFFs = To provide a directory containing multiple Prokka or Bakta GFF files - searches for accompanying ".fna" files (will provide a new extended GFF); 
                        
                        ### Standard GFF Annotation Option 2: 
                                Ensembl = Report StORFs for an Ensembl Bacteria annotation (ID=gene); 
                                Feature_Types = Used in conjunction with -gene_ident to define features such as CDS,rRNA,tRNA for UR extraction (default CDS); 
                        --- Standard GFF Input Options: 
                                Single_Genome = To provide a single Genome - accompanying FASTA must share same name as given gff file (can be .fna, .fa or .fasta); 
                                Multiple_Genomes = To provide a directory containing multiple accompanying GFF and FASTA files - files must share the same name (fasta can be .fna, .fa or .fasta); 
                                Single_Combined_GFF = To provide a GFF file with embedded FASTA at the bottom; 
                                Multiple_Combined_GFFs = To provide a directory containing multiple GFF files with embedded FASTA at the bottom; 
                        
                        ### Complete Annotation Option 3: 
                                Pyrodigal = Run Pyrodigal then Report StORFs (provide path to single FASTA or directory of multiple FASTA files);
                        --- Complete Annotation Input Options: 
                                Single_FASTA = To provide a single FASTA file; 
                                Multiple_FASTA = To provide a directory containing multiple FASTA files (will detect .fna,.fa,.fasta); 
                        
  -p PATH               Provide input file or directory path

StORF-Reporter Options:
  -af ALT_FILENAME      Default - Prokka/Bakta output directory share the same prefix with their gff/fna files - Use this option when Prokka/Bakta output directory name is different from the gff/fna files within and StORF-Reporter
                        will search for the gff/fna with the given prefix (MyProkkaDir/"altname".gff) - Does not work with "Multiple_Out_Dirs" option
  -oname O_NAME         Default - Appends '_StORF-Reporter_Extended' to end of input filename - Takes the directory name of Prokka/Bakta output if given as input or the input for -af if given - Multiple_* runs will be numbered
  -odir O_DIR           Default - Same directory as input
  -sout {True,False}    Default - False: Print out StORF sequences separately from Prokka/Bakta annotations
  -lw {True,False}      Default - True: Line wrap FASTA sequence output at 60 chars
  -aa {True,False}      Default - False: Report StORFs as amino acid sequences
  -gz {True,False}      Default - False: Output as .gz

Pyrodigal Options:
  -py_train [{longest,individual,meta}]
                        Default - longest: Type of model training to be done for Pyrodigal CDS prediction: Options: longest = Trains on longest contig; individual = Trains on each contig separately - runs in meta mode if contig is
                        < 20KB; meta = Runs in meta mode for all sequences
  -py_fasta {True,False}
                        Default - False: Output Pyrodigal+StORF predictions in FASTA format
  -py_unstorfed {True,False}
                        Default - False: Provide GFF containing original Pyrodigal predictions

UR-Extractor Options:
  -gene_ident GENE_IDENT
                        Identifier used for extraction of Unannotated Regions such as "misc_RNA,gene,mRNA,CDS,rRNA,tRNA,tmRNA,CRISPR,ncRNA,regulatory_region,oriC,pseudo" - To be used with "-anno Feature_Types" - "-gene_ident
                        Prokka" will select features present in Prokka annotations
  -min_len MINLEN       Default - 30: Minimum UR Length
  -max_len MAXLEN       Default - 100,000: Maximum UR Length
  -ex_len EXLEN         Default - 50: UR Extension Length

StORF-Finder Options:
  -spos {True,False}    Default - False: Output StORF positions inclusive of first stop codon
  -rs {True,False}      Default - True: Remove stop "*" from StORF amino acid sequences
  -con_storfs {True,False}
                        Default - False: Output Consecutive StORFs
  -con_only {True,False}
                        Default - False: Only output Consecutive StORFs
  -ps {True,False}      Default - False: Partial StORFs reported
  -wc {True,False}      Default - False: StORFs reported across entire sequence
  -short_storfs {False,Nolap,Olap}
                        Default - False: Run StORF-Finder in "Short-StORF" mode. Will only return StORFs between 30 and 120 nt that do not overlap longer StORFs - Only works with StORFs for now. "Nolap" will filter Short-StORFs
                        which are overlapped by StORFs and Olap will report Short-StORFs which do overlap StORFs. Overlap is defined by "-olap".
  -short_storfs_only {True,False}
                        Default - True. Only report Short-StORFs?
  -minorf MIN_ORF       Default - 99: Minimum StORF size in nt
  -maxorf MAX_ORF       Default - 60kb: Maximum StORF size in nt
  -codons STOP_CODONS   Default - ('TAG,TGA,TAA'): List Stop Codons to use
  -olap_filt [{none,single-strand,both-strand}]
                        Default - "both-strand": Filtering level "none" is not recommended, "single-strand" for single strand filtering and both-strand for both-strand longest-first tiling
  -start_filt {True,False}
                        Default - False: Filter out StORFs without at least one of the 3 common start codons (best used for short-storfs).
  -so [{start_pos,strand}]
                        Default - Start Position: How should StORFs be ordered when >1 reported in a single UR.
  -f_type [{StORF,CDS,ORF}]
                        Default - "CDS": Which GFF feature type for StORFs to be reported as in GFF - "CDS" is probably needed for use in tools such as Roary and Panaroo
  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which can overlap another StORF.
  -ao ALLOWED_OVERLAP   Default - 50 nt: Maximum overlap between a StORF and an original gene.

Misc:
  -overwrite {True,False}
                        Default - False: Overwrite StORF-Reporter output if already present
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Print out version number and exit

```
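
The `-olap_filt` and `-olap` options describe a "longest-first tiling" with a maximum allowed overlap. One possible reading of that description is sketched below. It is an interpretation for illustration, not the tool's actual implementation: the function names, the `(start, end, strand)` tuple layout and the tie-breaking order are all assumptions.

```python
from typing import List, Tuple

# (start, end, strand) with 1-based inclusive coordinates; layout is an assumption.
Region = Tuple[int, int, str]


def overlap_nt(a: Region, b: Region) -> int:
    """Number of positions shared by two regions (0 if they do not touch)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def tile_longest_first(storfs: List[Region], max_overlap: int = 50,
                       both_strands: bool = True) -> List[Region]:
    """Keep the longest StORFs first, dropping any that overlap a kept one too much.

    both_strands=True compares against kept StORFs on either strand
    (read here as "both-strand"); False only compares within the same
    strand (read here as "single-strand").
    """
    kept: List[Region] = []
    for cand in sorted(storfs, key=lambda s: s[1] - s[0], reverse=True):
        clash = any(
            (both_strands or cand[2] == k[2]) and overlap_nt(cand, k) > max_overlap
            for k in kept
        )
        if not clash:
            kept.append(cand)
    return sorted(kept)
```

With the default of 50 nt, two StORFs that share a short stretch at their ends are both retained, while a StORF largely contained inside a longer one is dropped.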

###################################

# UR-Extractor:
### Subpackage to extract Unannotated Regions from DNA sequences using FASTA and GFF files as input.

### Menu - (UR-Extractor -h):  
```console
UR-Extractor -f .../Test_Datasets/Matching_GFF_FASTA/E-coli.fa -gff .../Test_Datasets/Matching_GFF_FASTA/E-coli.gff
```

```python
usage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-aa {True,False}]
                          [-lw {True,False}] [-stop_ident {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]
                          [-verbose {True,False}] [-v]

Single_Genome v1.3.4: StORF-Extractor Run Parameters.

Required Arguments:
  -storf_input {Combined,Separate}
                        Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?
  -p PATH               Provide input files or directory path

Output:
  -gff_out {True,False}
                        Default - False: Output StORFs in GFF format
  -aa {True,False}      Default - False: Report StORFs as amino acid sequences
  -lw {True,False}      Default - True: Line wrap FASTA sequence output at 60 chars
  -stop_ident {True,False}
                        Default - True: Identify Stop Codon positions with '*'
  -oname O_NAME         Default - Appends '_Extracted_StORFs' to end of input GFF filename
  -odir O_DIR           Default - Same directory as input FASTA
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit
```
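
The idea of an Unannotated Region can be sketched in a few lines. The example below assumes a UR is any gap between annotated features, that the `-min_len`/`-max_len` limits (30 and 100,000) are applied to the gap before it is extended, and that `-ex_len` (50) extends the gap into the flanking features on both sides. These are assumptions made for illustration, and `unannotated_regions` is a hypothetical name, not UR-Extractor's code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end)


def unannotated_regions(seq_len: int, genes: List[Interval], min_len: int = 30,
                        max_len: int = 100_000, ex_len: int = 50) -> List[Interval]:
    """Return the extended gaps between annotated features on one sequence."""
    gaps: List[Interval] = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - 1 > prev_end:  # uncovered positions before this feature
            gaps.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)  # copes with overlapping features
    if prev_end < seq_len:  # trailing gap after the last feature
        gaps.append((prev_end + 1, seq_len))

    regions: List[Interval] = []
    for start, end in gaps:
        if min_len <= end - start + 1 <= max_len:
            # Extend into the neighbouring features, clipped to the sequence.
            regions.append((max(1, start - ex_len), min(seq_len, end + ex_len)))
    return regions
```

Extending each region lets a StORF whose stop codons sit just inside a neighbouring gene still be found, which is why an allowed overlap with original genes (`-ao`) also exists.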
## StORF-Finder:
### Subpackage to extract StORFs from FASTA sequences - Works directly with the output of UR-Extractor.

### Menu - (StORF-Finder -h):   
```console
StORF-Finder -f .../Test_Datasets/Matching_GFF_FASTA/E-coli_UR.fa 
```

```python
usage: StORF_Finder.py [-h] [-f FASTA] [-ua {True,False}] [-wc {True,False}] [-ps {True,False}] [-olap_filt [{none,single-strand,both-strand}]] [-start_filt {True,False}] [-con_storfs {True,False}] [-con_only {True,False}] [-short_storfs {False,Nolap,Olap}] [-short_storfs_only {True,False}]
                       [-stop_ident {True,False}] [-f_type [{StORF,CDS,ORF}]] [-minorf MIN_ORF] [-maxorf MAX_ORF] [-codons STOP_CODONS] [-olap OVERLAP_NT] [-s SUFFIX] [-so [{start_pos,strand}]] [-spos {True,False}] [-oname O_NAME] [-odir O_DIR] [-gff {True,False}] [-aa {True,False}] [-aa_only {True,False}]
                       [-lw {True,False}] [-gff_fasta {True,False}] [-gz {True,False}] [-verbose {True,False}] [-v]

StORF-Reporter v1.3.4: StORF-Finder Run Parameters.

Required Arguments:
  -f FASTA              Input FASTA File - (UR_Extractor output)

Optional Arguments:
  -ua {True,False}      Default - Treat input as Unannotated: Use "-ua False" for standard fasta
  -wc {True,False}      Default - False: StORFs reported across entire sequence
  -ps {True,False}      Default - False: Partial StORFs reported
  -olap_filt [{none,single-strand,both-strand}]
                        Default - "both-strand": Filtering level "none" is not recommended, "single-strand" for single strand filtering and both-strand for both-strand longest-first tiling
  -start_filt {True,False}
                        Default - False: Filter out StORFs without at least one of the 3 common start codons (best used for short-storfs).
  -con_storfs {True,False}
                        Default - False: Output Consecutive StORFs
  -con_only {True,False}
                        Default - False: Only output Consecutive StORFs
  -short_storfs {False,Nolap,Olap}
                        Default - False: Run StORF-Finder in "Short-StORF" mode. Will only return StORFs between 30 and 120 nt that do not overlap longer StORFs - Only works with StORFs for now. "Nolap" will filter Short-StORFs which are overlapped by StORFs and Olap will report Short-StORFs which do overlap StORFs.
                        Overlap is defined by "-olap".
  -short_storfs_only {True,False}
                        Default - True. Only report Short-StORFs?
  -stop_ident {True,False}
                        Default - True: Identify Stop Codon positions with '*'
  -f_type [{StORF,CDS,ORF}]
                        Default - "StORF": Which GFF feature type for StORFs to be reported as in GFF
  -minorf MIN_ORF       Default - 99: Minimum StORF size in nt
  -maxorf MAX_ORF       Default - 60kb: Maximum StORF size in nt
  -codons STOP_CODONS   Default - ('TAG,TGA,TAA'): List Stop Codons to use
  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which can overlap another StORF.
  -s SUFFIX             Default - Do not append suffix to genome ID
  -so [{start_pos,strand}]
                        Default - Start Position: How should StORFs be ordered when >1 reported in a single UR.
  -spos {True,False}    Default - False: Print out StORF positions inclusive of first stop codon

Output:
  -oname O_NAME         Default - Appends '_StORF-R' to end of input FASTA filename
  -odir O_DIR           Default - Same directory as input FASTA
  -gff {True,False}     Default - True: Output a GFF file
  -aa {True,False}      Default - False: Report StORFs as amino acid sequences
  -aa_only {True,False}
                        Default - False: Only output Amino Acid Fasta
  -lw {True,False}      Default - True: Line wrap FASTA sequence output at 60 chars
  -gff_fasta {True,False}
                        Default - False: Report all gene sequences (nt) at the bottom of GFF files in Prokka output mode
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit

```
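
A Stop-to-Stop ORF is the stretch between two consecutive in-frame stop codons. The sketch below shows that idea on the forward strand only, using the default stop codons and size limits from the help text above. It is a simplified illustration, not StORF-Finder's implementation: the function name, the 0-based half-open coordinates and the choice to measure length from just after the first stop to the end of the second stop are assumptions.

```python
from typing import List, Tuple

STOP_CODONS = ("TAG", "TGA", "TAA")  # defaults listed for -codons


def find_storfs(seq: str, min_orf: int = 99, max_orf: int = 60_000,
                stops: Tuple[str, ...] = STOP_CODONS) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand stop-to-stop regions.

    start is the first base after the opening stop codon and end is one
    past the closing stop codon (0-based, half-open); this coordinate
    convention is an assumption of the sketch.
    """
    seq = seq.upper()
    found: List[Tuple[int, int, int]] = []
    for frame in range(3):
        prev_stop = None  # index of the most recent in-frame stop codon
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] not in stops:
                continue
            if prev_stop is not None:
                start, end = prev_stop + 3, pos + 3
                if min_orf <= end - start <= max_orf:
                    found.append((start, end, frame))
            prev_stop = pos
    return found
```

The reverse strand can be handled the same way by running the function on the reverse complement of the sequence and converting the coordinates back.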
## StORF-Extractor
Subpackage to extract sequences reported by StORF-Reporter from a genome annotation.

### Menu - (StORF-Extractor -h):   
```console
StORF-Extractor -storf_input Combined -p .../Test_Datasets/Combined_GFFs/E-coli_Combined_StORF-Reporter_Extended.gff 
```

```python
usage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}] [-verbose {True,False}] [-v]

StORF-Reporter v1.3.4: StORF-Extractor Run Parameters.

Required Arguments:
  -storf_input {Combined,Separate}
                        Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?
  -p PATH               Provide input file or directory path

Output:
  -gff_out {True,False}
                        Default - False: Output StORFs in GFF format
  -oname O_NAME         Default - Appends '_Extracted_StORFs' to end of input GFF filename
  -odir O_DIR           Default - Same directory as input FASTA
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit

```
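
For the "Combined" input mode, the GFF carries its FASTA after a `##FASTA` line, which is standard GFF3. A minimal sketch of pulling sequences for selected features out of such a file follows. How StORF-Reporter marks its own features is not stated here, so selecting lines that contain the substring "StORF" is an assumption, as are the function names.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def split_combined_gff(text: str) -> Tuple[List[str], Dict[str, str]]:
    """Split GFF3 text with embedded FASTA into feature lines and sequences."""
    features: List[str] = []
    chunks: Dict[str, List[str]] = {}
    in_fasta, current = False, None
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                current = line[1:].split()[0]  # sequence ID up to first space
                chunks[current] = []
            elif current is not None:
                chunks[current].append(line.strip())
        elif line and not line.startswith("#"):
            features.append(line)
    return features, {name: "".join(parts) for name, parts in chunks.items()}


def extract_marked(text: str, marker: str = "StORF") -> List[Tuple[str, str]]:
    """Return (attributes, nucleotide sequence) for features containing marker."""
    features, seqs = split_combined_gff(text)
    records = []
    for line in features:
        cols = line.split("\t")
        if len(cols) != 9 or marker not in line:
            continue
        start, end = int(cols[3]), int(cols[4])
        sub = seqs[cols[0]][start - 1:end]  # GFF is 1-based inclusive
        if cols[6] == "-":
            sub = sub.translate(_COMPLEMENT)[::-1]  # reverse complement
        records.append((cols[8], sub))
    return records
```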

## StORF-Remover
Subpackage to remove sequences reported by StORF-Reporter without a Blast/Diamond hit (any alignment in BLAST 6 format).

### Menu - (StORF-Remover -h):   
```console
StORF-Remover -gff .../Test_Datasets/StORF_Extractor_And_Remover/Myco_UR_StORF-R.gff -blast .../Test_Datasets/StORF_Extractor_And_Remover/Myco_URs_StORFs_aa_Swiss.tab 
```

```python
usage: StORF_Remover.py [-h] [-gff GFF] [-blast BLAST] [-min_score MINSCORE] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]
                        [-verbose {True,False}] [-v]

StORF-Reporter v1.3.4: UR-Remover Run Parameters.

Required Arguments:
  -gff GFF              GFF annotation file for the FASTA
  -blast BLAST          BLAST format 6 annotation file

Optional Arguments:
  -min_score MINSCORE   Minimum BitScore to keep StORF: Default 30

Output:
  -oname O_NAME         Default - Appends '_UR' to end of input GFF filename
  -odir O_DIR           Default - Same directory as input GFF
  -gz {True,False}      Default - False: Output as .gz

Misc:
  -verbose {True,False}
                        Default - False: Print out runtime messages
  -v                    Default - False: Print out version number and exit
```
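
The filtering described for StORF-Remover can be pictured as two steps: collect the query IDs that have a hit at or above the minimum bit score, then drop StORF features whose ID is not in that set. In BLAST tabular format 6 with default columns, the query ID is column 1 and the bit score is column 12. The sketch below assumes that StORF features can be recognised by "StORF" in the GFF attribute column and that the BLAST query ID equals the GFF `ID` attribute. Both are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def ids_with_hits(blast_lines: Iterable[str], min_score: float = 30.0) -> Set[str]:
    """Query IDs with at least one hit whose bit score is >= min_score."""
    keep: Set[str] = set()
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and float(cols[11]) >= min_score:
            keep.add(cols[0])
    return keep


def filter_gff(gff_lines: Iterable[str], keep_ids: Set[str],
               marker: str = "StORF") -> List[str]:
    """Drop marked features without a qualifying hit; keep everything else."""
    kept: List[str] = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and marker in cols[8]:
            attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
            if attrs.get("ID") not in keep_ids:
                continue  # a StORF with no supporting alignment
        kept.append(line)
    return kept
```

Comment lines and features from the original annotation pass through unchanged, so the output remains a complete GFF.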



## Test Datasets: 
### The directory 'Test_Datasets' contains GFF and FASTA files to test the installation and use of StORF-Reporter - Example output files are also provided for comparison. 

            

Will only return StORFs between 30 and 120 nt that do not overlap longer StORFs - Only works with StORFs for now. \"Nolap\" will filter Short-StORFs which are overlapped by StORFs and Olap will report Short-StORFs which do overlap StORFs.\n                        Overlap is defined by \"-olap\".\n  -short_storfs_only {True,False}\n                        Default - True. Only report Short-StORFs?\n  -stop_ident {True,False}\n                        Default - True: Identify Stop Codon positions with '*'\n  -f_type [{StORF,CDS,ORF}]\n                        Default - \"StORF\": Which GFF feature type for StORFs to be reported as in GFF\n  -minorf MIN_ORF       Default - 99: Minimum StORF size in nt\n  -maxorf MAX_ORF       Default - 60kb: Maximum StORF size in nt\n  -codons STOP_CODONS   Default - ('TAG,TGA,TAA'): List Stop Codons to use\n  -olap OVERLAP_NT      Default - 50: Maximum number of nt of a StORF which can overlap another StORF.\n  -s SUFFIX             Default - Do not append suffix to genome ID\n  -so [{start_pos,strand}]\n                        Default - Start Position: How should StORFs be ordered when >1 reported in a single UR.\n  -spos {True,False}    Default - False: Print out StORF positions inclusive of first stop codon\n\nOutput:\n  -oname O_NAME         Default - Appends '_StORF-R' to end of input FASTA filename\n  -odir O_DIR           Default - Same directory as input FASTA\n  -gff {True,False}     Default - True: Output a GFF file\n  -aa {True,False}      Default - False: Report StORFs as amino acid sequences\n  -aa_only {True,False}\n                        Default - False: Only output Amino Acid Fasta\n  -lw {True,False}      Default - True: Line wrap FASTA sequence output at 60 chars\n  -gff_fasta {True,False}\n                        Default - False: Report all gene sequences (nt) at the bottom of GFF files in Prokka output mode\n  -gz {True,False}      Default - False: Output as .gz\n\nMisc:\n  -verbose {True,False}\n                     
   Default - False: Print out runtime messages\n  -v                    Default - False: Print out version number and exit\n\n```\n## StORF-Extractor\nSubpackage to extract sequences reported by StORF-Reporter from a genome annotation.\n\n### Menu - (StORF-Extractor -h):   \n```console\nStORF-Extractor -storf_input Combined -p .../Test_Datasets/Combined_GFFs/E-coli_Combined_StORF-Reporter_Extended.gff \n```\n\n```python\nusage: StORF_Extractor.py [-h] [-storf_input {Combined,Separate}] [-p PATH] [-gff_out {True,False}] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}] [-verbose {True,False}] [-v]\n\nStORF-Reporter v1.3.4: StORF-Extractor Run Parameters.\n\nRequired Arguments:\n  -storf_input {Combined,Separate}\n                        Are StORFs to be extracted from Combined GFF/FASTA or Separate GFF/FASTA files?\n  -p PATH               Provide input file or directory path\n\nOutput:\n  -gff_out {True,False}\n                        Default - False: Output StORFs in GFF format\n  -oname O_NAME         Default - Appends '_Extracted_StORFs' to end of input GFF filename\n  -odir O_DIR           Default - Same directory as input FASTA\n  -gz {True,False}      Default - False: Output as .gz\n\nMisc:\n  -verbose {True,False}\n                        Default - False: Print out runtime messages\n  -v                    Default - False: Print out version number and exit\n\n```\n\n## StORF-Remover\nSubpackage to remove sequences reported by StORF-Reporter without a Blast/Diamond hit (any alignment in BLAST 6 format).\n\n### Menu - (StORF-Remover -h):   \n```console\nStORF-Remover -gff .../Test_Datasets/StORF_Extractor_And_Remover/Myco_UR_StORF-R.gff -blast .../Test_Datasets/StORF_Extractor_And_Remover/Myco_URs_StORFs_aa_Swiss.tab \n```\n\n```python\nusage: StORF_Remover.py [-h] [-gff GFF] [-blast BLAST] [-min_score MINSCORE] [-oname O_NAME] [-odir O_DIR] [-gz {True,False}]\n                        [-verbose {True,False}] [-v]\n\nStORF-Reporter v1.3.4: UR-Remover Run 
Parameters.\n\nRequired Arguments:\n  -gff GFF              GFF annotation file for the FASTA\n  -blast BLAST          BLAST format 6 annotation file\n\nOptional Arguments:\n  -min_score MINSCORE   Minimum BitScore to keep StORF: Default 30\n\nOutput:\n  -oname O_NAME         Default - Appends '_UR' to end of input GFF filename\n  -odir O_DIR           Default - Same directory as input GFF\n  -gz {True,False}      Default - False: Output as .gz\n\nMisc:\n  -verbose {True,False}\n                        Default - False: Print out runtime messages\n  -v                    Default - False: Print out version number and exit\n```\n\n\n\n## Test Datasets: \n### The directory 'Test_Datasets' contains GFF and FASTA files to test the installation and use of StORF-Reporter - Example output files are also provided for comparison. \n",
    "bugtrack_url": null,
    "license": "",
    "summary": "StORF-Reporter - A a tool that takes an annotated genome and returns missing CDS genes (Stop-to-Stop) from unannotated regions.",
    "version": "1.3.4",
    "project_urls": {
        "Bug Tracker": "https://github.com/NickJD/StORF-Reporter/issues",
        "Homepage": "https://github.com/NickJD/StORF-Reporter"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "18a1a051f88c7663ab6d645a24c667bbabba36c97d4db6f582b21fb103e0a9ca",
                "md5": "b1f257dfabd23ed285b124f09b76a15f",
                "sha256": "ac385eecb2fdc7d36fd72c8cad5029d7050b47f3ae88d49500dda1a8c21a41e1"
            },
            "downloads": -1,
            "filename": "StORF_Reporter-1.3.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b1f257dfabd23ed285b124f09b76a15f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 55141,
            "upload_time": "2024-02-26T19:23:28",
            "upload_time_iso_8601": "2024-02-26T19:23:28.144694Z",
            "url": "https://files.pythonhosted.org/packages/18/a1/a051f88c7663ab6d645a24c667bbabba36c97d4db6f582b21fb103e0a9ca/StORF_Reporter-1.3.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "99d0b787189f5f5049f4afe44d20c4b3c2a2722ed4ac8ac0a554d79e2f716265",
                "md5": "38da2417db6905b60e697eeaced8343b",
                "sha256": "2ad94f9f8a8fcffccf864ac66688309a398787b13e2748929a9c1d5dc4e34421"
            },
            "downloads": -1,
            "filename": "StORF-Reporter-1.3.4.tar.gz",
            "has_sig": false,
            "md5_digest": "38da2417db6905b60e697eeaced8343b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 53710,
            "upload_time": "2024-02-26T19:23:31",
            "upload_time_iso_8601": "2024-02-26T19:23:31.237223Z",
            "url": "https://files.pythonhosted.org/packages/99/d0/b787189f5f5049f4afe44d20c4b3c2a2722ed4ac8ac0a554d79e2f716265/StORF-Reporter-1.3.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-26 19:23:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NickJD",
    "github_project": "StORF-Reporter",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "storf-reporter"
}
        