# ncRNAFinder
ncRNAFinder is an automatic and scalable system for large-scale annotation of ncRNAs that combines sequence-based and structure-based search strategies.
## Install
ncRNAFinder depends on external tools, databases, and Python libraries. The required tools are [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) (version 2.15.0) and [INFERNAL](http://eddylab.org/infernal/) (version 1.1.5). The required databases are [RNAcentral](https://rnacentral.org) (version 24) and [Rfam](https://rfam.org) (version 14.10). The required Python libraries are biopython, joblib, matplotlib, matplotlib_venn, numpy and pandas.
> [!IMPORTANT]
> Downloading the RNAcentral database takes a long time, almost 7 hours.
## Requirements
- [Python](https://www.python.org)
  - [Biopython](https://biopython.org)
  - [Joblib](https://joblib.readthedocs.io/en/stable/)
  - [Matplotlib](https://matplotlib.org)
  - [Matplotlib_venn](https://pypi.org/project/matplotlib-venn/)
  - [Numpy](https://numpy.org)
  - [Pandas](https://pandas.pydata.org)
- [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi)
- [INFERNAL](http://eddylab.org/infernal/)
> [!NOTE]
> All Python libraries are installed automatically. The BLAST and INFERNAL tools must be downloaded by the user and added to the PATH.
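
Before running the tool, it can help to confirm that the external programs are visible on the PATH. The following is a minimal sketch using only the Python standard library. The executable names are assumptions: `blastn` is the nucleotide search program shipped with BLAST+, and `cmscan` is one of the programs shipped with INFERNAL; ncRNAFinder may call different executables.

```python
import shutil

# Executable names are assumptions: `blastn` ships with BLAST+ and
# `cmscan` ships with INFERNAL; ncRNAFinder may call other programs.
REQUIRED_TOOLS = ("blastn", "cmscan")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

If a tool is reported as missing, add the directory that contains its executables to the PATH and run the check again.
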
## Usage
To run the tool, call the `ncRNAFinder` function from Python:
~~~
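import ncRNAFinder as nf

# Run the annotation; the result files are written to disk
nf.ncRNAFinder(input_file, output_name, pident, coverage, threads, BestHit)

# Or keep the returned annotation for further analysis
annotation = nf.ncRNAFinder(input_file, output_name, pident, coverage, threads, BestHit)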
~~~
### Mandatory parameters:
~~~
input_file <file_name> Input file in FASTA format
output_name <output_name> Output name to save the results
~~~
### Optional parameters:
~~~
pident <integer> Minimum percentage of identity for BLASTn. (Default: 95)
coverage <integer> Minimum percentage of coverage for BLASTn. (Default: 95)
threads <integer> Number of threads. (Default: 1)
BestHit <1|0> Keep only the best result between the two strands, based on E-value: 1 (yes) or 0 (no). (Default: 1)
~~~
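As an illustration of the parameters above, the sketch below checks a set of values before a run. The helper `check_parameters` is hypothetical and not part of ncRNAFinder; the 0 to 100 range for the percentages and the minimum of one thread are assumptions drawn from the parameter descriptions, and the file and output names in the commented call are placeholders.

```python
def check_parameters(pident=95, coverage=95, threads=1, best_hit=1):
    """Validate the optional parameters and return them in call order.

    Hypothetical helper: the defaults mirror the documented ones, but the
    accepted ranges are assumptions, not checks performed by ncRNAFinder.
    """
    for name, value in (("pident", pident), ("coverage", coverage)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    if threads < 1:
        raise ValueError(f"threads must be at least 1, got {threads}")
    if best_hit not in (0, 1):
        raise ValueError(f"BestHit must be 0 or 1, got {best_hit}")
    return pident, coverage, threads, best_hit


# Example values: a stricter identity cutoff and four threads.
params = check_parameters(pident=98, coverage=95, threads=4, best_hit=1)

# With the package installed, the checked values could be passed positionally,
# in the order shown in the Usage section (placeholder file names):
#   import ncRNAFinder as nf
#   annotation = nf.ncRNAFinder("genome.fa", "my_run", *params)
```

Passing the values positionally follows the call shown in the Usage section; whether the function also accepts them as keyword arguments is not stated here.
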
## Output
The `ncRNAFinder` function returns the annotation as a dataframe. It also writes the annotation to disk in GFF and CSV formats, along with a text file containing the IDs with the original annotation from each tool (BLAST and INFERNAL), a table with the number of ncRNAs annotated, and three graphs: (i) a bar plot showing the number of ncRNAs annotated, (ii) a Venn diagram with the number of ncRNAs identified by each tool (BLAST and INFERNAL), and (iii) a stacked bar plot with the number of ncRNAs identified in each chromosome. In addition, ncRNAFinder generates separate outputs for miRNA and tRNA, including their annotations in CSV and GFF formats, their sequences, and a table with the count of each type.
```
<output_name>/
├── <output_name>_annotation.gff
├── <output_name>_annotation.csv
├── <output_name>_ID_annotation.txt
├── <output_name>_sequences_ncRNA.fa
├── <output_name>_Table_quantity_ncRNAs.csv
├── Figures/
│   ├── <output_name>_BarPlot.png
│   ├── <output_name>_DiagramaVeen.png
│   └── <output_name>_StackedPlot.png
├── miRNA/
│   ├── <output_name>_miRNA_annotation.gff
│   ├── <output_name>_miRNAs.csv
│   ├── <output_name>_miRNA_sequences_ncRNA.fa
│   └── <output_name>_miRNA_Table_quantity_ncRNAs.csv
└── tRNA/
    ├── <output_name>_tRNA_annotation.gff
    ├── <output_name>_tRNAs.csv
    ├── <output_name>_tRNA_sequences_ncRNA.fa
    └── <output_name>_tRNA_Table_quantity_ncRNAs.csv
```
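
The GFF output can also be summarized outside the tool. The sketch below counts records per sequence and feature type, similar in spirit to the per-chromosome stacked bar plot. It assumes the file follows the standard nine-column, tab-separated GFF layout, with the sequence ID in column 1 and the feature type in column 3; whether ncRNAFinder stores the ncRNA class in column 3 is an assumption, and `count_features` is a hypothetical helper, not part of the package.

```python
from collections import Counter


def count_features(lines):
    """Count GFF records per (sequence ID, feature type) pair.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment or directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore rows that do not have the nine standard GFF columns.
        if len(fields) < 9:
            continue
        seqid, feature_type = fields[0], fields[2]
        counts[(seqid, feature_type)] += 1
    return counts


# Usage with a placeholder output name:
#   with open("my_run/my_run_annotation.gff") as handle:
#       for (seqid, feature_type), n in sorted(count_features(handle).items()):
#           print(seqid, feature_type, n)
```

Taking an iterable of lines rather than a path keeps the function easy to try on a few hand-written records before pointing it at a full annotation file.
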
## Reference
## Contact
To report bugs, ask for help, or give feedback, please contact Alexandre R. Paschoal (paschoal@utfpr.edu.br) or Vitor Gregorio (vitor-gregorio@hotmail.com).