Name | isocirc |
Version | 1.0.4 |
home_page | https://github.com/Xinglab/isoCirc |
Summary | isoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data |
upload_time | 2021-07-15 10:25:24 |
maintainer | |
docs_url | None |
author | Yan Gao |
requires_python | |
license | GPL |
# isoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data
<!-- [![Github All Releases](https://img.shields.io/github/downloads/Xinglab/isoCirc/total.svg?label=Download)](https://github.com/Xinglab/isoCirc/releases) -->
[![Latest Release](https://img.shields.io/github/release/Xinglab/isoCirc.svg?label=Release)](https://github.com/Xinglab/isoCirc/releases/latest)
[![PyPI](https://img.shields.io/pypi/dm/isocirc.svg?label=pip%20install)](https://pypi.python.org/pypi/isocirc)
[![License](https://img.shields.io/badge/License-GPL-black.svg)](https://github.com/Xinglab/isoCirc/blob/master/LICENSE)
[![Published in Nat. Commun.](https://img.shields.io/badge/Published%20in-NatCommun-orange.svg)](https://doi.org/10.1038/s41467-020-20459-8)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4264644.svg)](https://doi.org/10.5281/zenodo.4264644)
[![GitHub Issues](https://img.shields.io/github/issues/Xinglab/isoCirc.svg?label=Issues)](https://github.com/Xinglab/isoCirc/issues)
<!-- [![Published in Bioinformatics](https://img.shields.io/badge/Published%20in-Bioinformatics-purple.svg)](https://doi.org/XXX) -->
<!-- [![Build Status](https://travis-ci.org/yangao07/TideHunter.svg?branch=master)](https://travis-ci.org/yangao07/TideHunter) -->
<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/version.svg)](https://anaconda.org/darts-comp-bio/darts_dnn) -->
<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/installer/conda.svg)](https://conda.anaconda.org/darts-comp-bio) -->
## <a name="isocirc"></a>What is isoCirc?
isoCirc is a long-read sequencing strategy coupled with an integrated
computational pipeline to characterize full-length circRNA isoforms
using rolling circle amplification (RCA) followed by long-read sequencing.
## Table of Contents
- [What is isoCirc?](#isocirc)
- [Installation](#install)
  - [Dependencies](#depen)
  - [Install isoCirc with `pip`](#pip)
  - [Install isoCirc from source](#src)
- [Getting started](#start)
- [Input and output](#input_output)
  - [Input files](#input_file)
  - [Output files](#output_file)
    - [1. `isocirc.out`](#isocirc_out)
    - [2. `isocirc.bed`](#isocirc_bed)
    - [3. `isocirc_stats.out`](#isocirc_stats_out)
- [Circular alignment of isoCirc long read](#isocirc_plot)
- [FAQ](#FAQ)
- [Contact](#contact)
- [Changelog](#change)
## <a name="install"></a>Installation
### <a name="depen"></a>Dependencies
isoCirc depends on two open-source software packages: [`bedtools` (>= v2.27.0)](https://bedtools.readthedocs.io/) and [`minimap2` (>= 2.11)](https://github.com/lh3/minimap2).
Please ensure that these packages are installed before running isoCirc.
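
A quick pre-flight check can confirm that both tools are reachable before a run. The sketch below only looks the executables up on `PATH` using Python's standard library; the helper name `find_missing_tools` is hypothetical, and the check does not verify the minimum versions listed above.

```python
import shutil

# Executable names isoCirc looks for by default (see --bedtools / --minimap2).
REQUIRED_TOOLS = ("bedtools", "minimap2")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("bedtools and minimap2 were found on PATH")
```

If a tool is installed outside `PATH`, its location can instead be passed to isoCirc with `--bedtools` or `--minimap2`.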
### <a name="pip"></a>Install isoCirc with `pip`
**isoCirc** is written in `python`; use `pip` to install it:
```
pip install isocirc # first time installation
pip install isocirc --upgrade # update to the latest version
```
### <a name="src"></a>Install isoCirc from source
Alternatively, you can install **isoCirc** from source:
```
git clone https://github.com/Xinglab/isoCirc.git
cd isoCirc/isoCirc_pipeline && pip install .
```
## <a name="start"></a>Getting started with the toy example in `test_data`
```
cd isoCirc/test_data
isocirc -t 1 read_toy.fa chr16_toy.fa chr16_toy.gtf chr16_circRNA_toy.bed output
```
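
The same run can be launched from a Python script. The sketch below assembles the command line from the positional arguments listed under "Detailed arguments"; the helper `build_isocirc_cmd` is an illustrative assumption, not part of isoCirc, and the `subprocess.run` call is left commented out so that nothing is executed by accident.

```python
import subprocess


def build_isocirc_cmd(long_reads, ref_fa, anno_gtf, circ_annos, out_dir, threads=8):
    """Assemble an `isocirc` command line as a list of arguments.

    `circ_annos` is a list of circRNA annotation files; isoCirc expects
    multiple annotation files to be joined with ','.
    """
    return [
        "isocirc",
        "-t", str(threads),
        long_reads,
        ref_fa,
        anno_gtf,
        ",".join(circ_annos),
        out_dir,
    ]


# Toy example from test_data (paths are relative to isoCirc/test_data).
cmd = build_isocirc_cmd(
    "read_toy.fa", "chr16_toy.fa", "chr16_toy.gtf",
    ["chr16_circRNA_toy.bed"], "output", threads=1,
)
print(" ".join(cmd))
# prints: isocirc -t 1 read_toy.fa chr16_toy.fa chr16_toy.gtf chr16_circRNA_toy.bed output

# subprocess.run(cmd, check=True)  # uncomment to actually run isoCirc
```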
Detailed arguments:
```
usage: isocirc [-h] [-v] [-t THREADS] [--bedtools BEDTOOLS]
[--minimap2 MINIMAP2] [--short-read short.fa/fq] [--lordec LORDEC]
[--kmer KMER] [--solid SOLID] [--trf TRF] [--match MATCH]
[--mismatch MISMATCH] [--indel INDEL] [--match-frac MATCH_FRAC]
[--indel-frac INDEL_FRAC] [--min-score MIN_SCORE]
[--max-period MAX_PERIOD] [--min-len MIN_LEN]
[--min-copy MIN_COPY] [--min-frac MIN_FRAC]
[--high-max-ratio HIGH_MAX_RATIO]
[--high-min-ratio HIGH_MIN_RATIO]
[--high-iden-ratio HIGH_IDEN_RATIO]
[--high-repeat-ratio HIGH_REPEAT_RATIO]
[--low-repeat-ratio LOW_REPEAT_RATIO]
[--cano-motif {GT/AG,all}] [--bsj-xid BSJ_XID]
[--key-bsj-xid KEY_BSJ_XID] [--min-circ-dis MIN_CIRC_DIS]
[--rescue-low] [--fsj-xid FSJ_XID] [--key-fsj-xid KEY_FSJ_XID]
[--Alu ALU] [--flank-len FLANK_LEN] [--all-repeat ALL_REPEAT]
long.fa/fq ref.fa anno.gtf circRNA.bed/gtf out_dir
isocirc: circular RNA profiling and analysis using long-read sequencing
positional arguments:
long.fa/fq Long-read sequencing data generated with isoCirc
ref.fa Reference genome sequence file
anno.gtf Gene annotation file in GTF format
circRNA.bed/gtf circRNA database annotation file in BED or GTF format
Use ',' to separate multiple circRNA annotation files
out_dir Output directory for final result and temporary files
optional arguments:
-h, --help Show this help message and exit
-v, --version Show program's version number and exit
General options:
-t THREADS, --threads THREADS
Number of threads to use (default: 8)
--bedtools BEDTOOLS Path to bedtools (default: bedtools)
--minimap2 MINIMAP2 Path to minimap2 (default: minimap2)
Hybrid error-correction with short-read data (LoRDEC):
--short-read short.fa/fq
Short-read data for error correction
Use ',' to connect multiple or paired-end short-read data
(default: )
--lordec LORDEC Path to lordec-correct (default: lordec-correct)
--kmer KMER k-mer size (default: 21)
--solid SOLID Solid k-mer abundance threshold (default: 3)
Consensus calling with Tandem Repeats Finder (TRF):
--trf TRF Path to TRF program (default: trf409.legacylinux64)
--match MATCH Match score (default: 2)
--mismatch MISMATCH Mismatch penalty (default: 7)
--indel INDEL Indel penalty (default: 7)
--match-frac MATCH_FRAC
Match probability (default: 80)
--indel-frac INDEL_FRAC
Indel probability (default: 10)
--min-score MIN_SCORE
Minimum alignment score to report (default: 100)
--max-period MAX_PERIOD
Maximum period size to report (default: 2000)
Filtering and mapping of consensus sequences (minimap2):
--min-len MIN_LEN Minimum consensus length to keep (default: 30)
--min-copy MIN_COPY Minimum copy number of consensus to keep
(default: 2.0)
--min-frac MIN_FRAC Minimum fraction of original long read to keep
(default: 0.0)
--high-max-ratio HIGH_MAX_RATIO
Maximum mappedLen / consLen ratio for high-quality
alignment (default: 1.1)
--high-min-ratio HIGH_MIN_RATIO
Minimum mappedLen /consLen ratio for high-quality
alignment (default: 0.9)
--high-iden-ratio HIGH_IDEN_RATIO
Minimum identicalBases/ consLen ratio for high-quality
alignment (default: 0.75)
--high-repeat-ratio HIGH_REPEAT_RATIO
Maximum mappedLen / consLen ratio for high-quality
self-tandem consensus (default: 0.6)
--low-repeat-ratio LOW_REPEAT_RATIO
Minimum mappedLen / consLen ratio for low-quality
self-tandem alignment (default: 1.9)
Identifying high-confidence BSJs and full-length circRNAs:
--cano-motif {GT/AG,all}
Canonical back-splice motif (GT/AG or all three
motifs: GT/AG, GC/AG, AT/AC) (default: GT/AG)
--bsj-xid BSJ_XID Maximum allowed mis/ins/del for 20-bp exonic sequence
flanking BSJ (10-bp each side) (default: 1)
--key-bsj-xid KEY_BSJ_XID
Maximum allowed mis/ins/del for 4-bp exonic sequence
flanking BSJ (2-bp each side) (default: 0)
--min-circ-dis MIN_CIRC_DIS
Minimum distance between genomic coordinates of
two back-splice sites (default: 150)
--rescue-low Use high-mapping quality reads to rescue low-mapping
quality reads (default: False)
--fsj-xid FSJ_XID Maximum allowed mis/ins/del for 20-bp exonic sequence
flanking FSJ (10-bp each side) (default: 1)
--key-fsj-xid KEY_FSJ_XID
Maximum allowed mis/ins/del for 4-bp exonic sequence
flanking FSJ (2-bp each side) (default: 0)
Other options:
--Alu ALU Alu repetitive element annotation in BED format
(default: )
--flank-len FLANK_LEN
Length of upstream and downstream flanking sequence to
search for Alu (default: 500)
--all-repeat ALL_REPEAT
All repetitive element annotation in BED format
(default: )
```
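
The ratio thresholds under "Filtering and mapping of consensus sequences" can be read as a simple predicate on each consensus alignment. The sketch below is one interpretation of the help text, not isoCirc's actual code: the function name and the inclusive comparisons are assumptions, and only the three `--high-*-ratio` defaults for ordinary (non-self-tandem) alignments are modeled.

```python
def is_high_quality_alignment(mapped_len, identical_bases, cons_len,
                              high_max_ratio=1.1, high_min_ratio=0.9,
                              high_iden_ratio=0.75):
    """Apply the --high-max-ratio, --high-min-ratio and --high-iden-ratio checks."""
    if cons_len <= 0:
        return False
    mapped_ratio = mapped_len / cons_len      # mappedLen / consLen
    iden_ratio = identical_bases / cons_len   # identicalBases / consLen
    return (high_min_ratio <= mapped_ratio <= high_max_ratio
            and iden_ratio >= high_iden_ratio)


# 480 / 500 = 0.96 and 450 / 500 = 0.90: passes all three checks.
print(is_high_quality_alignment(mapped_len=480, identical_bases=450, cons_len=500))
# 300 / 500 = 0.60: below the 0.9 minimum mapped ratio.
print(is_high_quality_alignment(mapped_len=300, identical_bases=290, cons_len=500))
```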
## <a name="input_output"></a>Input and output
### <a name="input_file"></a>Input files
isoCirc takes long reads as input, each containing multiple copies of a circRNA sequence.
It also requires a reference genome sequence, a gene annotation, and a circRNA annotation to be provided.
### <a name="output_file"></a>Output files
isoCirc outputs three result files in a user-specified directory:
| No. | File name | Explanation |
|:---:| :--- | --- |
| 1 | isocirc.out | detailed information of each circRNA isoform with a high-confidence BSJ, in tabular format |
| 2 | isocirc.bed | bed12 format file of `isocirc.out` |
| 3 | isocirc_stats.out | basic summary statistics for `isocirc.out` |
#### <a name="isocirc_out"></a>1. isocirc.out
`isocirc.out` is a 35-column tabular file; each line represents one unique circRNA isoform that has a high-confidence BSJ:
| No. | Column name | Explanation |
|:---:| :--- | --- |
| 1 | isoformID | assigned isoform ID |
| 2 | chrom | chromosome ID |
| 3 | startCoor0based | start coordinate of circRNA, 0-based |
| 4 | endCoor | end coordinate of circRNA |
| 5 | geneStrand | gene strand (+/-) |
| 6 | geneID | gene ID |
| 7 | geneName | gene name |
| 8 | blockCount | number of blocks |
| 9 | blockSize | size of each block, separated by `,` |
| 10 | blockStarts | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |
| 11 | refMapLen | total length of circRNA |
| 12 | blockType | category of each block. E: exon, I: intron, N: intergenic |
| 13 | blockAnno | detailed annotation for each block, in format: "TransID:E1(100)+I(50)+E2(30)", where TransID is ID of corresponding transcript; E1 and E2 are 1st and 2nd exon of transcript; multiple blocks are separated by `,`; and multiple transcripts of one block are separated by `;` |
| 14 | isKnownSS | `True` if SS is catalogued in gene annotation, `False` if not, separated by `,` |
| 15 | isKnownFSJ | `True` if FSJ is catalogued in gene annotation, `False` if not, separated by `,` |
| 16 | canoFSJMotif | strandness and canonical motifs of FSJs, e.g., `-GT/AG`, `NA` if FSJ is not canonical, separated by `,` |
| 17 | isHighFSJ | `True` if alignment around FSJ is high-quality, `False` if not, separated by `,` |
| 18 | isKnownExon | `True` if block is a catalogued exon in gene annotation, `False` if not, separated by `,` |
| 19 | isKnownBSJ | `True` if BSJ exists in circRNA annotation, `False` if not; multiple circRNA annotations are separated by `,` |
| 20 | isCanoBSJ | `True` if BSJ has canonical motif (GT/AG), `False` if not |
| 21 | canoBSJMotif | strandness and canonical motif of BSJ, e.g., `-GT/AG`, `NA` if BSJ is not canonical |
| 22 | isFullLength | `True` if isoform is considered as `full-length isoform`, `False` if not |
| 23 | BSJCate | category of BSJs: `FSM`/`NIC`/`NNC`, see explanation below. |
| 24 | FSJCate | category of FSJs: `FSM`/`NIC`/`NNC` |
| 25 | CDS | number of bases that are mapped to CDS region |
| 26 | UTR | number of bases that are mapped to UTR region |
| 27 | lincRNA | number of bases that are mapped to lincRNA region |
| 28 | antisense | number of bases that are mapped to antisense region |
| 29 | rRNA | number of bases that are mapped to rRNA region |
| 30 | Alu | number of bases that are mapped to Alu element; `NA` if Alu annotation is not provided |
| 31 | allRepeat | number of bases that are mapped to all repeat elements; `NA` if repeat annotation is not provided |
| 32 | upFlankAlu | flanking Alu element in upstream; `NA` if none or Alu annotation is not provided |
| 33 | downFlankAlu | flanking Alu element in downstream; `NA` if none or Alu annotation is not provided |
| 34 | readCount | number of reads that come from this circRNA isoform |
| 35 | readIDs | ID of reads that come from this circRNA isoform, separated by `,` |
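
For downstream analysis, the table above maps directly onto a small parser. This is a minimal sketch, assuming the file is tab-separated and that any header or comment line either starts with `#` or begins with the `isoformID` column name (neither detail is stated above); `read_isocirc_out` is a hypothetical helper, not part of isoCirc.

```python
import csv

# The 35 column names, in the order given in the table above.
COLUMNS = [
    "isoformID", "chrom", "startCoor0based", "endCoor", "geneStrand",
    "geneID", "geneName", "blockCount", "blockSize", "blockStarts",
    "refMapLen", "blockType", "blockAnno", "isKnownSS", "isKnownFSJ",
    "canoFSJMotif", "isHighFSJ", "isKnownExon", "isKnownBSJ", "isCanoBSJ",
    "canoBSJMotif", "isFullLength", "BSJCate", "FSJCate", "CDS", "UTR",
    "lincRNA", "antisense", "rRNA", "Alu", "allRepeat", "upFlankAlu",
    "downFlankAlu", "readCount", "readIDs",
]


def read_isocirc_out(lines):
    """Yield one dict per isoform from an iterable of tab-separated lines."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Assumption: skip blank lines, '#' comments and a possible header row.
        if not row or row[0].startswith("#") or row[0] == "isoformID":
            continue
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        record = dict(zip(COLUMNS, row))
        record["readCount"] = int(record["readCount"])
        record["readIDs"] = record["readIDs"].split(",")
        yield record


# Example usage (the path follows the toy example):
# with open("output/isocirc.out") as handle:
#     full_length = [r for r in read_isocirc_out(handle)
#                    if r["isFullLength"] == "True"]
```

All other fields are left as strings, since several of them hold comma-separated lists or `NA`.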
#### <a name="isocirc_bed"></a>2. isocirc.bed
`isocirc.bed` is a bed12 format file; each line represents one unique circRNA isoform from `isocirc.out`:
| No. | Column name | Explanation |
|:---:| :--- | --- |
| 1 | chrom | chromosome ID |
| 2 | startCoor0based | start coordinate of circRNA, 0-based |
| 3 | endCoor | end coordinate of circRNA |
| 4 | isoformName | name of the circRNA isoform |
| 5 | score | indicates how dark the peak will be displayed in the browser (0-1000), set as 0 by `isoCirc` |
| 6 | strand | +/- to denote strand |
| 7 | thickStart | the starting position at which the feature is drawn thickly, set as 0 by `isoCirc` |
| 8 | thickEnd | the ending position at which the feature is drawn thickly, set as 0 by `isoCirc` |
| 9 | itemRgb | an RGB value of the form R,G,B (e.g. 255,0,0), set as 0 by `isoCirc` |
| 10 | blockCount | number of blocks |
| 11 | blockSize | size of each block, separated by `,` |
| 12 | blockStarts | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |
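
Because `blockStarts` is relative to the start coordinate, recovering the genomic coordinates of each block takes one addition per block. The sketch below follows the standard bed12 convention (0-based, half-open intervals); the function name `block_intervals` is an illustrative assumption.

```python
def block_intervals(start, block_sizes, block_starts):
    """Return 0-based, half-open (start, end) genomic intervals for each block.

    `start` is the startCoor0based value; `block_sizes` and `block_starts`
    are the comma-separated blockSize and blockStarts fields.
    """
    # bed12 fields may carry a trailing comma, so strip it before splitting.
    sizes = [int(x) for x in block_sizes.strip(",").split(",")]
    offsets = [int(x) for x in block_starts.strip(",").split(",")]
    if len(sizes) != len(offsets):
        raise ValueError("blockSize and blockStarts have different lengths")
    return [(start + off, start + off + size) for size, off in zip(sizes, offsets)]


print(block_intervals(1000, "100,50", "0,400"))
# prints: [(1000, 1100), (1400, 1450)]
```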
#### <a name="isocirc_stats_out"></a>3. isocirc_stats.out
`isocirc_stats.out` contains 27 basic summary statistics for `isocirc.out`:
| No. | Name | Explanation |
|:---:| :--- | --- |
| 1 | Total reads | Number of raw reads in sample |
| 2 | Total reads with cons | Number of reads with consensus sequence called |
| 3 | Total mappable reads with cons | Number of reads with consensus sequence called, mappable to genome |
| 4 | Total reads with candidate BSJ | Number of reads with consensus sequence called, mappable to genome, and with BSJs ("candidate BSJs") |
| 5 | Total candidate BSJs | Number of candidate BSJs (circRNA species) |
| 6 | Total known candidate BSJs | Number of candidate BSJs reported in existing circRNA BSJ database (circBase / MiOncoCirc) |
| 7 | Total reads with high BSJs | Number of reads with consensus sequence called, mappable to genome, and with high-confidence BSJs (based on additional inspection of alignment around BSJs) |
| 8 | Total high BSJs | Number of high-confidence BSJs |
| 9 | Total known high BSJs | Number of high-confidence BSJs that are known |
| 10 | Total isoforms with high BSJs | Number of circRNA isoforms with high-confidence BSJs |
| 11 | Total isoforms with high BSJs high FSJs | Number of circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence (canonical, high-quality alignment around internal splice sites) |
| 12 | Total isoforms with high BSJ known SSs | Number of circRNA isoforms with high-confidence BSJs, and all SS are known (based on existing transcript GTF annotations for splice sites in linear RNA) |
| 13 | Total isoforms with high BSJs high FSJs known SSs | Number of circRNA isoforms with high-confidence BSJs, all FSJs are high-confidence, and all SS are known |
| 14 | Total full-length isoforms | Number of circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence or all SS are known |
| 15 | Total reads for full-length isoforms | Number of reads for circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence or all SS are known |
| 16 | Total full-length isoforms with FSM BSJ | Number of full-length circRNA isoforms with FSM BSJs |
| 17 | Total reads for full-length isoforms with FSM BSJ | Number of reads for full-length circRNA isoforms with FSM BSJs |
| 18 | Total full-length isoforms with NIC BSJ | Number of full-length circRNA isoforms with NIC BSJs |
| 19 | Total reads for full-length isoforms with NIC BSJ | Number of reads for full-length circRNA isoforms with NIC BSJs |
| 20 | Total full-length isoforms with NNC BSJ | Number of full-length circRNA isoforms with NNC BSJs |
| 21 | Total reads for full-length isoforms with NNC BSJ | Number of reads for full-length circRNA isoforms with NNC BSJs |
| 22 | Total full-length isoforms with FSM FSJ | Number of full-length circRNA isoforms with FSM FSJs |
| 23 | Total reads for full-length isoforms with FSM FSJ | Number of reads for full-length circRNA isoforms with FSM FSJs |
| 24 | Total full-length isoforms with NIC FSJ | Number of full-length circRNA isoforms with NIC FSJs |
| 25 | Total reads for full-length isoforms with NIC FSJ | Number of reads for full-length circRNA isoforms with NIC FSJs |
| 26 | Total full-length isoforms with NNC FSJ | Number of full-length circRNA isoforms with NNC FSJs |
| 27 | Total reads for full-length isoforms with NNC FSJ | Number of reads for full-length circRNA isoforms with NNC FSJs |
* BSJ: Back-Splice Junction
* FSJ: Forward-Splice Junction
* FSS: Forward-Splice Site
* SS: Splice Site
* cons: consensus sequence
* cano: canonical
* high: high-confidence (canonical and high-quality alignment around FSJ/BSJ)
* FSM: Full Splice Match
* NIC: Novel In Catalog
* NNC: Novel Not in Catalog
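
The full-length criterion in rows 14 and 15 (all FSJs high-confidence, or all SS known, for an isoform that already has a high-confidence BSJ) can be re-derived from the `isHighFSJ` and `isKnownSS` columns of `isocirc.out`. The sketch below is an interpretation of that wording, not isoCirc's own implementation; the function names are hypothetical, and the handling of empty or `NA` fields is an assumption.

```python
def parse_flags(field):
    """Turn a comma-separated 'True,False,...' field into a list of booleans."""
    # Assumption: empty items and 'NA' carry no information and are ignored.
    return [item == "True" for item in field.split(",") if item not in ("", "NA")]


def looks_full_length(is_high_fsj, is_known_ss):
    """Return True if all FSJs are high-confidence or all splice sites are known.

    Every isoform in isocirc.out already has a high-confidence BSJ, so only
    the FSJ / SS condition is checked. An isoform without any FSJ passes
    vacuously, which is an assumption of this sketch.
    """
    high_fsj = parse_flags(is_high_fsj)
    known_ss = parse_flags(is_known_ss)
    return all(high_fsj) or all(known_ss)


print(looks_full_length("True,True", "True,False,True,True"))    # True
print(looks_full_length("True,False", "True,True,False,True"))   # False
```

In practice the `isFullLength` column already reports this decision; the sketch only makes the rule explicit.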
## <a name="isocirc_plot"></a>Circular alignment of isoCirc long read
With the result file generated by `isocirc`, we can visualize the circular alignment of full-length isoCirc reads. Let's use the toy example in `test_data` again:
```
isocircPlot ./read_toy.fa ./chr16_toy.fa ./chr16_toy.gtf ./output/isocirc.bed ./isocircPlot_toy.list SJ ./output
```
A `.png` file will be generated in the `output` folder indicating how the isoCirc long read is aligned to the reference genome multiple times.
## <a name="FAQ"></a>FAQ
## <a name="contact"></a>Contact
Yan Gao gaoy286@mail.sysu.edu.cn
Yi Xing yi.xing@pennmedicine.upenn.edu
[github issues](https://github.com/Xinglab/isoCirc/issues)
Raw data
{
"_id": null,
"home_page": "https://github.com/Xinglab/isoCirc",
"name": "isocirc",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Yan Gao",
"author_email": "yangao07@hit.edu.cn",
"download_url": "https://files.pythonhosted.org/packages/ea/ef/20dc4deeb6b15d86288bd94d9fc3b699bc4f4b026d8aba82e518ea7e1647/isocirc-1.0.4.tar.gz",
"platform": "",
"description": "# isoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data\n<!-- [![Github All Releases](https://img.shields.io/github/downloads/Xinglab/isoCirc/total.svg?label=Download)](https://github.com/Xinglab/isoCirc/releases) -->\n[![Latest Release](https://img.shields.io/github/release/Xinglab/isoCirc.svg?label=Release)](https://github.com/Xinglab/isoCirc/releases/latest)\n[![PyPI](https://img.shields.io/pypi/dm/isocirc.svg?label=pip%20install)](https://pypi.python.org/pypi/isocirc)\n[![License](https://img.shields.io/badge/License-GPL-black.svg)](https://github.com/Xinglab/isoCirc/blob/master/LICENSE)\n[![Published in Nat. Commun.](https://img.shields.io/badge/Published%20in-NatCommun-orange.svg)](https://doi.org/10.1038/s41467-020-20459-8)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4264644.svg)](https://doi.org/10.5281/zenodo.4264644)\n[![GitHub Issues](https://img.shields.io/github/issues/Xinglab/isoCirc.svg?label=Issues)](https://github.com/Xinglab/isoCirc/issues)\n<!-- [![Published in Bioinformatics](https://img.shields.io/badge/Published%20in-Bioinformatics-purple.svg)](https://doi.org/XXX) -->\n<!-- [![Build Status](https://travis-ci.org/yangao07/TideHunter.svg?branch=master)](https://travis-ci.org/yangao07/TideHunter) -->\n<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/version.svg)](https://anaconda.org/darts-comp-bio/darts_dnn) -->\n<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/installer/conda.svg)](https://conda.anaconda.org/darts-comp-bio) -->\n\n## <a name=\"isocirc\"></a>What is isoCirc ?\nisoCirc is a long-read sequencing strategy coupled with an integrated \ncomputational pipeline to characterize full-length circRNA isoforms \nusing rolling circle amplification (RCA) followed by long-read sequencing.\n\n## Table of Contents\n\n- [What is isoCirc?](#isocirc)\n- [Installation](#install)\n - 
[Dependencies](#depen)\n - [Install isoCirc with `pip`](#pip)\n - [Install isoCirc from source](#src)\n- [Getting started](#start)\n- [Input and output](#input_output)\n - [Input files](#input_file)\n - [Output files](#output_file)\n - [1. `isocirc.out`](#isocirc_out)\n - [2. `isocirc.bed`](#isocirc_bed)\n - [3. `isocirc_stats.out`](#isocirc_stats_out)\n- [Circular long-read alignment of isoCirc read](#isocirc_plot)\n- [FAQ](#FAQ)\n- [Contact](#contact)\n- [Changelog](#change)\n\n## <a name=\"install\"></a>Installation\n### <a name=\"depen\"></a>Dependencies \nisoCirc is dependent on two open-source software packages: [`bedtools`(>= v2.27.0)](https://bedtools.readthedocs.io/) and minimap2 [`minimap2`(>= 2.11)](https://github.com/lh3/minimap2).\nPlease ensure that these packages are installed before running isoCirc.\n\n### <a name=\"pip\"></a>Install isoCirc with `pip`\n**isoCirc** is written with `python`, please use `pip` to install **isoCirc**:\n```\npip install isocirc # first time installation\npip install isocirc --upgrade # update to the latest version\n```\n### <a name=\"src\"></a>Install isoCirc from source\nAlternatively, you can install **isoCirc** from source:\n```\ngit clone https://github.com/Xinglab/isoCirc.git\ncd isoCirc/isoCirc_pipeline && pip install .\n```\n\n## <a name=\"start\"></a>Getting started with toy example in `test_data`\n```\ncd isoCirc/test_data\nisocirc -t 1 read_toy.fa chr16_toy.fa chr16_toy.gtf chr16_circRNA_toy.bed output\n```\n\n\nDetailed arguments:\n```\nusage: isocirc [-h] [-v] [-t THREADS] [--bedtools BEDTOOLS]\n [--minimap2 MINIMAP2] [--short-read short.fa/fq] [--lordec LORDEC]\n [--kmer KMER] [--solid SOLID] [--trf TRF] [--match MATCH]\n [--mismatch MISMATCH] [--indel INDEL] [--match-frac MATCH_FRAC]\n [--indel-frac INDEL_FRAC] [--min-score MIN_SCORE]\n [--max-period MAX_PERIOD] [--min-len MIN_LEN]\n [--min-copy MIN_COPY] [--min-frac MIN_FRAC]\n [--high-max-ratio HIGH_MAX_RATIO]\n [--high-min-ratio HIGH_MIN_RATIO]\n 
[--high-iden-ratio HIGH_IDEN_RATIO]\n [--high-repeat-ratio HIGH_REPEAT_RATIO]\n [--low-repeat-ratio LOW_REPEAT_RATIO]\n [--cano-motif {GT/AG,all}] [--bsj-xid BSJ_XID]\n [--key-bsj-xid KEY_BSJ_XID] [--min-circ-dis MIN_CIRC_DIS]\n [--rescue-low] [--fsj-xid FSJ_XID] [--key-fsj-xid KEY_FSJ_XID]\n [--Alu ALU] [--flank-len FLANK_LEN] [--all-repeat ALL_REPEAT]\n long.fa/fq ref.fa anno.gtf circRNA.bed/gtf out_dir\n\nisocirc: circular RNA profiling and analysis using long-read sequencing\n\npositional arguments:\n long.fa/fq Long-read sequencing data generated with isoCirc\n ref.fa Reference genome sequence file\n anno.gtf Gene annotation file in GTF format\n circRNA.bed/gtf circRNA database annotation file in BED or GTF format\n Use ',' to separate multiple circRNA annotation files\n out_dir Output directory for final result and temporary files\n\noptional arguments:\n -h, --help Show this help message and exit\n -v, --version Show program's version number and exit\n\nGeneral options:\n -t THREADS, --threads THREADS\n Number of threads to use (default: 8)\n --bedtools BEDTOOLS Path to bedtools (default: bedtools)\n --minimap2 MINIMAP2 Path to minimap2 (default: minimap2)\n\nHybrid error-correction with short-read data (LoRDEC):\n --short-read short.fa/fq\n Short-read data for error correction \n Use ',' to connect multiple or paired-end short-read data\n (default: )\n --lordec LORDEC Path to lordec-correct (default: lordec-correct)\n --kmer KMER k-mer size (default: 21)\n --solid SOLID Solid k-mer abundance threshold (default: 3)\n\nConsensus calling with Tandem Repeats Finder (TRF)):\n --trf TRF Path to TRF program (default: trf409.legacylinux64)\n --match MATCH Match score (default: 2)\n --mismatch MISMATCH Mismatch penalty (default: 7)\n --indel INDEL Indel penalty (default: 7)\n --match-frac MATCH_FRAC\n Match probability (default: 80)\n --indel-frac INDEL_FRAC\n Indel probability (default: 10)\n --min-score MIN_SCORE\n Minimum alignment score to report (default: 
100)\n --max-period MAX_PERIOD\n Maximum period size to report (default: 2000)\n\nFiltering and mapping of consensus sequences (minimap2):\n --min-len MIN_LEN Minimum consensus length to keep (default: 30)\n --min-copy MIN_COPY Minimum copy number of consensus to keep \n (default: 2.0)\n --min-frac MIN_FRAC Minimum fraction of original long read to keep\n (default: 0.0)\n --high-max-ratio HIGH_MAX_RATIO\n Maximum mappedLen / consLen ratio for high-quality\n alignment (default: 1.1)\n --high-min-ratio HIGH_MIN_RATIO\n Minimum mappedLen /consLen ratio for high-quality\n alignment (default: 0.9)\n --high-iden-ratio HIGH_IDEN_RATIO\n Minimum identicalBases/ consLen ratio for high-quality\n alignment (default: 0.75)\n --high-repeat-ratio HIGH_REPEAT_RATIO\n Maximum mappedLen / consLen ratio for high-quality\n self-tandem consensus (default: 0.6)\n --low-repeat-ratio LOW_REPEAT_RATIO\n Minimum mappedLen / consLen ratio for low-quality\n self-tandem alignment (default: 1.9)\n\nIdentifying high-confidence BSJs and full-length circRNAs:\n --cano-motif {GT/AG,all}\n Canonical back-splice motif (GT/AG or all three\n motifs: GT/AG, GC/AG, AT/AC) (default: GT/AG)\n --bsj-xid BSJ_XID Maximum allowed mis/ins/del for 20-bp exonic sequence\n flanking BSJ (10-bp each side) (default: 1)\n --key-bsj-xid KEY_BSJ_XID\n Maximum allowed mis/ins/del for 4-bp exonic sequence\n flanking BSJ (2-bp each side) (default: 0)\n --min-circ-dis MIN_CIRC_DIS\n Minimum distance between genomic coordinates of\n two back-splice sites (default: 150)\n --rescue-low Use high-mapping quality reads to rescue low-mapping\n quality reads (default: False)\n --fsj-xid SJ_XID Maximum allowed mis/ins/del for 20-bp exonic sequence\n flanking FSJ (10-bp each side) (default: 1)\n --key-fsj-xid KEY_SJ_XID\n Maximum allowed mis/ins/del for 4-bp exonic sequence\n flanking FSJ (2-bp each side) (default: 0)\n\nOther options:\n --Alu ALU Alu repetitive element annotation in BED format\n (default: )\n --flank-len 
FLANK_LEN\n Length of upstream and downstream flanking sequence to\n search for Alu (default: 500)\n --all-repeat ALL_REPEAT\n All repetitive element annotation in BED format\n (default: )\n```\n\n## <a name=\"input_output\"></a>Input and output\n### <a name=\"input_file\"></a>Input files\nisoCirc takes a long read containing multiple copies of a circRNA sequence as input\n\nIt also requires a reference genome sequence and gene annotation to be provided.\n\n### <a name=\"output_file\"></a>Output files\nisoCirc outputs three result files in a user-specified directory:\n\n| No. | File name | Explanation | \n|:---:| :--- | --- |\n| 1 | isocirc.out | detailed information of each circRNA isoform with a high-confidence BSJ, in tabular format |\n| 2 | isocirc.bed | bed12 format file of `isocirc.out` |\n| 3 | isocirc_stats.out | basic summary stats numbers for `isocirc.out` |\n\n#### <a name=\"isocirc_out\"></a>1. isocirc.out\n`isocirc.out` is a 35-column tabular file, each line represents one unique circRNA isoform that has a high-confidence BSJ:\n\n| No. | Column name | Explanation | \n|:---:| :--- | --- |\n| 1 | isoformID | assigned isoform ID |\n| 2 | chrom | chromosome ID |\n| 3 | startCoor0based | start coordinate of circRNA, 0-based |\n| 4 | endCoor | end coordinate of circRNA |\n| 5 | geneStrand | gene strand (+/-) |\n| 6 | geneID | gene ID |\n| 7 | geneName | gene name |\n| 8 | blockCount | number of block |\n| 9 | blockSize | size of each block, separated by `,` |\n| 10 | blockStarts | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |\n| 11 | refMapLen | total length of circRNA |\n| 12 | blockType | category of each block. 
E: exon, I: intron, N: intergenic |\n| 13 | blockAnno | detailed annotation for each block, in format: \"TransID:E1(100)+I(50)+E2(30)\", where TransID is ID of corresponding transcript; E1 and E2 are 1st and 2nd exon of transcript; multiple blocks are separated by `,`; and multiple transcripts of one block are separated by `;` |\n| 14 | isKnownSS | `True` if SS is catalogued in gene annotation, `False` if not, separated by `,` |\n| 15 | isKnownFSJ | `True` if FSJ is catalogued in gene annotation, `False` if not, separated by `,` | \n| 16 | canoFSJMotif | strandness and canonical motifs of FSJs, e.g., `-GT/AG`, `NA` if FSJ is not canonical, separated by `,`|\n| 17 | isHighFSJ | `True` if alignment around FSJ is high-quality, `False` if not, separated by `,` |\n| 18 | isKnownExon | `True` if block is a catalogued exon in gene annotation, `False` if not, separated by \u2018,\u2019 | \n| 19 | isKnownBSJ | `True` if BSJ exists in circRNA annotation, `False` if not; multiple circRNA annotations are separated by `,` | \n| 20 | isCanoBSJ | `True` if BSJ has canonical motif (GT/AG), `False` if not | \n| 21 | canoBSJMotif | strandness and canonical motif of BSJ, e.g., `-GT/AG`, `NA` if BSJ is not canonical | \n| 22 | isFullLength | `True` if isoform is considered as `full-length isoform`, `False` if not |\n| 23 | BSJCate | category of BSJs: `FSM`/`NIC`/`NNC`, see explanation below. 
|\n| 24 | FSJCate | category of FSJs: `FSM`/`NIC`/`NNC` |\n| 25 | CDS | number of bases that are mapped to CDS region |\n| 26 | UTR | number of bases that are mapped to UTR region |\n| 27 | lincRNA | number of bases that are mapped to lincRNA region |\n| 28 | antisense | number of bases that are mapped to antisense region |\n| 29 | rRNA | number of bases that are mapped to rRNA region |\n| 30 | Alu | number of bases that are mapped to Alu element; `NA` if Alu annotation is not provided |\n| 31 | allRepeat | number of bases that are mapped to all repeat elements; `NA` if repeat annotation is not provided |\n| 32 | upFlankAlu | flanking Alu element in upstream; `NA` if none or Alu annotation is not provided |\n| 33 | downFlankAlu | flanking Alu element in downstream; `NA` if none or Alu annotation is not provided |\n| 34 | readCount | number of reads that come from this circRNA isoform |\n| 35 | readIDs | ID of reads that come from this circRNA isoform, separated by `,` |\n\n#### <a name=\"isocirc_bed\"></a>2. isocirc.bed\n`isocirc.bed` is a bed12 format file, each line represents one unique circRNA isoform from `isocirc.out`:\n\n| No. | Column name | Explanation | \n|:---:| :--- | --- |\n| 1 | chrom | chromosome ID |\n| 2 | startCoor0based | start coordinate of circRNA, 0-based |\n| 3 | endCoor | end coordinate of circRNA |\n| 4 | isoformName | name of the circRNA isoform | \n| 5 | score | indicates how dark the peak will be displayed in the browser (0-1000), set as 0 by `isoCirc` |\n| 6 | strand | +/- to denote strand |\n| 7 | thickStart | the starting position at which the feature is drawn thickly, set as 0 by `isoCirc` |\n| 8 | thickEnd | the ending position at which the feature is drawn thickly, set as 0 by `isoCirc` |\n| 9 | itemRgb | an RGB value of the form R,G,B (e.g. 
255,0,0), set as 0 by `isoCirc` |\n| 10 | blockCount | number of block |\n| 11 | blockSize | size of each block, separated by `,` |\n| 12 | blockStarts | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |\n\n#### <a name=\"isocirc_stats_out\"></a>3. isocirc_stats.out\n`isocirc_stats.out` contains 27 basic stats numbers for `isocirc.out`:\n\n| No. | Name | Explanation | \n|:---:| :--- | --- |\n| 1 | Total reads | Number of raw reads in sample |\n| 2 | Total reads with cons | Number of reads with consensus sequence called |\n| 3 | Total mappable reads with cons | Number of reads with consensus sequence called, mappable to genome |\n| 4 | Total reads with candidate BSJ | Number of reads with consensus sequence called, mappable to genome, and with BSJs (\"candidate BSJs\") |\n| 5 | Total candidate BSJs | Number of candidate BSJs (circRNA species) |\n| 6 | Total known candidate BSJs | Number of candidate BSJs reported in existing circRNA BSJ database (circBase / MiOncoCirc) |\n| 7 | Total reads with high BSJs | Number of reads with consensus sequence called, mappable to genome, and with high-confidence BSJs (based on additional inspection of alignment around BSJs) |\n| 8 | Total high BSJs | Number of high-confidence BSJs |\n| 9 | Total known high BSJs | Number of high-confidence BSJs that are known |\n| 10 | Total isoforms with high BSJs | Number of circRNA isoforms with high-confidence BSJs |\n| 11 | Total isoforms with high BSJs high FSJs | Number of circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence (canonical, high-quality alignment around internal splice sites) |\n| 12 | Total isoforms with high BSJ known SSs | Number of circRNA isoforms with high-confidence BSJs, and all SS are known (based on existing transcript GTF annotations for splice sites in linear RNA) |\n| 13 | Total isoforms with high BSJs high FSJs known SSs | Number of circRNA isoforms with high-confidence BSJs, all FSJs 
are high-confidence, and all SS are known |\n| 14 | Total full-length isoforms | Number of circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence or all SS are known |\n| 15 | Total reads for full-length isoforms | Number of reads for circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence or all SS are known |\n| 16 | Total full-length isoforms with FSM BSJ | Number of full-length circRNA isoforms with FSM BSJs |\n| 17 | Total reads for full-length isoforms with FSM BSJ | Number of reads for full-length circRNA isoforms with FSM BSJs |\n| 18 | Total full-length isoforms with NIC BSJ | Number of full-length circRNA isoforms with NIC BSJs |\n| 19 | Total reads for full-length isoforms with NIC BSJ | Number of reads for full-length circRNA isoforms with NIC BSJs |\n| 20 | Total full-length isoforms with NNC BSJ | Number of full-length circRNA isoforms with NNC BSJs |\n| 21 | Total reads for full-length isoforms with NNC BSJ | Number of reads for full-length circRNA isoforms with NNC BSJs |\n| 22 | Total full-length isoforms with FSM FSJ | Number of full-length circRNA isoforms with FSM FSJs |\n| 23 | Total reads for full-length isoforms with FSM FSJ | Number of reads for full-length circRNA isoforms with FSM FSJs |\n| 24 | Total full-length isoforms with NIC FSJ | Number of full-length circRNA isoforms with NIC internal FSJs |\n| 25 | Total reads for full-length isoforms with NIC FSJ | Number of reads for full-length circRNA isoforms with NIC FSJs |\n| 26 | Total full-length isoforms with NNC FSJ | Number of full-length circRNA isoforms with NNC FSJs |\n| 27 | Total reads for full-length isoforms with NNC FSJ | Number of reads for full-length circRNA isoforms with NNC FSJs |\n\n * BSJ: Back-Splice Junction\n * FSJ: Forward-Splice Junction\n * FSS: Forward-Splice Site\n * SS: Splice Site\n * cons: consensus sequence\n * cano: canonical\n * high: high-confidence (canonical and high-quality alignment around FSJ/BSJ)\n * FSM: Full 
Splice Match\n * NIC: Novel In Catalog\n * NNC: Novel Not in Catalog\n\n## <a name=\"isocirc_plot\"></a>Circular alignment of isoCirc long read\nWith the result file generated by `isocirc`, we can visualize the circular alignment of full-length isoCirc reads. Let's use the toy example in the `test_data` folder again:\n```\nisocircPlot ./read_toy.fa ./chr16_toy.fa ./chr16_toy.gtf ./output/isocirc.bed ./isocircPlot_toy.list SJ ./output\n```\nA `.png` file will be generated in the `output` folder showing how the isoCirc long read aligns to the reference genome multiple times.\n\n## <a name=\"FAQ\"></a>FAQ\n## <a name=\"contact\"></a>Contact\n\nYan Gao gaoy286@mail.sysu.edu.cn\n\nYi Xing yi.xing@pennmedicine.upenn.edu\n\n[github issues](https://github.com/Xinglab/isoCirc/issues)\n\n\n\n",
"bugtrack_url": null,
"license": "GLP",
"summary": "isoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data",
"version": "1.0.4",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "acf93bf6cee0108c8e6bf7de143d42f3b128773b837da80f90cabd8a5a1ba6a7",
"md5": "302a208ac7528f9f55c62e8cee60aca7",
"sha256": "c58fb3e659d344c96aa8a341c7a5f6589e1d2fe9d6dc069bd056124420eb2cf6"
},
"downloads": -1,
"filename": "isocirc-1.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "302a208ac7528f9f55c62e8cee60aca7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 25453378,
"upload_time": "2021-07-15T10:24:50",
"upload_time_iso_8601": "2021-07-15T10:24:50.177943Z",
"url": "https://files.pythonhosted.org/packages/ac/f9/3bf6cee0108c8e6bf7de143d42f3b128773b837da80f90cabd8a5a1ba6a7/isocirc-1.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "eaef20dc4deeb6b15d86288bd94d9fc3b699bc4f4b026d8aba82e518ea7e1647",
"md5": "3027030201183b32da365a6d1ab3e046",
"sha256": "01574c0b1a789fd4a3ac3000a08eb8949aea87ae47f94521f9e7761fe18f7562"
},
"downloads": -1,
"filename": "isocirc-1.0.4.tar.gz",
"has_sig": false,
"md5_digest": "3027030201183b32da365a6d1ab3e046",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25293220,
"upload_time": "2021-07-15T10:25:24",
"upload_time_iso_8601": "2021-07-15T10:25:24.598915Z",
"url": "https://files.pythonhosted.org/packages/ea/ef/20dc4deeb6b15d86288bd94d9fc3b699bc4f4b026d8aba82e518ea7e1647/isocirc-1.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-07-15 10:25:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "Xinglab",
"github_project": "isoCirc",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "isocirc"
}